rley 0.3.12 → 0.4.00
- checksums.yaml +4 -4
- data/CHANGELOG.md +7 -0
- data/README.md +69 -5
- data/examples/NLP/mini_en_demo.rb +5 -1
- data/examples/data_formats/JSON/JSON_demo.rb +1 -0
- data/examples/general/calc/calc_demo.rb +2 -1
- data/lib/rley/constants.rb +1 -1
- data/lib/rley/parser/dotted_item.rb +1 -1
- data/lib/rley/parser/error_reason.rb +106 -0
- data/lib/rley/parser/gfg_chart.rb +1 -24
- data/lib/rley/parser/gfg_earley_parser.rb +28 -57
- data/lib/rley/parser/gfg_parsing.rb +54 -30
- data/lib/rley/ptree/token_range.rb +0 -5
- data/lib/rley/rley_error.rb +10 -0
- data/lib/rley/sppf/parse_forest.rb +7 -9
- data/spec/rley/parser/error_reason_spec.rb +120 -0
- data/spec/rley/parser/gfg_chart_spec.rb +3 -54
- data/spec/rley/parser/gfg_earley_parser_spec.rb +74 -63
- data/spec/rley/parser/gfg_parsing_spec.rb +2 -3
- data/spec/rley/support/grammar_pb_helper.rb +48 -0
- metadata +7 -2
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: eb2c26370206f6c6eca059858ee0c8adedd32810
+  data.tar.gz: 77a42b3da998a2e8b073ec3a811287b71e6b3a3f
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: b16495b26269ee208ed3151f820a296d801ed7ca01ea9c98cf29b554da4ceba55719d67a7a7e15dc4fee9b70b54b1f08881ae0dc499b217f47db493b873af4eb
+  data.tar.gz: e463f9697c3cf8b012c8bc8c7736e675d6d355d3f81197bac7fb23529bb0c9e66c791d45ad833f2d6fadeb7eb2adb1a5eed6b3415292bb31fe8a02a43d2fed94
data/CHANGELOG.md CHANGED
@@ -1,3 +1,10 @@
+### 0.4.00 / 2016-12-17
+* [CHANGE] Error reporting is vastly changed. Syntax errors don't raise exceptions.
+  A parse error can be retrieved via an `ErrorReason` object. Such an object is
+  returned by the `GFGParsing#failure_reason` method.
+* [CHANGE] File `README.md` updated to reflect the new error reporting.
+* [CHANGE] Examples updated to reflect the new error reporting.
+
 ### 0.3.12 / 2016-12-08
 * [NEW] Directory `examples\general\calc`. A simple arithmetic expression demo parser.
data/README.md CHANGED
@@ -64,7 +64,7 @@ Installing the latest stable version is simple:
 
 ## A whirlwind tour of Rley
 The purpose of this section is show how to create a parser for a minimalistic
-English language subset.
+English language subset.
 The tour is organized into the following steps:
 1. [Defining the language grammar](#defining-the-language-grammar)
 2. [Creating a lexicon](#creating-a-lexicon)
@@ -73,7 +73,7 @@ The tour is organized into the following steps:
 5. [Parsing some input](#parsing-some-input)
 6. [Generating the parse forest](#generating-the-parse-forest)
 
-The complete source code of the tour can be found in the
+The complete source code of the example used in this tour can be found in the
 [examples](https://github.com/famished-tiger/Rley/tree/master/examples/NLP/mini_en_demo.rb)
 directory
 
@@ -86,7 +86,7 @@ The subset of English grammar is based on an example from the NLTK book.
 # Instantiate a builder object that will build the grammar for us
 builder = Rley::Syntax::GrammarBuilder.new do
   # Terminal symbols (= word categories in lexicon)
-  add_terminals('Noun', 'Proper-Noun', 'Verb')
+  add_terminals('Noun', 'Proper-Noun', 'Verb')
   add_terminals('Determiner', 'Preposition')
 
   # Here we define the productions (= grammar rules)
@@ -97,7 +97,7 @@ The subset of English grammar is based on an example from the NLTK book.
   rule 'VP' => %w[Verb NP]
   rule 'VP' => %w[Verb NP PP]
   rule 'PP' => %w[Preposition NP]
-end
+end
 # And now, let's build the grammar...
 grammar = builder.grammar
 ```
@@ -178,11 +178,75 @@ creating a lexicon and tokenizer from scratch. Here are a few Ruby Part-of-Speech
 pforest = result.parse_forest
 ```
 
+## Error reporting
+__Rley__ is a non-violent parser, that is, it won't throw an exception when it
+detects a syntax error. Instead, the parse result will be marked as
+non-successful. The parse error can then be identified by calling the
+`GFGParsing#failure_reason` method. This method returns an error reason object
+which can help to produce an error message.
+
+Consider the example from the [Parsing some input](#parsing-some-input) section
+above and, as an error, we delete the verb `saw` in the sentence to parse.
+
+```ruby
+# Verb has been removed from the sentence on next line
+input_to_parse = 'John Mary with a telescope'
+# Convert input text into a sequence of token objects...
+tokens = tokenizer(input_to_parse, grammar)
+result = parser.parse(tokens)
+
+puts "Parsing successful? #{result.success?}" # => Parsing successful? false
+exit(1)
+```
+
+As expected, the parse now fails.
+To get an error message, one just needs to retrieve the error reason and
+ask it to generate a message.
+```ruby
+# Show error message if parse fails...
+puts result.failure_reason.message unless result.success?
+```
+
+Re-running the example with the error results in the error message:
+```
+Syntax error at or near token 2 >>>Mary<<<
+Expected one 'Verb', found a 'Proper-Noun' instead.
+```
+
+The standard __Rley__ message not only informs about the location of
+the mistake, it also provides a hint by disclosing its expectations.
+
+Let's experiment again with the original sentence but without the word
+`telescope`.
+
+```ruby
+# Last word has been removed from the sentence on next line
+input_to_parse = 'John saw Mary with a '
+# Convert input text into a sequence of token objects...
+tokens = tokenizer(input_to_parse, grammar)
+result = parser.parse(tokens)
+
+puts "Parsing successful? #{result.success?}" # => Parsing successful? false
+unless result.success?
+  puts result.failure_reason.message
+  exit(1)
+end
+```
+
+This time, the following output is displayed:
+```
+Parsing successful? false
+Premature end of input after 'a' at position 5
+Expected one 'Noun'.
+```
+Again, the resulting error message is user-friendly.
+Remark: currently, Rley reports the error position as the index of the
+input token at which the error was detected.
 
 
 ## Examples
 
-The project source directory contains several example scripts that demonstrate
+The project source directory contains several example scripts that demonstrate
 how grammars are to be constructed and used.
 
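The non-raising contract described in the README section above can be sketched without the gem itself. In the sketch below, `FakeParsing` and `FakeReason` are hypothetical stand-ins (not Rley classes) that expose the same `success?`/`failure_reason` query interface the examples rely on:

```ruby
# FakeReason and FakeParsing are illustrative stubs, NOT part of Rley.
# They mimic the query interface of GFGParsing described above.
FakeReason = Struct.new(:message)

class FakeParsing
  attr_reader :failure_reason

  def initialize(failure_reason = nil)
    @failure_reason = failure_reason
  end

  # A parse is successful exactly when no failure reason was recorded.
  def success?
    @failure_reason.nil?
  end
end

# The reporting idiom used throughout the updated examples.
def report(result)
  return 'Parsing successful? true' if result.success?

  "Parsing successful? false\n" + result.failure_reason.message
end

ok_parse  = FakeParsing.new
bad_parse = FakeParsing.new(FakeReason.new('Syntax error at or near token 2 >>>Mary<<<'))

puts report(ok_parse)  # => Parsing successful? true
puts report(bad_parse)
```

The same check-then-report shape appears in `mini_en_demo.rb` and `calc_demo.rb` later in this diff.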
data/examples/NLP/mini_en_demo.rb CHANGED
@@ -83,7 +83,11 @@ input_to_parse = 'John saw Mary with a telescope'
 tokens = tokenizer(input_to_parse, grammar)
 result = parser.parse(tokens)
 
-puts "Parsing successful? #{result.success?}"
+puts "Parsing successful? #{result.success?}"
+unless result.success?
+  puts result.failure_reason.message
+  exit(1)
+end
 
 ########################################
 # Step 6. Generating the parse forest
data/examples/general/calc/calc_demo.rb CHANGED
@@ -22,7 +22,8 @@ result = parser.parse_expression(ARGV[0])
 
 unless result.success?
   # Stop if the parse failed...
-  puts "Parsing of '#{ARGV[0]}' failed"
+  puts "Parsing of '#{ARGV[0]}' failed"
+  puts "Reason: #{result.failure_reason.message}"
   exit(1)
 end
 
data/lib/rley/parser/dotted_item.rb CHANGED
@@ -115,7 +115,7 @@ module Rley # This module is used as a namespace
 
     private
 
-    # Return the given after its validation.
+    # Return the given position after its validation.
     def valid_position(aPosition)
       rhs_size = production.rhs.size
       if aPosition < 0 || aPosition > rhs_size
data/lib/rley/parser/error_reason.rb ADDED
@@ -0,0 +1,106 @@
+module Rley # Module used as a namespace
+  module Parser # This module is used as a namespace
+    # Abstract class. An instance represents an explanation describing
+    # the likely cause of a parse error
+    # detected by Rley.
+    class ErrorReason
+      # The position of the offending input token
+      attr_reader(:position)
+
+      # The failing production
+      attr_reader(:production)
+
+      def initialize(aPosition)
+        @position = aPosition
+      end
+
+      # Returns the result of invoking reason.to_s.
+      def message()
+        return self.to_s
+      end
+
+      # Return this reason's class name and message
+      def inspect
+        "#{self.class.name}: #{message}"
+      end
+    end # class
+
+
+    # This parse error occurs when no input for parsing was provided
+    # while the grammar requires some non-empty input.
+    class NoInput < ErrorReason
+      def initialize()
+        super(0)
+      end
+
+      # Returns the reason's message.
+      def to_s
+        'Input cannot be empty.'
+      end
+    end # class
+
+    # Abstract class and subclass of ErrorReason.
+    # This specialization represents errors in which the input
+    # didn't match one of the expected tokens.
+    class ExpectationNotMet < ErrorReason
+      # The last input token read when the error was detected
+      attr_reader(:last_token)
+
+      # The terminal symbols expected when the error occurred
+      attr_reader(:expected_terminals)
+
+      def initialize(aPosition, lastToken, expectedTerminals)
+        super(aPosition)
+        @last_token = lastToken.dup
+        @expected_terminals = expectedTerminals.dup
+      end
+
+      protected
+
+      # Emit a text explaining the expected terminal symbols
+      def expectations
+        term_names = expected_terminals.map(&:name)
+        explain = 'Expected one '
+        explain << if expected_terminals.size > 1
+                     "of: ['#{term_names.join("', '")}']"
+                   else
+                     "'#{term_names[0]}'"
+                   end
+        return explain
+      end
+    end # class
+
+
+    # This parse error occurs when the current token from the input
+    # is unexpected according to the grammar rules.
+    class UnexpectedToken < ExpectationNotMet
+      # Returns the reason's message.
+      def to_s
+        err_msg = "Syntax error at or near token #{position + 1} "
+        err_msg << ">>>#{last_token.lexeme}<<<\n"
+        err_msg << expectations
+        err_msg << ", found a '#{last_token.terminal.name}' instead."
+
+        return err_msg
+      end
+    end # class
+
+
+    # This parse error occurs when all input tokens were consumed
+    # but the parser still expected one or more tokens from the input.
+    class PrematureInputEnd < ExpectationNotMet
+      # Returns the reason's message.
+      def to_s
+        err_msg = "Premature end of input after '#{last_token.lexeme}'"
+        err_msg << " at position #{position + 1}\n"
+        err_msg << "#{expectations}."
+
+        return err_msg
+      end
+    end # class
+  end # module
+end # module
+
+# End of file
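The message-formatting logic of the new `ErrorReason` hierarchy can be exercised in isolation. The sketch below condenses the classes from the diff above; `StubTerminal` and `StubToken` are illustrative stand-ins (an assumption) for Rley's `Syntax::Terminal` and `Parser::Token`:

```ruby
# StubTerminal/StubToken are stand-ins for Rley's Terminal and Token classes.
StubTerminal = Struct.new(:name)
StubToken = Struct.new(:lexeme, :terminal)

# Base class: records the position of the offending token.
class ErrorReason
  attr_reader :position

  def initialize(aPosition)
    @position = aPosition
  end

  # As in the real class, message delegates to to_s.
  def message
    to_s
  end
end

# Errors where the input didn't match the expected tokens.
class ExpectationNotMet < ErrorReason
  attr_reader :last_token, :expected_terminals

  def initialize(aPosition, lastToken, expectedTerminals)
    super(aPosition)
    @last_token = lastToken.dup
    @expected_terminals = expectedTerminals.dup
  end

  # Text explaining which terminal symbol(s) were expected.
  def expectations
    term_names = expected_terminals.map(&:name)
    if expected_terminals.size > 1
      "Expected one of: ['#{term_names.join("', '")}']"
    else
      "Expected one '#{term_names[0]}'"
    end
  end
end

# The current token is unexpected according to the grammar rules.
class UnexpectedToken < ExpectationNotMet
  def to_s
    "Syntax error at or near token #{position + 1} " \
    ">>>#{last_token.lexeme}<<<\n" \
    "#{expectations}, found a '#{last_token.terminal.name}' instead."
  end
end

bad_token = StubToken.new('-', StubTerminal.new('MINUS'))
expected  = [StubTerminal.new('PLUS'), StubTerminal.new('LPAREN')]
reason = UnexpectedToken.new(3, bad_token, expected)
puts reason.message
# => Syntax error at or near token 4 >>>-<<<
#    Expected one of: ['PLUS', 'LPAREN'], found a 'MINUS' instead.
```

This is exactly the message the new spec file (further down in this diff) asserts for `UnexpectedToken`.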
data/lib/rley/parser/gfg_chart.rb CHANGED
@@ -12,17 +12,8 @@ module Rley # This module is used as a namespace
     # An array of entry sets (one per input token + 1)
     attr_reader(:sets)
 
-    # The level of trace details reported on stdout during the parse.
-    # The possible values are:
-    # 0: No trace output (default case)
-    # 1: Show trace of scanning and completion rules
-    # 2: Same as of 1 with the addition of the prediction rules
-    attr_reader(:tracer)
-
     # @param tokenCount [Fixnum] The number of lexemes in the input to parse.
-
-    def initialize(tokenCount, aGFGraph, aTracer)
-      @tracer = aTracer
+    def initialize(tokenCount, aGFGraph)
       @sets = Array.new(tokenCount + 1) { |_| ParseEntrySet.new }
       push_entry(aGFGraph.start_vertex, 0, 0, :start_rule)
     end
@@ -53,20 +44,6 @@ module Rley # This module is used as a namespace
     def push_entry(aVertex, anOrigin, anIndex, aReason)
       new_entry = ParseEntry.new(aVertex, anOrigin)
       pushed = self[anIndex].push_entry(new_entry)
-      if pushed == new_entry && tracer.level > 0
-        case aReason
-          when :start_rule, :prediction
-            tracer.trace_prediction(anIndex, new_entry)
-
-          when :scanning
-            tracer.trace_scanning(anIndex, new_entry)
-
-          when :completion
-            tracer.trace_completion(anIndex, new_entry)
-          else
-            raise NotImplementedError, "Unknown push_entry mode #{aReason}"
-        end
-      end
 
       return pushed
     end
data/lib/rley/parser/gfg_earley_parser.rb CHANGED
@@ -17,33 +17,34 @@ module Rley # This module is used as a namespace
     # Parse a sequence of input tokens.
     # @param aTokenSequence [Array] Array of Tokens objects returned by a
     # tokenizer/scanner/lexer.
-    # @param aTraceLevel [Fixnum] The specified trace level.
-    # The possible values are:
-    # 0: No trace output (default case)
-    # 1: Show trace of scanning and completion rules
-    # 2: Same as of 1 with the addition of the prediction rules
     # @return [Parsing] an object that embeds the parse results.
-    def parse(aTokenSequence
-
-      result = GFGParsing.new(gf_graph, aTokenSequence, tracer)
+    def parse(aTokenSequence)
+      result = GFGParsing.new(gf_graph, aTokenSequence)
       last_token_index = aTokenSequence.size
+      if last_token_index == 0 && !grammar.start_symbol.nullable?
+        return unexpected_empty_input(result)
+      end
+
       (0..last_token_index).each do |i|
-        handle_error(result) if result.chart[i].empty?
         result.chart[i].each do |entry|
           # Is entry of the form? [A => alpha . B beta, k]...
           next_symbol = entry.next_symbol
           if next_symbol && next_symbol.kind_of?(Syntax::NonTerminal)
             # ...apply the Call rule
-            call_rule(result, entry, i
+            call_rule(result, entry, i)
           end
 
-          exit_rule(result, entry, i
-          start_rule(result, entry, i
-          end_rule(result, entry, i
+          exit_rule(result, entry, i) if entry.exit_entry?
+          start_rule(result, entry, i) if entry.start_entry?
+          end_rule(result, entry, i) if entry.end_entry?
+        end
+        if i < last_token_index
+          scan_success = scan_rule(result, i)
+          break unless scan_success
         end
-        scan_rule(result, i, tracer) if i < last_token_index
       end
-
+
+      result.done # End of parsing process
       return result
     end
 
@@ -55,10 +56,7 @@ module Rley # This module is used as a namespace
     # Then the entry [.B, i] is added to the current sigma set.
     # Gist: when an entry expects the non-terminal symbol B, then
     # add an entry with start vertex .B
-    def call_rule(aParsing, anEntry, aPosition
-      if aTracer.level > 1
-        puts "Chart[#{aPosition}] Call rule applied upon #{anEntry}:"
-      end
+    def call_rule(aParsing, anEntry, aPosition)
       aParsing.call_rule(anEntry, aPosition)
     end
 
@@ -69,10 +67,7 @@ module Rley # This module is used as a namespace
     # is added to the current sigma set.
     # Gist: for an entry corresponding to a start vertex, add an entry
     # for each entry edge in the graph.
-    def start_rule(aParsing, anEntry, aPosition
-      if aTracer.level > 1
-        puts "Chart[#{aPosition}] Start rule applied upon #{anEntry}:"
-      end
+    def start_rule(aParsing, anEntry, aPosition)
       aParsing.start_rule(anEntry, aPosition)
     end
 
@@ -81,10 +76,7 @@ module Rley # This module is used as a namespace
     # production. Then entry [B., k] is added to the current entry set.
     # Gist: for an entry corresponding to a reduced production, add an entry
     # for each exit edge in the graph.
-    def exit_rule(aParsing, anEntry, aPosition
-      if aTracer.level > 1
-        puts "Chart[#{aPosition}] Exit rule applied upon #{anEntry}:"
-      end
+    def exit_rule(aParsing, anEntry, aPosition)
       aParsing.exit_rule(anEntry, aPosition)
     end
 
@@ -92,10 +84,7 @@ module Rley # This module is used as a namespace
     # is added to a parse entry set with index j.
     # then for every entry of the form [A => α . B γ, i] in the kth sigma set
     # the entry [A => α B . γ, i] is added to the jth sigma set.
-    def end_rule(aParsing, anEntry, aPosition
-      if aTracer.level > 1
-        puts "Chart[#{aPosition}] End rule applied upon #{anEntry}:"
-      end
+    def end_rule(aParsing, anEntry, aPosition)
       aParsing.end_rule(anEntry, aPosition)
     end
 
@@ -105,35 +94,17 @@ module Rley # This module is used as a namespace
     # and allow them to cross the edge, adding the node on the back side
     # of the edge as an entry to the next sigma set:
     # add an entry to the next sigma set [A => α t . γ, i]
-    def scan_rule(aParsing, aPosition
-      if aTracer.level > 1
-        prefix = "Chart[#{aPosition}] Scan rule applied upon "
-        puts prefix + aParsing.tokens[aPosition].to_s
-      end
+    def scan_rule(aParsing, aPosition)
       aParsing.scan_rule(aPosition)
     end
+
+    # Parse error detected: no input tokens provided while the grammar
+    # forbids this.
+    def unexpected_empty_input(aParsing)
+      aParsing.faulty(NoInput.new)
+      return aParsing
+    end
 
-    # Raise an exception to indicate a syntax error.
-    def handle_error(aParsing)
-      # Retrieve the first empty state set
-      pos = aParsing.chart.sets.find_index(&:empty?)
-      lexeme_at_pos = aParsing.tokens[pos - 1].lexeme
-      puts "chart index: #{pos - 1}"
-      terminals = aParsing.chart.sets[pos - 1].expected_terminals
-      puts "count expected terminals #{terminals.size}"
-      entries = aParsing.chart.sets[pos - 1].entries.map(&:to_s).join("\n")
-      puts "Items #{entries}"
-      term_names = terminals.map(&:name)
-      err_msg = "Syntax error at or near token #{pos}"
-      err_msg << ">>>#{lexeme_at_pos}<<<:\nExpected "
-      err_msg << if terminals.size > 1
-                   "one of: ['#{term_names.join("', '")}'],"
-                 else
-                   ": #{term_names[0]},"
-                 end
-      err_msg << " found a '#{aParsing.tokens[pos - 1].terminal.name}'"
-      raise StandardError, err_msg + ' instead.'
-    end
   end # class
 end # module
 end # module
data/lib/rley/parser/gfg_parsing.rb CHANGED
@@ -1,4 +1,5 @@
 require_relative 'gfg_chart'
+require_relative 'error_reason'
 require_relative 'parse_entry_tracker'
 require_relative 'parse_forest_factory'
 
@@ -15,22 +16,21 @@ module Rley # This module is used as a namespace
     # The sequence of input token to parse
     attr_reader(:tokens)
 
-    # A Hash with pairs of the form:
+    # A Hash with pairs of the form:
     # parse entry => [ antecedent parse entries ]
     # It associates to a every parse entry its antecedent(s), that is,
-    # the parse entry/ies that causes the key parse entry to be created
+    # the parse entry/ies that causes the key parse entry to be created
     # with one the gfg rules
     attr_reader(:antecedence)
 
-    #
-
-
-
-
-    def initialize(theGFG, theTokens, aTracer)
+    # The reason of a parse failure
+    attr_reader(:failure_reason)
+
+
+    def initialize(theGFG, theTokens)
       @gf_graph = theGFG
       @tokens = theTokens.dup
-      @chart = GFGChart.new(tokens.size, gf_graph
+      @chart = GFGChart.new(tokens.size, gf_graph)
       @antecedence = Hash.new { |hash, key| hash[key] = [] }
       antecedence[chart[0].first]
     end
@@ -45,7 +45,7 @@ module Rley # This module is used as a namespace
       next_symbol = anEntry.next_symbol
       start_vertex = gf_graph.start_vertex_for[next_symbol]
       pos = aPosition
-      apply_rule(anEntry, start_vertex, pos, pos, :call_rule)
+      apply_rule(anEntry, start_vertex, pos, pos, :call_rule)
     end
 
     # Let the current sigma set be the ith parse entry set.
@@ -65,7 +65,7 @@ module Rley # This module is used as a namespace
     end
 
     # This method must be invoked when an entry is added to a parse entry set
-    # and is of the form [B => γ ., k] (the dot is at the end of the
+    # and is of the form [B => γ ., k] (the dot is at the end of the
     # production. Then entry [B., k] is added to the current entry set.
     # Gist: for an entry corresponding to a reduced production, add an entry
     # for each exit edge in the graph.
@@ -96,11 +96,12 @@ module Rley # This module is used as a namespace
     end
 
     # Given that the terminal t is at the specified position,
-    # Locate all entries in the current sigma set that expect t:
+    # Locate all entries in the current sigma set that expect t:
     # [A => α . t γ, i]
     # and allow them to cross the edge, adding the node on the back side
     # of the edge as an entry to the next sigma set:
     # add an entry to the next sigma set [A => α t . γ, i]
+    # Returns true if the next token matches the expectations, false otherwise.
     def scan_rule(aPosition)
       terminal = tokens[aPosition].terminal
 
@@ -108,7 +109,10 @@ module Rley # This module is used as a namespace
       expecting_term = chart[aPosition].entries4term(terminal)
 
       # ... if the terminal isn't expected then we have an error
-
+      if expecting_term.empty?
+        unexpected_token(aPosition)
+        return false
+      end
 
       expecting_term.each do |ntry|
         # Get the vertices after the expected terminal
@@ -119,6 +123,8 @@ module Rley # This module is used as a namespace
           apply_rule(ntry, vertex_after_terminal, origin, pos, :scan_rule)
         end
       end
+
+      return true
     end
 
 
@@ -136,7 +142,7 @@ module Rley # This module is used as a namespace
     end
 
     # Factory method. Builds a ParseForest from the parse result.
-    # @return [ParseForest]
+    # @return [ParseForest]
     def parse_forest()
       factory = ParseForestFactory.new(self)
 
@@ -148,7 +154,7 @@ module Rley # This module is used as a namespace
     # with origin equal to zero.
     def initial_entry()
       return chart.initial_entry
-    end
+    end
 
     # Retrieve the accepting parse entry that represents
     # a complete, successful parse
@@ -158,25 +164,43 @@ module Rley # This module is used as a namespace
       return chart.accepting_entry
     end
 
+    # Mark the parse as erroneous
+    def faulty(aReason)
+      @failure_reason = aReason
+    end
+
+    # A notification that the parsing reached an end
+    def done
+      unless self.success? || self.failure_reason
+        # Parse not successful and no reason identified
+        # Assuming that parse failed because of a premature end
+        premature_end
+      end
+    end
+
     private
 
-    #
-
-
-
-
+    # Parse error detected: all input tokens were consumed and
+    # the parser didn't detect a syntax error meanwhile but
+    # could not reach the accepting state.
+    def premature_end
+      token_pos = tokens.size # One-based!
+      last_token = tokens[-1]
+      entry_set = chart.sets[tokens.size]
+      expected = entry_set.expected_terminals
+
+      reason = PrematureInputEnd.new(token_pos - 1, last_token, expected)
+      faulty(reason)
+    end
 
+    # Parse error detected: input token doesn't match
+    # the expectations set by grammar rules
+    def unexpected_token(aPosition)
+      unexpected = tokens[aPosition]
       expected = chart.sets[aPosition].expected_terminals
-
-
-
-      err_msg << if expected.size > 1
-                   "one of: ['#{term_names.join("', '")}'],"
-                 else
-                   ": #{term_names[0]},"
-                 end
-      err_msg << " found a '#{actual.name}'"
-      raise StandardError, err_msg + ' instead.'
+
+      reason = UnexpectedToken.new(aPosition, unexpected, expected)
+      faulty(reason)
     end
 
     def apply_rule(antecedentEntry, aVertex, anOrigin, aPosition, aRuleId)
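The premature-end fallback wired in above (scan runs dry, `done` finds neither success nor a recorded reason, so a `PrematureInputEnd` is built from the last token and the still-expected terminals) can be sketched in isolation. `StubTerminal` and `StubToken` below are illustrative stand-ins (an assumption), not Rley classes:

```ruby
# StubTerminal/StubToken are illustrative stand-ins for Rley's classes.
StubTerminal = Struct.new(:name)
StubToken = Struct.new(:lexeme, :terminal)

# Condensed PrematureInputEnd: input ended while more tokens were expected.
class PrematureInputEnd
  attr_reader :position, :last_token, :expected_terminals

  def initialize(aPosition, lastToken, expectedTerminals)
    @position = aPosition
    @last_token = lastToken
    @expected_terminals = expectedTerminals
  end

  def to_s
    names = expected_terminals.map(&:name)
    expectation = if names.size > 1
                    "one of: ['#{names.join("', '")}']"
                  else
                    "one '#{names[0]}'"
                  end
    "Premature end of input after '#{last_token.lexeme}'" \
    " at position #{position + 1}\n" \
    "Expected #{expectation}."
  end
end

# Mirrors GFGParsing#premature_end: position is tokens.size - 1 (zero-based).
tokens   = [StubToken.new('a', StubTerminal.new('Determiner'))]
expected = [StubTerminal.new('Noun')]
reason = PrematureInputEnd.new(tokens.size - 1, tokens[-1], expected)
puts reason
# => Premature end of input after 'a' at position 1
#    Expected one 'Noun'.
```

Note how the one-based position in the message comes from `position + 1`, matching the "position 5" shown for the five-token README example.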
data/lib/rley/ptree/token_range.rb CHANGED
@@ -68,11 +68,6 @@ module Rley # This module is used as a namespace
       return "[#{low_text}, #{high_text}]"
     end
 
-    # Generate a String that represents a value-based identifier
-    def keystr()
-      return "#{low.object_id}-#{high.object_id}"
-    end
-
     private
 
     def assign_low(aRange)
data/lib/rley/sppf/parse_forest.rb CHANGED
@@ -4,15 +4,13 @@ require_relative 'alternative_node'
 
 module Rley # This module is used as a namespace
   module SPPF # This module is used as a namespace
-    #
-    # A parse
-    #
-    #
-    #
-    #
-    # during the parse.
-    # The root node corresponds to the main/start symbol of the grammar.
+    # In an ambiguous grammar there are valid inputs that can result in multiple
+    # parse trees. A set of parse trees is commonly referred to as a parse
+    # forest. More specifically a parse forest is a graph data
+    # structure designed to represent a set of equally syntactically correct
+    # parse trees. Parse forests generated by Rley are so-called Shared Packed
+    # Parse Forests (SPPF). SPPFs allow a very compact representation of parse
+    # trees by sharing common sub-trees amongst the parse trees.
     class ParseForest
       # The root node of the forest
       attr_reader(:root)
@@ -0,0 +1,120 @@
|
|
1
|
+
require_relative '../../spec_helper'
|
2
|
+
require_relative '../../../lib/rley/parser/token'
|
3
|
+
|
4
|
+
# Load the class under test
|
5
|
+
require_relative '../../../lib/rley/parser/error_reason'
|
6
|
+
module Rley # Open this namespace to avoid module qualifier prefixes
|
7
|
+
module Parser # Open this namespace to avoid module qualifier prefixes
|
8
|
+
describe NoInput do
|
9
|
+
context 'Initialization:' do
|
10
|
+
# Default instantiation rule
|
11
|
+
subject { NoInput.new }
|
12
|
+
|
13
|
+
it 'should be created without argument' do
|
14
|
+
expect { NoInput.new }.not_to raise_error
|
15
|
+
end
|
16
|
+
|
17
|
+
it 'should know the error position' do
|
18
|
+
expect(subject.position).to eq(0)
|
19
|
+
end
|
20
|
+
end # context
|
21
|
+
|
22
|
+
context 'Provided services:' do
|
23
|
+
it 'should emit a standard message' do
|
24
|
+
text = 'Input cannot be empty.'
|
25
|
+
expect(subject.to_s).to eq(text)
|
26
|
+
expect(subject.message).to eq(text)
|
27
|
+
end
|
28
|
+
|
29
|
+
it 'should give a clear inspection text' do
|
30
|
+
text = 'Rley::Parser::NoInput: Input cannot be empty.'
|
31
|
+
expect(subject.inspect).to eq(text)
|
32
|
+
end
|
33
|
+
end # context
|
34
|
+
end # describe
|
35
|
+
|
36
|
+
describe ExpectationNotMet do
|
37
|
+
let(:err_token) { double('fake-token') }
|
38
|
+
let(:terminals) do
|
39
|
+
['PLUS', 'LPAREN'].map { |name| Syntax::Terminal.new(name) }
|
40
|
+
end
|
41
|
+
|
42
|
+
# Default instantiation rule
|
43
|
+
subject { ExpectationNotMet.new(3, err_token, terminals) }
|
44
|
+
|
45
|
+
context 'Initialization:' do
|
46
|
+
it 'should be created with arguments' do
|
47
|
+
expect { ExpectationNotMet.new(3, err_token, terminals) }.not_to raise_error
|
48
|
+
end
|
49
|
+
|
50
|
+
it 'should know the error position' do
|
51
|
+
expect(subject.position).to eq(3)
|
52
|
+
end
|
53
|
+
|
54
|
+
it 'should know the expected terminals' do
|
55
|
+
expect(subject.expected_terminals).to eq(terminals)
|
56
|
+
end
|
57
|
+
end # context
|
58
|
+
end # describe
|
59
|
+
|
60
|
+
|
61
|
+
describe UnexpectedToken do
|
62
|
+
let(:err_lexeme) { '-'}
|
63
|
+
+    let(:err_terminal) { Syntax::Terminal.new('MINUS') }
+    let(:err_token) { Token.new(err_lexeme, err_terminal) }
+    let(:terminals) do
+      ['PLUS', 'LPAREN'].map { |name| Syntax::Terminal.new(name) }
+    end
+
+    # Default instantiation rule
+    subject { UnexpectedToken.new(3, err_token, terminals) }
+
+    context 'Initialization:' do
+      it 'should be created with arguments' do
+        expect { UnexpectedToken.new(3, err_token, terminals) }.not_to raise_error
+      end
+    end # context
+
+    context 'Provided services:' do
+      it 'should emit a message' do
+        text = <<MSG_END
+Syntax error at or near token 4 >>>-<<<
+Expected one of: ['PLUS', 'LPAREN'], found a 'MINUS' instead.
+MSG_END
+        expect(subject.to_s).to eq(text.chomp)
+        expect(subject.message).to eq(text.chomp)
+      end
+    end # context
+  end #describe
+
+  describe PrematureInputEnd do
+    let(:err_lexeme) { '+'}
+    let(:err_terminal) { Syntax::Terminal.new('PLUS') }
+    let(:err_token) { Token.new(err_lexeme, err_terminal) }
+    let(:terminals) do
+      ['INT', 'LPAREN'].map { |name| Syntax::Terminal.new(name) }
+    end
+
+    # Default instantiation rule
+    subject { PrematureInputEnd.new(3, err_token, terminals) }
+
+    context 'Initialization:' do
+      it 'should be created with arguments' do
+        expect { PrematureInputEnd.new(3, err_token, terminals) }.not_to raise_error
+      end
+    end # context
+
+    context 'Provided services:' do
+      it 'should emit a message' do
+        text = <<MSG_END
+Premature end of input after '+' at position 4
+Expected one of: ['INT', 'LPAREN'].
+MSG_END
+        expect(subject.to_s).to eq(text.chomp)
+        expect(subject.message).to eq(text.chomp)
+      end
+    end # context
+  end # describe
+end # module
+end # module
+# End of file
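The two specs above pin down the exact message formats of the new error-reason classes. The following is a minimal standalone sketch of those formats, not Rley's actual implementation; `UnexpectedTokenSketch` and `premature_end_message` are hypothetical names, and only the output strings are taken from the spec expectations:

```ruby
# Sketch of the message formats asserted in error_reason_spec.rb.
# Hypothetical stand-ins for Rley's UnexpectedToken / PrematureInputEnd.
class UnexpectedTokenSketch
  def initialize(rank, lexeme, found, expected)
    @rank = rank          # zero-based index of the offending token
    @lexeme = lexeme      # its text, e.g. '-'
    @found = found        # its terminal name, e.g. 'MINUS'
    @expected = expected  # terminal names that were legal at that point
  end

  def message
    list = @expected.map { |name| "'#{name}'" }.join(', ')
    "Syntax error at or near token #{@rank + 1} >>>#{@lexeme}<<<\n" \
    "Expected one of: [#{list}], found a '#{@found}' instead."
  end
end

def premature_end_message(rank, lexeme, expected)
  list = expected.map { |name| "'#{name}'" }.join(', ')
  "Premature end of input after '#{lexeme}' at position #{rank + 1}\n" \
  "Expected one of: [#{list}]."
end

puts UnexpectedTokenSketch.new(3, '-', 'MINUS', %w[PLUS LPAREN]).message
# Syntax error at or near token 4 >>>-<<<
# Expected one of: ['PLUS', 'LPAREN'], found a 'MINUS' instead.
```

Note that both messages report a one-based position (`rank + 1`), matching the `token 4` / `position 4` wording the specs expect for rank 3.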
data/spec/rley/parser/gfg_chart_spec.rb
CHANGED
@@ -46,17 +46,16 @@ module Rley # Open this namespace to avoid module qualifier prefixes
       # from the abc grammar
       let(:items_from_grammar) { build_items_for_grammar(grammar_abc) }
       let(:sample_gfg) { GFG::GrmFlowGraph.new(items_from_grammar) }
-      let(:sample_tracer) { ParseTracer.new(0, output, token_seq) }
       let(:sample_start_symbol) { sample_gfg.start_vertex.non_terminal }


       # Default instantiation rule
-      subject { GFGChart.new(count_token, sample_gfg
+      subject { GFGChart.new(count_token, sample_gfg) }


       context 'Initialization:' do
-        it 'should be created with start vertex, token count
-          expect { GFGChart.new(count_token, sample_gfg
+        it 'should be created with start vertex, token count' do
+          expect { GFGChart.new(count_token, sample_gfg) }
             .not_to raise_error
         end

@@ -64,10 +63,6 @@ module Rley # Open this namespace to avoid module qualifier prefixes
           expect(subject.sets.size).to eq(count_token + 1)
         end

-        it 'should reference a tracer' do
-          expect(subject.tracer).to eq(sample_tracer)
-        end
-
         it 'should know the start symbol' do
           expect(subject.start_symbol).to eq(sample_start_symbol)
         end
@@ -83,52 +78,6 @@ module Rley # Open this namespace to avoid module qualifier prefixes
         end


-=end
-      end # context
-
-      context 'Provided services:' do
-=begin
-        let(:t_a) { Syntax::Terminal.new('a') }
-        let(:t_b) { Syntax::Terminal.new('b') }
-        let(:t_c) { Syntax::Terminal.new('c') }
-        let(:nt_sentence) { Syntax::NonTerminal.new('sentence') }
-
-        let(:sample_prod) do
-          Syntax::Production.new(nt_sentence, [t_a, t_b, t_c])
-        end
-
-        let(:origin_val) { 3 }
-        let(:dotted_rule) { DottedItem.new(sample_prod, 2) }
-        let(:complete_rule) { DottedItem.new(sample_prod, 3) }
-        let(:sample_parse_state) { ParseState.new(dotted_rule, origin_val) }
-        let(:sample_tracer) { ParseTracer.new(1, output, token_seq) }
-
-        # Factory method.
-        def parse_state(origin, aDottedRule)
-          ParseState.new(aDottedRule, origin)
-        end
-
-
-        it 'should trace its initialization' do
-          subject[0] # Force constructor call here
-          expectation = <<-SNIPPET
-['I', 'saw', 'John', 'with', 'a', 'dog']
-|. I . saw . John . with . a . dog .|
-|> . . . . . .| [0:0] sentence => A B . C
-SNIPPET
-          expect(output.string).to eq(expectation)
-        end
-
-        it 'should trace parse state pushing' do
-          subject[0] # Force constructor call here
-          output.string = ''
-
-          subject.push_state(dotted_rule, 3, 5, :prediction)
-          expectation = <<-SNIPPET
-|. . . > .| [3:5] sentence => A B . C
-SNIPPET
-          expect(output.string).to eq(expectation)
-        end
=end
       end # context
     end # describe
data/spec/rley/parser/gfg_earley_parser_spec.rb
CHANGED
@@ -7,8 +7,11 @@ require_relative '../../../lib/rley/syntax/grammar_builder'
 require_relative '../../../lib/rley/parser/token'
 require_relative '../../../lib/rley/parser/dotted_item'
 require_relative '../../../lib/rley/parser/gfg_parsing'
+
+# Load builders and lexers for sample grammars
 require_relative '../support/grammar_abc_helper'
 require_relative '../support/ambiguous_grammar_helper'
+require_relative '../support/grammar_pb_helper'
 require_relative '../support/grammar_helper'
 require_relative '../support/expectation_helper'

@@ -68,10 +71,10 @@ module Rley # Open this namespace to avoid module qualifier prefixes
       # for the language specified by grammar_expr
       def grm2_tokens()
         input_sequence = [
-          { '2' => 'integer' },
-          '+',
+          { '2' => 'integer' },
+          '+',
           { '3' => 'integer' },
-          '*',
+          '*',
           { '4' => 'integer' }
         ]
         return build_token_sequence(input_sequence, grammar_expr)
@@ -178,39 +181,6 @@ module Rley # Open this namespace to avoid module qualifier prefixes
         expect(entry_set_5.entries.size).to eq(4)
         compare_entry_texts(entry_set_5, expected)
       end
-=begin
-      it 'should trace a parse with level 1' do
-        # Substitute temporarily $stdout by a StringIO
-        prev_ostream = $stdout
-        $stdout = StringIO.new('', 'w')
-
-        trace_level = 1
-        subject.parse(grm1_tokens, trace_level)
-        expectations = <<-SNIPPET
-['a', 'a', 'b', 'c', 'c']
-|. a . a . b . c . c .|
-|> . . . . .| [0:0] S => . A
-|> . . . . .| [0:0] A => . 'a' A 'c'
-|> . . . . .| [0:0] A => . 'b'
-|[---] . . . .| [0:1] A => 'a' . A 'c'
-|. > . . . .| [1:1] A => . 'a' A 'c'
-|. > . . . .| [1:1] A => . 'b'
-|. [---] . . .| [1:2] A => 'a' . A 'c'
-|. . > . . .| [2:2] A => . 'a' A 'c'
-|. . > . . .| [2:2] A => . 'b'
-|. . [---] . .| [2:3] A => 'b' .
-|. [-------> . .| [1:3] A => 'a' A . 'c'
-|. . . [---] .| [3:4] A => 'a' A 'c' .
-|[---------------> .| [0:4] A => 'a' A . 'c'
-|. . . . [---]| [4:5] A => 'a' A 'c' .
-|[===================]| [0:5] S => A .
-SNIPPET
-        expect($stdout.string).to eq(expectations)
-
-        # Restore standard ouput stream
-        $stdout = prev_ostream
-      end
-=end

       it 'should parse a valid simple expression' do
         instance = GFGEarleyParser.new(grammar_expr)
@@ -586,40 +556,81 @@ SNIPPET
       it 'should parse an invalid simple input' do
         # Parse an erroneous input (b is missing)
         wrong = build_token_sequence(%w(a a c c), grammar_abc)
-
+        parse_result = subject.parse(wrong)
+        expect(parse_result.success?).to eq(false)
         err_msg = <<-MSG
-Syntax error at or near token 3>>>c
+Syntax error at or near token 3 >>>c<<<
 Expected one of: ['a', 'b'], found a 'c' instead.
 MSG
-
-        expect { subject.parse(wrong) }
-          .to raise_error(err, err_msg.chomp)
+        expect(parse_result.failure_reason.message).to eq(err_msg.chomp)
       end

-      it 'should
-
-
-
-
-
+      it 'should report error when no input provided but was required' do
+        helper = GrammarPBHelper.new
+        grammar = helper.grammar
+        instance = GFGEarleyParser.new(grammar)
+        tokens = helper.tokenize('')
+        parse_result = instance.parse(tokens)
+        expect(parse_result.success?).to eq(false)
+        err_msg = 'Input cannot be empty.'
+        expect(parse_result.failure_reason.message).to eq(err_msg)
+      end

-
-
-
-
-
-
-
-
-
-
-
-      '
-
+      it 'should report error when input ends prematurely' do
+        helper = GrammarPBHelper.new
+        grammar = helper.grammar
+        instance = GFGEarleyParser.new(grammar)
+        tokens = helper.tokenize('1 +')
+        parse_result = instance.parse(tokens)
+        expect(parse_result.success?).to eq(false)
+        ###################### S(0) == . 1 +
+        # Expectation chart[0]:
+        expected = [
+          '.S | 0',                     # initialization
+          'S => . E | 0',               # start rule
+          '.E | 0',                     # call rule
+          'E => . int | 0',             # start rule
+          "E => . '(' E '+' E ')' | 0", # start rule
+          "E => . E '+' E | 0"          # start rule
         ]
-
-
+        compare_entry_texts(parse_result.chart[0], expected)
+
+        ###################### S(1) == 1 . +
+        # Expectation chart[1]:
+        expected = [
+          'E => int . | 0',     # scan '1'
+          'E. | 0',             # exit rule
+          'S => E . | 0',       # end rule
+          "E => E . '+' E | 0", # end rule
+          'S. | 0'              # exit rule
+        ]
+        compare_entry_texts(parse_result.chart[1], expected)
+
+        ###################### S(2) == 1 + .
+        # Expectation chart[2]:
+        expected = [
+          "E => E '+' . E | 0",         # scan '+'
+          '.E | 2',                     # exit rule
+          'E => . int | 2',             # start rule
+          "E => . '(' E '+' E ')' | 2", # start rule
+          "E => . E '+' E | 2"          # start rule
+        ]
+        compare_entry_texts(parse_result.chart[2], expected)
+
+        err_msg = "Premature end of input after '+' at position 2"
+        err_msg << "\nExpected one of: ['int', '(']."
+        expect(parse_result.failure_reason.message).to eq(err_msg)
+      end
+
+
+      it 'should parse a common sample' do
+        # Use grammar based on example found in paper of
+        # K. Pingali and G. Bilardi:
+        # "A Graphical Model for Context-Free Grammar Parsing"
+        helper = GrammarPBHelper.new
+        grammar = helper.grammar
+        instance = GFGEarleyParser.new(grammar)
+        tokens = helper.tokenize('7 + 8 + 9')
         parse_result = instance.parse(tokens)
         expect(parse_result.success?).to eq(true)
         ###################### S(0) == . 7 + 8 + 9
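The `expected` arrays in the specs above assert chart entries in Rley's textual notation: a dotted production plus the index of its origin set (e.g. `'E => int . | 0'`), alongside call/exit entries such as `'.E | 2'` and `'S. | 0'`. As a hypothetical illustration of the dotted-rule part only (this `DottedItem` struct is not Rley's class), such entry texts can be rendered like this:

```ruby
# Hypothetical sketch: render an Earley/GFG dotted rule as "lhs => alpha . beta | origin".
# `dot` is the number of right-hand-side symbols already recognized;
# `origin` is the chart set where recognition of this rule began.
DottedItem = Struct.new(:lhs, :rhs, :dot, :origin) do
  def to_s
    symbols = rhs.dup.insert(dot, '.')  # place the dot among the RHS symbols
    "#{lhs} => #{symbols.join(' ')} | #{origin}"
  end
end

puts DottedItem.new('E', %w[int], 1, 0)
# E => int . | 0
puts DottedItem.new('E', ['E', "'+'", 'E'], 1, 0)
# E => E . '+' E | 0
```

This matches the entries asserted for chart set S(1) after scanning `'1'` in the premature-end spec.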
data/spec/rley/parser/gfg_parsing_spec.rb
CHANGED
@@ -53,16 +53,15 @@ module Rley # Open this namespace to avoid module qualifier prefixes
       let(:sample_gfg) { GFG::GrmFlowGraph.new(items_from_grammar) }

       let(:output) { StringIO.new('', 'w') }
-      let(:sample_tracer) { ParseTracer.new(0, output, grm1_tokens) }

       # Default instantiation rule
       subject do
-        GFGParsing.new(sample_gfg, grm1_tokens
+        GFGParsing.new(sample_gfg, grm1_tokens)
       end

       context 'Initialization:' do
         it 'should be created with a GFG, tokens, trace' do
-          expect { GFGParsing.new(sample_gfg, grm1_tokens
+          expect { GFGParsing.new(sample_gfg, grm1_tokens) }
             .not_to raise_error
         end

data/spec/rley/support/grammar_pb_helper.rb
ADDED
@@ -0,0 +1,48 @@
+# Load the builder class
+require_relative '../../../lib/rley/syntax/grammar_builder'
+require_relative '../../../lib/rley/parser/token'
+
+
+# Utility class.
+class GrammarPBHelper
+
+  # Factory method. Creates a grammar for a basic arithmetic
+  # expression based on example found in paper of
+  # K. Pingali and G. Bilardi:
+  # "A Graphical Model for Context-Free Grammar Parsing"
+  def grammar()
+    @grammar ||= begin
+      builder = Rley::Syntax::GrammarBuilder.new do
+        t_int = Rley::Syntax::Literal.new('int', /[-+]?\d+/)
+        t_plus = Rley::Syntax::VerbatimSymbol.new('+')
+        t_lparen = Rley::Syntax::VerbatimSymbol.new('(')
+        t_rparen = Rley::Syntax::VerbatimSymbol.new(')')
+        add_terminals(t_int, t_plus, t_lparen, t_rparen)
+        rule 'S' => 'E'
+        rule 'E' => 'int'
+        rule 'E' => %w(( E + E ))
+        rule 'E' => %w(E + E)
+      end
+      builder.grammar
+    end
+  end
+
+  # Basic expression tokenizer
+  def tokenize(aText)
+    tokens = aText.scan(/\S+/).map do |lexeme|
+      case lexeme
+      when '+', '(', ')'
+        terminal = @grammar.name2symbol[lexeme]
+      when /^[-+]?\d+$/
+        terminal = @grammar.name2symbol['int']
+      else
+        msg = "Unknown input text '#{lexeme}'"
+        raise StandardError, msg
+      end
+      Rley::Parser::Token.new(lexeme, terminal)
+    end
+
+    return tokens
+  end
+end # module
+# End of file
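The tokenizer in the new helper file splits on whitespace and classifies each lexeme. The following is a standalone sketch of that logic; the real code maps lexemes to grammar symbol objects via `@grammar.name2symbol`, while here plain name strings stand in for those objects (`classify` is a hypothetical name):

```ruby
# Standalone sketch of GrammarPBHelper#tokenize's classification step.
def classify(text)
  text.scan(/\S+/).map do |lexeme|          # whitespace-delimited lexemes
    terminal =
      case lexeme
      when '+', '(', ')' then lexeme        # verbatim symbols keep their own name
      when /\A[-+]?\d+\z/ then 'int'        # signed integer literal
      else raise StandardError, "Unknown input text '#{lexeme}'"
      end
    [lexeme, terminal]                      # real code builds Rley::Parser::Token here
  end
end

p classify('7 + 8 + 9')
# [["7", "int"], ["+", "+"], ["8", "int"], ["+", "+"], ["9", "int"]]
```

Note the whitespace requirement: `'1+2'` would be rejected as a single unknown lexeme, which is why the specs write inputs as `'1 +'` and `'7 + 8 + 9'`.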
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: rley
 version: !ruby/object:Gem::Version
-  version: 0.
+  version: 0.4.00
 platform: ruby
 authors:
 - Dimitri Geshef
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2016-12-
+date: 2016-12-17 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: rake
@@ -161,6 +161,7 @@ files:
 - lib/rley/parser/chart.rb
 - lib/rley/parser/dotted_item.rb
 - lib/rley/parser/earley_parser.rb
+- lib/rley/parser/error_reason.rb
 - lib/rley/parser/gfg_chart.rb
 - lib/rley/parser/gfg_earley_parser.rb
 - lib/rley/parser/gfg_parsing.rb
@@ -183,6 +184,7 @@ files:
 - lib/rley/ptree/parse_tree_node.rb
 - lib/rley/ptree/terminal_node.rb
 - lib/rley/ptree/token_range.rb
+- lib/rley/rley_error.rb
 - lib/rley/sppf/alternative_node.rb
 - lib/rley/sppf/composite_node.rb
 - lib/rley/sppf/epsilon_node.rb
@@ -220,6 +222,7 @@ files:
 - spec/rley/parser/chart_spec.rb
 - spec/rley/parser/dotted_item_spec.rb
 - spec/rley/parser/earley_parser_spec.rb
+- spec/rley/parser/error_reason_spec.rb
 - spec/rley/parser/gfg_chart_spec.rb
 - spec/rley/parser/gfg_earley_parser_spec.rb
 - spec/rley/parser/gfg_parsing_spec.rb
@@ -250,6 +253,7 @@ files:
 - spec/rley/support/grammar_b_expr_helper.rb
 - spec/rley/support/grammar_helper.rb
 - spec/rley/support/grammar_l0_helper.rb
+- spec/rley/support/grammar_pb_helper.rb
 - spec/rley/support/grammar_sppf_helper.rb
 - spec/rley/syntax/grammar_builder_spec.rb
 - spec/rley/syntax/grammar_spec.rb
@@ -308,6 +312,7 @@ test_files:
 - spec/rley/parser/chart_spec.rb
 - spec/rley/parser/dotted_item_spec.rb
 - spec/rley/parser/earley_parser_spec.rb
+- spec/rley/parser/error_reason_spec.rb
 - spec/rley/parser/gfg_chart_spec.rb
 - spec/rley/parser/gfg_earley_parser_spec.rb
 - spec/rley/parser/gfg_parsing_spec.rb