rley 0.3.08 → 0.3.09

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (24)

checksums.yaml +4 -4
data/CHANGELOG.md +4 -0
data/README.md +190 -35
data/examples/NLP/mini_en_demo.rb +92 -0
data/lib/rley/constants.rb +1 -1
metadata +3 -20
data/examples/grammars/grammar_L0.rb +0 -32
data/examples/grammars/grammar_abc.rb +0 -26
data/examples/parsers/demo-JSON/JSON_grammar.rb +0 -31
data/examples/parsers/demo-JSON/JSON_lexer.rb +0 -114
data/examples/parsers/demo-JSON/JSON_parser.rb +0 -89
data/examples/parsers/demo-JSON/demo_json.rb +0 -42
data/examples/parsers/parsing_L0.rb +0 -124
data/examples/parsers/parsing_L1.rb +0 -137
data/examples/parsers/parsing_abc.rb +0 -71
data/examples/parsers/parsing_ambig.rb +0 -92
data/examples/parsers/parsing_another.rb +0 -70
data/examples/parsers/parsing_b_expr.rb +0 -85
data/examples/parsers/parsing_err_expr.rb +0 -74
data/examples/parsers/parsing_groucho.rb +0 -97
data/examples/parsers/parsing_right_recursive.rb +0 -70
data/examples/parsers/parsing_tricky.rb +0 -91
data/examples/parsers/tracing_parser.rb +0 -54
data/examples/recognizers/recognizer_abc.rb +0 -71

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz: 38c77acd3a3831519aa9511c02ab2aaa7f99c962
-  data.tar.gz: bd84fe47221a72d195a92e40e8c5a62fafed562b
+  metadata.gz: d22d11b51c4c72d3d8230c775b797afae04c4e8f
+  data.tar.gz: e87ba3a3beeadd40a4447281c904f8dba4c8dc57
 SHA512:
-  metadata.gz: 7a36a172143025e47b03022dc2db7be6bcb990b21834ea51288afe9158c3fe97eaafce70cbcd36ea64841da03036a0e98e446454b4e6f8e2170e60e9fa2c0697
-  data.tar.gz: 716aa238f7f40e07118b11eb3d7c38c5a3f2cc8583c516818d1d5bcb11267a7c52b7849e049de4ecf93729f23746b1569adfbdd1d297947e787f9c82a5a0b94e
+  metadata.gz: 446344010aafea29682d90bd71bdb73f1f189e694fb91907136e00cce015b098e32fb2679af3a0b6ae772d41892478b0a9270c43ebe77cee2c47cb319b7fd0aa
+  data.tar.gz: 9509ea071e10c002e70af379089565b3b28269fb09c1f1d04ee9a04ec78dab7ac916e4f90602bbef5bc3eff8ea0952143faa5ec1d6c54277b60e7db810dab45d
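
The digests above are hex-encoded SHA1 and SHA512 hashes of the two archives packed inside the gem. A minimal sketch of how such a value could be checked with Ruby's standard `digest` library follows; `digest_matches?` is a hypothetical helper (not part of RubyGems), and the payload is a stand-in for the real bytes of `data.tar.gz` or `metadata.gz`.

```ruby
require 'digest'

# Hypothetical helper (an assumption, not a RubyGems API):
# returns true when `bytes` hashes to `expected_hex` under `algorithm`.
def digest_matches?(bytes, expected_hex, algorithm: Digest::SHA512)
  algorithm.hexdigest(bytes) == expected_hex.downcase
end

# Stand-in payload; a real check would read the archive in binary mode,
# e.g. File.binread on an extracted data.tar.gz.
payload = 'example payload'
expected = Digest::SHA512.hexdigest(payload)

puts digest_matches?(payload, expected)        # same bytes
puts digest_matches?("#{payload}x", expected)  # altered bytes
```

Passing `algorithm: Digest::SHA1` would check the shorter digests listed under `SHA1:` in the same way.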

data/CHANGELOG.md CHANGED

@@ -1,3 +1,7 @@
+### 0.3.09 / 2016-11-27
+* [CHANGE] File `README.md` fully rewritten, with an added example.
+* [CHANGE] Directory `examples` completely reorganized.
 ### 0.3.08 / 2016-11-17
 * [FIX] Method `ParseWalkerFactory#select_antecedent` did not support alternative nodes creation when visiting an item entry for highly ambiguous parse.
 * [FIX] Method `ParseWalkerFactory#select_antecedent` did not manage properly call/return stack for alternative nodes created when visiting an item entry for highly ambiguous parse.

data/README.md CHANGED

@@ -1,8 +1,3 @@
-Rley
-===========
-[Homepage](https://github.com/famished-tiger/Rley)
 [![Build Status](https://travis-ci.org/famished-tiger/Rley.svg?branch=master)](https://travis-ci.org/famished-tiger/Rley)
 [![Coverage Status](https://img.shields.io/coveralls/famished-tiger/Rley.svg)](https://coveralls.io/r/famished-tiger/Rley?branch=master)
 [![Gem Version](https://badge.fury.io/rb/rley.svg)](http://badge.fury.io/rb/rley)
@@ -10,36 +5,190 @@ Rley
 [![Inline docs](http://inch-ci.org/github/famished-tiger/Rley.svg?branch=master)](http://inch-ci.org/github/famished-tiger/Rley)
 [![License](https://img.shields.io/badge/license-MIT-brightgreen.svg?style=flat)](https://github.com/famished-tiger/Rley/blob/master/LICENSE.txt)
-__Rley__ is a Ruby implementation of a parser using the [Earley](http://en.wikipedia.org/wiki/Earley_parser) algorithm.
-The project aims to build a parser convenient for lightweight NLP (Natural Language Processing) purposes.
-### Highlights ###
-* Handles any context-free language,
+[Rley](https://github.com/famished-tiger/Rley)
+======
+A Ruby library for constructing general parsers for _any_ context-free languages.
+What is Rley?
+-------------
+__Rley__ uses the [Earley](http://en.wikipedia.org/wiki/Earley_parser)
+algorithm which is a general parsing algorithm that can handle any context-free
+grammar. Earley parsers can literally swallow anything that can be described
+by such a context-free grammar. That's why Earley parsers find their place in so
+many __NLP__ (_Natural Language Processing_) libraries/toolkits.
+In addition, __Rley__ goes beyond most Earley parser implementations by providing
+support for ambiguous parses. Indeed, it delivers the results of a parse as a
+_Shared Packed Parse Forest_ (SPPF). An SPPF is a data structure that
+efficiently encodes all the possible parse trees that result from an ambiguous
+grammar.
+As another distinctive mark, __Rley__ is also the first Ruby implementation of a
+parsing library based on the new _Grammar Flow Graph_ approach (_TODO: add details_).
+### What can it do?
+Maybe parsing algorithms and internal implementation details are of lesser
+interest to you and the good question to ask is "what can Rley really do?".
+In a nutshell:
+* Rley can parse context-free languages that other well-known libraries cannot
+handle
+* Built-in support for ambiguous grammars that typically occur in NLP
+In short, the foundations of Rley are strong enough to be useful in a large
+application range such as:
+* computer languages,
+* artificial intelligence and
+* Natural Language Processing.
+#### Features
+* Simple API for context-free grammar definition,
+* Allows ambiguous grammars,
+* Generates shared packed parse forests,
 * Accepts left-recursive rules/productions,
-* Accepts ambiguous grammars,
-* Parse tracing facility,
-* Parse tree generation,
-* Syntax error detection and reporting.
-### Yet another parser? ###
-Yes and no. Rley doesn't aim to replace other very good programming language parsers for Ruby.
-The latter are faster because they use optimized algorithms at the price of a loss of generality
-in the grammar/language they support.
-The Earley's algorithm being more general is able to parse input that conforms to any context-free grammar.
-This project is in "earley" stage.
-####Roadmap:
-- Rewrite the parser using the GFG (Grammar Flow Graph) approach
-- Replace parse trees by shared packed parse forests
-- Document the parser API
-- Add more validation tests and sample grammars
-- Add a command-line interface
-- Provide documentation and examples
-### Other similar Ruby projects ###
-__Rley__ isn't the sole Ruby implementation of the Earley parser algorithm.
+* Provides syntax error detection and reporting.
+---
+Getting Started
+---------------
+### Installation
+Installing the latest stable version is simple:
+    $ gem install rley
+## A whirlwind tour of Rley
+The purpose of this section is to show how to create a parser for a minimalistic
+English language subset.
+The tour is organized into the following steps:
+1. [Defining the language grammar](#defining-the-language-grammar)
+2. [Creating a lexicon](#creating-a-lexicon)
+3. [Creating a tokenizer](#creating-a-tokenizer)
+4. [Building the parser](#building-the-parser)
+5. [Parsing some input](#parsing-some-input)
+6. [Generating the parse forest](#generating-the-parse-forest)
+The complete source code of the tour can be found in the
+[examples](https://github.com/famished-tiger/Rley/tree/master/examples/NLP/mini_en_demo.rb)
+directory.
+### Defining the language grammar
+The subset of English grammar is based on an example from the NLTK book.
+```ruby
+    require 'rley'  # Load Rley library
+    # Instantiate a builder object that will build the grammar for us
+    builder = Rley::Syntax::GrammarBuilder.new
+    # The next 2 lines define the terminal symbols (= word categories in the lexicon)
+    builder.add_terminals('Noun', 'Proper-Noun', 'Verb')
+    builder.add_terminals('Determiner', 'Preposition')
+    # Here we define the productions (= grammar rules)
+    builder.add_production('S' => %w[NP VP])
+    builder.add_production('NP' => 'Proper-Noun')
+    builder.add_production('NP' => %w[Determiner Noun])
+    builder.add_production('NP' => %w[Determiner Noun PP])
+    builder.add_production('VP' => %w[Verb NP])
+    builder.add_production('VP' => %w[Verb NP PP])
+    builder.add_production('PP' => %w[Preposition NP])
+    # And now, let's build the grammar...
+    grammar = builder.grammar
+```
+## Creating a lexicon
+```ruby
+    # To simplify things, lexicon is implemented as a Hash with pairs of the form:
+    # word => terminal symbol name
+    Lexicon = {
+      'man' => 'Noun',
+      'dog' => 'Noun',
+      'cat' => 'Noun',
+      'telescope' => 'Noun',
+      'park' => 'Noun',
+      'saw' => 'Verb',
+      'ate' => 'Verb',
+      'walked' => 'Verb',
+      'John' => 'Proper-Noun',
+      'Mary' => 'Proper-Noun',
+      'Bob' => 'Proper-Noun',
+      'a' => 'Determiner',
+      'an' => 'Determiner',
+      'the' => 'Determiner',
+      'my' => 'Determiner',
+      'in' => 'Preposition',
+      'on' => 'Preposition',
+      'by' => 'Preposition',
+      'with' => 'Preposition'
+    }
+```
+## Creating a tokenizer
+```ruby
+    # A tokenizer reads the input string and converts it into a sequence of tokens
+    # Highly simplified tokenizer implementation.
+    def tokenizer(aText, aGrammar)
+      tokens = aText.scan(/\S+/).map do |word|
+        term_name = Lexicon[word]
+        if term_name.nil?
+          raise StandardError, "Word '#{word}' not found in lexicon"
+        end
+        terminal = aGrammar.name2symbol[term_name]
+        Rley::Parser::Token.new(word, terminal)
+      end
+      return tokens
+    end
+```
+More ambitious NLP applications will surely rely on a Part-of-Speech tagger instead of
+creating a lexicon and tokenizer from scratch. Here are a few Ruby Part-of-Speech gems:
+* [engtagger](https://rubygems.org/gems/engtagger)
+* [rbtagger](https://rubygems.org/gems/rbtagger)
+## Building the parser
+```ruby
+  # Easy with Rley...
+  parser = Rley::Parser::GFGEarleyParser.new(grammar)
+```
+## Parsing some input
+```ruby
+    input_to_parse = 'John saw Mary with a telescope'
+    # Convert input text into a sequence of token objects...
+    tokens = tokenizer(input_to_parse, grammar)
+    result = parser.parse(tokens)
+    puts "Parsing successful? #{result.success?}" # => Parsing successful? true
+```
+## Generating the parse forest
+```ruby
+    pforest = result.parse_forest
+```
+## Examples
+The project source directory contains several example scripts that demonstrate
+how grammars are to be constructed and used.
+## Other similar Ruby projects
+__Rley__ isn't the sole implementation of the Earley parser algorithm in Ruby.
 Here are a few other ones:
 - [Kanocc gem](https://rubygems.org/gems/kanocc) -- Advertised as a Ruby based parsing and translation framework.
   Although the gem dates from 2009, the author still maintains it in a public repository on [GitHub](https://github.com/surlykke/Kanocc)
@@ -51,11 +200,17 @@ Here are a few other ones:
   [earley project](https://github.com/joshingly/earley) -- An Earley parser (grammar rules are specified in JSON format).
   The code doesn't seem to be maintained: latest commit dates from Nov. 2011.
 - [linguist project](https://github.com/davidkellis/linguist) -- Advertised as a library for parsing context-free languages.
-  It is a recognizer not a parser. In other words it can only tell whether a given input
+  It is a recognizer not a parser. In other words it can only tell whether a given input
   conforms to the grammar rules or not. As such it cannot build parse trees.
   The code doesn't seem to be maintained: latest commit dates from Oct. 2011.
+##  Thanks to:
+* Professor Keshav Pingali, one of the creators of the Grammar Flow Graph parsing approach for his encouraging e-mail exchanges.
+---
 Copyright
 ---------
-Copyright (c) 2014-2016, Dimitri Geshef.
+Copyright (c) 2014-2016, Dimitri Geshef.
 __Rley__ is released under the MIT License see [LICENSE.txt](https://github.com/famished-tiger/Rley/blob/master/LICENSE.txt) for details.
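
The rewritten README stresses support for ambiguous parses. To make that concrete, here is a naive parse counter for the tour's grammar. It is only a sketch that does not use Rley or the Earley algorithm: `RULES`, `count_parses` and `count_sequence` are hypothetical names, and the input is assumed to be already reduced to word categories.

```ruby
# Productions of the README tour grammar, keyed by non-terminal.
RULES = {
  'S'  => [%w[NP VP]],
  'NP' => [%w[Proper-Noun], %w[Determiner Noun], %w[Determiner Noun PP]],
  'VP' => [%w[Verb NP], %w[Verb NP PP]],
  'PP' => [%w[Preposition NP]]
}.freeze

# Number of distinct parse trees deriving tags[from...to] from `symbol`.
def count_parses(symbol, tags, from, to, memo = {})
  key = [symbol, from, to]
  return memo[key] if memo.key?(key)

  memo[key] =
    if RULES.key?(symbol)
      RULES[symbol].sum { |rhs| count_sequence(rhs, tags, from, to, memo) }
    else
      # A terminal matches exactly one token of its own category.
      to - from == 1 && tags[from] == symbol ? 1 : 0
    end
end

# Number of ways the symbol sequence `rhs` can cover tags[from...to].
# Relies on the grammar having no empty and no left-recursive productions.
def count_sequence(rhs, tags, from, to, memo)
  return (from == to ? 1 : 0) if rhs.empty?

  head, *rest = rhs
  (from + 1..to - rest.size).sum do |mid|
    left = count_parses(head, tags, from, mid, memo)
    left.zero? ? 0 : left * count_sequence(rest, tags, mid, to, memo)
  end
end

# 'John saw Mary with a telescope'
tags1 = %w[Proper-Noun Verb Proper-Noun Preposition Determiner Noun]
# 'John saw the man with a telescope'
tags2 = %w[Proper-Noun Verb Determiner Noun Preposition Determiner Noun]

puts count_parses('S', tags1, 0, tags1.size) # => 1
puts count_parses('S', tags2, 0, tags2.size) # => 2
```

Under this grammar the tour sentence has a single reading, because a proper noun cannot take a prepositional phrase; replacing `Mary` with `the man` gives two readings, which is the kind of input a parse forest is meant to represent compactly.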

data/examples/NLP/mini_en_demo.rb ADDED

@@ -0,0 +1,92 @@
+require 'rley'  # Load Rley library
+########################################
+# Step 1. Define a grammar for a micro English-like language
+# based on example from NLTK book (chapter 8 of the book).
+# Bird, Steven, Edward Loper and Ewan Klein: "Natural Language Processing
+# with Python"; 2009, O’Reilly Media Inc., ISBN 978-0596516499
+# It defines the syntax of a sentence in a mini English-like language
+# with a very simplified syntax.
+# Instantiate a builder object that will build the grammar for us
+builder = Rley::Syntax::GrammarBuilder.new
+# The next 2 lines define the terminal symbols (= word categories in the lexicon)
+builder.add_terminals('Noun', 'Proper-Noun', 'Verb')
+builder.add_terminals('Determiner', 'Preposition')
+# Here we define the productions (= grammar rules)
+builder.add_production('S' => %w[NP VP])
+builder.add_production('NP' => 'Proper-Noun')
+builder.add_production('NP' => %w[Determiner Noun])
+builder.add_production('NP' => %w[Determiner Noun PP])
+builder.add_production('VP' => %w[Verb NP])
+builder.add_production('VP' => %w[Verb NP PP])
+builder.add_production('PP' => %w[Preposition NP])
+# And now, let's build the grammar...
+grammar = builder.grammar
+########################################
+# Step 2. Creating a lexicon
+# To simplify things, lexicon is implemented as a Hash with pairs of the form:
+# word => terminal symbol name
+Lexicon = {
+  'man' => 'Noun',
+  'dog' => 'Noun',
+  'cat' => 'Noun',
+  'telescope' => 'Noun',
+  'park' => 'Noun',
+  'saw' => 'Verb',
+  'ate' => 'Verb',
+  'walked' => 'Verb',
+  'John' => 'Proper-Noun',
+  'Mary' => 'Proper-Noun',
+  'Bob' => 'Proper-Noun',
+  'a' => 'Determiner',
+  'an' => 'Determiner',
+  'the' => 'Determiner',
+  'my' => 'Determiner',
+  'in' => 'Preposition',
+  'on' => 'Preposition',
+  'by' => 'Preposition',
+  'with' => 'Preposition'
+}
+########################################
+# Step 3. Creating a tokenizer
+# A tokenizer reads the input string and converts it into a sequence of tokens
+# Highly simplified tokenizer implementation.
+def tokenizer(aTextToParse, aGrammar)
+  tokens = aTextToParse.scan(/\S+/).map do |word|
+    term_name = Lexicon[word]
+    if term_name.nil?
+      raise StandardError, "Word '#{word}' not found in lexicon"
+    end
+    terminal = aGrammar.name2symbol[term_name]
+    Rley::Parser::Token.new(word, terminal)
+  end
+  return tokens
+end
+# More realistic NLP applications will most probably rely on a Part-of-Speech tagger.
+########################################
+# Step 4. Create a parser for that grammar
+# Easy with Rley...
+parser = Rley::Parser::GFGEarleyParser.new(grammar)
+########################################
+# Step 5. Parsing the input
+input_to_parse = 'John saw Mary with a telescope'
+# Convert input text into a sequence of token objects...
+tokens = tokenizer(input_to_parse, grammar)
+result = parser.parse(tokens)
+puts "Parsing successful? #{result.success?}" # => Parsing successful? true
+########################################
+# Step 6. Generating the parse forest
+pforest = result.parse_forest
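
The tokenizer in this demo raises as soon as a word is missing from the lexicon, and the lookup is case-sensitive. The sketch below illustrates both behaviours without requiring the gem: `DemoToken` is a stand-in for `Rley::Parser::Token`, and `DEMO_LEXICON` and `demo_tokenize` are hypothetical names introduced only for this illustration.

```ruby
# Stand-in for Rley::Parser::Token so the sketch runs without the gem (assumption).
DemoToken = Struct.new(:lexeme, :category)

# A small excerpt of the demo lexicon.
DEMO_LEXICON = {
  'John' => 'Proper-Noun',
  'Mary' => 'Proper-Noun',
  'saw' => 'Verb',
  'a' => 'Determiner',
  'telescope' => 'Noun',
  'with' => 'Preposition'
}.freeze

# Split on whitespace and look each word up; unknown words raise, as in the demo.
def demo_tokenize(text, lexicon)
  text.scan(/\S+/).map do |word|
    category = lexicon[word]
    raise StandardError, "Word '#{word}' not found in lexicon" if category.nil?

    DemoToken.new(word, category)
  end
end

tokens = demo_tokenize('John saw Mary with a telescope', DEMO_LEXICON)
puts tokens.map(&:category).join(' ')

begin
  demo_tokenize('john saw Mary', DEMO_LEXICON) # lowercase 'john' is not a key
rescue StandardError => e
  puts e.message
end
```

Normalising case or stripping punctuation before the lookup would be a design decision for the application; the demo leaves the input untouched.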

data/lib/rley/constants.rb CHANGED

@@ -3,7 +3,7 @@
 module Rley # Module used as a namespace
   # The version number of the gem.
-  Version = '0.3.08'.freeze
+  Version = '0.3.09'.freeze
   # Brief description of the gem.
   Description = "Ruby implementation of the Earley's parsing algorithm".freeze
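
The only code change in this file is the version string. As a small illustration of how RubyGems orders the two releases, the sketch below compares them with `Gem::Version`; the `~> 0.3.9` requirement is an assumed example constraint, not something taken from the gem.

```ruby
require 'rubygems' # normally loaded already; kept for self-containment

old_version = Gem::Version.new('0.3.08')
new_version = Gem::Version.new('0.3.09')

# Versions compare segment by segment, so 0.3.09 sorts after 0.3.08.
puts new_version > old_version

# Assumed example constraint: any 0.3.x release from 0.3.9 upwards.
requirement = Gem::Requirement.new('~> 0.3.9')
puts requirement.satisfied_by?(new_version)
puts requirement.satisfied_by?(old_version)
```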

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: rley
 version: !ruby/object:Gem::Version
-  version: 0.3.08
+  version: 0.3.09
 platform: ruby
 authors:
 - Dimitri Geshef
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2016-11-17 00:00:00.000000000 Z
+date: 2016-11-27 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: rake
@@ -129,24 +129,7 @@ files:
 - LICENSE.txt
 - README.md
 - Rakefile
-- examples/grammars/grammar_L0.rb
-- examples/grammars/grammar_abc.rb
-- examples/parsers/demo-JSON/JSON_grammar.rb
-- examples/parsers/demo-JSON/JSON_lexer.rb
-- examples/parsers/demo-JSON/JSON_parser.rb
-- examples/parsers/demo-JSON/demo_json.rb
-- examples/parsers/parsing_L0.rb
-- examples/parsers/parsing_L1.rb
-- examples/parsers/parsing_abc.rb
-- examples/parsers/parsing_ambig.rb
-- examples/parsers/parsing_another.rb
-- examples/parsers/parsing_b_expr.rb
-- examples/parsers/parsing_err_expr.rb
-- examples/parsers/parsing_groucho.rb
-- examples/parsers/parsing_right_recursive.rb
-- examples/parsers/parsing_tricky.rb
-- examples/parsers/tracing_parser.rb
-- examples/recognizers/recognizer_abc.rb
+- examples/NLP/mini_en_demo.rb
 - lib/rley.rb
 - lib/rley/constants.rb
 - lib/rley/formatter/base_formatter.rb

data/examples/grammars/grammar_L0.rb DELETED

@@ -1,32 +0,0 @@
-# Purpose: to demonstrate how to build a very simple grammar
-require 'rley'  # Load the gem
-# Sample grammar for a very limited English language
-# based on the language L0 from Jurafsky & Martin
-# Let's create the grammar step-by-step with the grammar builder:
-builder = Rley::Syntax::GrammarBuilder.new
-# Enumerate the POS Part-Of-Speech...
-builder.add_terminals('Noun', 'Verb', 'Adjective')
-builder.add_terminals('Pronoun', 'Proper-Noun', 'Determiner')
-builder.add_terminals('Preposition', 'Conjunction')
-# Now the production rules...
-builder.add_production('S'=> ['NP', 'VP']) # e.g. I + want a morning flight
-builder.add_production('NP' => 'Pronoun')  # e.g. I
-builder.add_production('NP' => 'Proper-Noun') # e.g. Los Angeles
-builder.add_production('NP' => ['Determiner', 'Nominal'])  # e.g. a + flight
-builder.add_production('Nominal' => %w(Nominal Noun)) # morning + flight
-builder.add_production('Nominal' => 'Noun') # e.g. flights
-builder.add_production('VP' => 'Verb')      # e.g. do
-builder.add_production('VP' => ['Verb', 'NP'])  # e.g. want + a flight
-builder.add_production('VP' => ['Verb', 'NP', 'PP'])
-builder.add_production('VP' => ['Verb', 'PP']) # leaving + on Thursday
-builder.add_production('PP' => ['Preposition', 'NP']) # from + Los Angeles
-# And now we 're ready to build the grammar...
-grammar_L0 = builder.grammar
-# Prove that it is a grammar
-puts grammar_L0.class.name