rley 0.3.08 → 0.3.09

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: 38c77acd3a3831519aa9511c02ab2aaa7f99c962
4
- data.tar.gz: bd84fe47221a72d195a92e40e8c5a62fafed562b
3
+ metadata.gz: d22d11b51c4c72d3d8230c775b797afae04c4e8f
4
+ data.tar.gz: e87ba3a3beeadd40a4447281c904f8dba4c8dc57
5
5
  SHA512:
6
- metadata.gz: 7a36a172143025e47b03022dc2db7be6bcb990b21834ea51288afe9158c3fe97eaafce70cbcd36ea64841da03036a0e98e446454b4e6f8e2170e60e9fa2c0697
7
- data.tar.gz: 716aa238f7f40e07118b11eb3d7c38c5a3f2cc8583c516818d1d5bcb11267a7c52b7849e049de4ecf93729f23746b1569adfbdd1d297947e787f9c82a5a0b94e
6
+ metadata.gz: 446344010aafea29682d90bd71bdb73f1f189e694fb91907136e00cce015b098e32fb2679af3a0b6ae772d41892478b0a9270c43ebe77cee2c47cb319b7fd0aa
7
+ data.tar.gz: 9509ea071e10c002e70af379089565b3b28269fb09c1f1d04ee9a04ec78dab7ac916e4f90602bbef5bc3eff8ea0952143faa5ec1d6c54277b60e7db810dab45d
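The digests listed in `checksums.yaml` can be checked locally once a `.gem` archive has been unpacked. The following is a minimal sketch, assuming `data.tar.gz` is already available as an ordinary file; the helper name `checksum_matches?` is hypothetical, and the demo hashes a throwaway temporary file rather than the real gem contents.

```ruby
require 'digest'
require 'tempfile'

# Hypothetical helper: true when the file's digest equals the expected hex string.
# Pass Digest::SHA1 as the third argument to check the SHA1 entries instead.
def checksum_matches?(path, expected_hex, algorithm = Digest::SHA512)
  algorithm.file(path).hexdigest == expected_hex.strip.downcase
end

# Demo on a temporary file, since the real data.tar.gz is not available here.
Tempfile.create('data.tar.gz') do |file|
  file.binmode
  file.write('sample payload')
  file.flush
  expected = Digest::SHA512.hexdigest('sample payload')
  puts checksum_matches?(file.path, expected)   # matching digest
  puts checksum_matches?(file.path, '0' * 128)  # deliberately wrong digest
end
```

In practice the expected value would be copied from the `SHA512` entry for the file being checked.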
data/CHANGELOG.md CHANGED
@@ -1,3 +1,7 @@
1
+ ### 0.3.09 / 2016-11-27
2
+ * [CHANGE] File `README.md` fully rewritten; an example was added.
3
+ * [CHANGE] Directory `examples` completely reorganized.
4
+
1
5
  ### 0.3.08 / 2016-11-17
2
6
  * [FIX] Method `ParseWalkerFactory#select_antecedent` did not support alternative nodes creation when visiting an item entry for highly ambiguous parse.
3
7
  * [FIX] Method `ParseWalkerFactory#select_antecedent` did not manage properly call/return stack for alternative nodes created when visiting an item entry for highly ambiguous parse.
data/README.md CHANGED
@@ -1,8 +1,3 @@
1
- Rley
2
- ===========
3
- [Homepage](https://github.com/famished-tiger/Rley)
4
-
5
-
6
1
  [![Build Status](https://travis-ci.org/famished-tiger/Rley.svg?branch=master)](https://travis-ci.org/famished-tiger/Rley)
7
2
  [![Coverage Status](https://img.shields.io/coveralls/famished-tiger/Rley.svg)](https://coveralls.io/r/famished-tiger/Rley?branch=master)
8
3
  [![Gem Version](https://badge.fury.io/rb/rley.svg)](http://badge.fury.io/rb/rley)
@@ -10,36 +5,190 @@ Rley
10
5
  [![Inline docs](http://inch-ci.org/github/famished-tiger/Rley.svg?branch=master)](http://inch-ci.org/github/famished-tiger/Rley)
11
6
  [![License](https://img.shields.io/badge/license-MIT-brightgreen.svg?style=flat)](https://github.com/famished-tiger/Rley/blob/master/LICENSE.txt)
12
7
 
13
- __Rley__ is a Ruby implementation of a parser using the [Earley](http://en.wikipedia.org/wiki/Earley_parser) algorithm.
14
- The project aims to build a parser convenient for lightweight NLP (Natural Language Processing) purposes.
15
8
 
16
- ### Highlights ###
17
- * Handles any context-free language,
9
+ [Rley](https://github.com/famished-tiger/Rley)
10
+ ======
11
+
12
+ A Ruby library for constructing general parsers for _any_ context-free language.
13
+
14
+
15
+ What is Rley?
16
+ -------------
17
+ __Rley__ uses the [Earley](http://en.wikipedia.org/wiki/Earley_parser)
18
+ algorithm which is a general parsing algorithm that can handle any context-free
19
+ grammar. Earley parsers can literally swallow anything that can be described
20
+ by such a context-free grammar. That's why Earley parsers find their place in so
21
+ many __NLP__ (_Natural Language Processing_) libraries/toolkits.
22
+
23
+ In addition, __Rley__ goes beyond most Earley parser implementations by providing
24
+ support for ambiguous parses. Indeed, it delivers the results of a parse as a
25
+ _Shared Packed Parse Forest_ (SPPF). An SPPF is a data structure that can
26
+ efficiently encode all the possible parse trees that result from an ambiguous
27
+ grammar.
28
+
29
+ As another distinctive mark, __Rley__ is also the first Ruby implementation of a
30
+ parsing library based on the new _Grammar Flow Graph_ approach (_TODO: add details_).
31
+
32
+ ### What can it do?
33
+ Maybe parsing algorithms and internal implementation details are of lesser
34
+ interest to you, and the better question to ask is "what can Rley really do?".
35
+
36
+ In a nutshell:
37
+ * Rley can parse context-free languages that other well-known libraries cannot
38
+ handle
39
+ * Built-in support for ambiguous grammars that typically occur in NLP
40
+
41
+ In short, the foundations of Rley are strong enough to be useful in a large
42
+ application range such as:
43
+ * computer languages,
44
+ * artificial intelligence and
45
+ * Natural Language Processing.
46
+
47
+ #### Features
48
+ * Simple API for context-free grammar definition,
49
+ * Allows ambiguous grammars,
50
+ * Generates shared packed parse forests,
18
51
  * Accepts left-recursive rules/productions,
19
- * Accepts ambiguous grammars,
20
- * Parse tracing facility,
21
- * Parse tree generation,
22
- * Syntax error detection and reporting.
23
-
24
-
25
- ### Yet another parser? ###
26
- Yes and no. Rley doesn't aim to replace other very good programming language parsers for Ruby.
27
- The latter are faster because they use optimized algorithms at the price of a loss of generality
28
- in the grammar/language they support.
29
- The Earley's algorithm being more general is able to parse input that conforms to any context-free grammar.
30
-
31
- This project is in "earley" stage.
32
- ####Roadmap:
33
- - Rewrite the parser using the GFG (Grammar Flow Graph) approach
34
- - Replace parse trees by shared packed parse forests
35
- - Document the parser API
36
- - Add more validation tests and sample grammars
37
- - Add a command-line interface
38
- - Provide documentation and examples
39
-
40
-
41
- ### Other similar Ruby projects ###
42
- __Rley__ isn't the sole Ruby implementation of the Earley parser algorithm.
52
+ * Provides syntax error detection and reporting.
53
+
54
+ ---
55
+
56
+ Getting Started
57
+ ---------------
58
+
59
+ ### Installation
60
+ Installing the latest stable version is simple:
61
+
62
+ $ gem install rley
63
+
64
+
65
+ ## A whirlwind tour of Rley
66
+ The purpose of this section is to show how to create a parser for a minimalistic
67
+ English language subset.
68
+ The tour is organized into the following steps:
69
+ 1. [Defining the language grammar](#defining-the-language-grammar)
70
+ 2. [Creating a lexicon](#creating-a-lexicon)
71
+ 3. [Creating a tokenizer](#creating-a-tokenizer)
72
+ 4. [Building the parser](#building-the-parser)
73
+ 5. [Parsing some input](#parsing-some-input)
74
+ 6. [Generating the parse forest](#generating-the-parse-forest)
75
+
76
+ The complete source code of the tour can be found in the
77
+ [examples](https://github.com/famished-tiger/Rley/tree/master/examples/NLP/mini_en_demo.rb)
78
+ directory.
79
+
80
+ ### Defining the language grammar
81
+ The subset of English grammar is based on an example from the NLTK book.
82
+
83
+ ```ruby
84
+ require 'rley' # Load Rley library
85
+
86
+ # Instantiate a builder object that will build the grammar for us
87
+ builder = Rley::Syntax::GrammarBuilder.new
88
+
89
+ # The next 2 lines define the terminal symbols (= word categories in the lexicon)
90
+ builder.add_terminals('Noun', 'Proper-Noun', 'Verb')
91
+ builder.add_terminals('Determiner', 'Preposition')
92
+
93
+ # Here we define the productions (= grammar rules)
94
+ builder.add_production('S' => %w[NP VP])
95
+ builder.add_production('NP' => 'Proper-Noun')
96
+ builder.add_production('NP' => %w[Determiner Noun])
97
+ builder.add_production('NP' => %w[Determiner Noun PP])
98
+ builder.add_production('VP' => %w[Verb NP])
99
+ builder.add_production('VP' => %w[Verb NP PP])
100
+ builder.add_production('PP' => %w[Preposition NP])
101
+
102
+ # And now, let's build the grammar...
103
+ grammar = builder.grammar
104
+ ```
105
+
106
+ ### Creating a lexicon
107
+
108
+ ```ruby
109
+ # To simplify things, the lexicon is implemented as a Hash with pairs of the form:
110
+ # word => terminal symbol name
111
+ Lexicon = {
112
+ 'man' => 'Noun',
113
+ 'dog' => 'Noun',
114
+ 'cat' => 'Noun',
115
+ 'telescope' => 'Noun',
116
+ 'park' => 'Noun',
117
+ 'saw' => 'Verb',
118
+ 'ate' => 'Verb',
119
+ 'walked' => 'Verb',
120
+ 'John' => 'Proper-Noun',
121
+ 'Mary' => 'Proper-Noun',
122
+ 'Bob' => 'Proper-Noun',
123
+ 'a' => 'Determiner',
124
+ 'an' => 'Determiner',
125
+ 'the' => 'Determiner',
126
+ 'my' => 'Determiner',
127
+ 'in' => 'Preposition',
128
+ 'on' => 'Preposition',
129
+ 'by' => 'Preposition',
130
+ 'with' => 'Preposition'
131
+ }
132
+ ```
133
+
134
+
135
+ ### Creating a tokenizer
136
+ ```ruby
137
+ # A tokenizer reads the input string and converts it into a sequence of tokens
138
+ # Highly simplified tokenizer implementation.
139
+ def tokenizer(aText, aGrammar)
140
+ tokens = aText.scan(/\S+/).map do |word|
141
+ term_name = Lexicon[word]
142
+ if term_name.nil?
143
+ raise StandardError, "Word '#{word}' not found in lexicon"
144
+ end
145
+ terminal = aGrammar.name2symbol[term_name]
146
+ Rley::Parser::Token.new(word, terminal)
147
+ end
148
+
149
+ return tokens
150
+ end
151
+ ```
152
+
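The tokenizer above rejects any word whose exact spelling is not a lexicon key, so `'The dog saw John.'` would fail on both `The` and `John.`. The following is a minimal sketch of a slightly more forgiving variant in plain Ruby. It does not use Rley: `SimpleToken`, `DEMO_LEXICON` and `lenient_tokenize` are hypothetical names introduced for illustration, and `SimpleToken` only stands in for `Rley::Parser::Token`.

```ruby
# Hypothetical stand-in for Rley::Parser::Token (assumption, not the Rley class).
SimpleToken = Struct.new(:lexeme, :category)

# A small excerpt of the lexicon used in the tour.
DEMO_LEXICON = {
  'dog' => 'Noun',
  'saw' => 'Verb',
  'John' => 'Proper-Noun',
  'the' => 'Determiner',
  'with' => 'Preposition'
}.freeze

# Splits on runs of letters (dropping punctuation) and retries the lookup
# in lowercase, so a sentence-initial 'The' still matches the entry 'the'.
def lenient_tokenize(text, lexicon)
  text.scan(/[[:alpha:]]+/).map do |word|
    category = lexicon[word] || lexicon[word.downcase]
    raise ArgumentError, "Word '#{word}' not found in lexicon" if category.nil?

    SimpleToken.new(word, category)
  end
end

tokens = lenient_tokenize('The dog saw John.', DEMO_LEXICON)
puts tokens.map { |t| "#{t.lexeme}/#{t.category}" }.join(' ')
```

To feed a parser, the category name would still have to be mapped to a grammar symbol, as the original tokenizer does with `aGrammar.name2symbol`.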
153
+ More ambitious NLP applications will surely rely on a Part-of-Speech tagger instead of
154
+ creating a lexicon and tokenizer from scratch. Here are a few Ruby Part-of-Speech gems:
155
+ * [engtagger](https://rubygems.org/gems/engtagger)
156
+ * [rbtagger](https://rubygems.org/gems/rbtagger)
157
+
158
+
159
+
160
+ ### Building the parser
161
+ ```ruby
162
+ # Easy with Rley...
163
+ parser = Rley::Parser::GFGEarleyParser.new(grammar)
164
+ ```
165
+
166
+
167
+ ### Parsing some input
168
+ ```ruby
169
+ input_to_parse = 'John saw Mary with a telescope'
170
+ # Convert input text into a sequence of token objects...
171
+ tokens = tokenizer(input_to_parse, grammar)
172
+ result = parser.parse(tokens)
173
+
174
+ puts "Parsing successful? #{result.success?}" # => Parsing successful? true
175
+ ```
176
+
177
+ ### Generating the parse forest
178
+ ```ruby
179
+ pforest = result.parse_forest
180
+ ```
181
+
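To see why the result is a forest rather than a single tree, it can help to count how many parse trees the tour grammar admits for a sentence. The following is a minimal brute-force sketch in plain Ruby. It does not use Rley or an SPPF: `TOUR_GRAMMAR`, `TOUR_LEXICON`, `count_trees`, `count_sequence` and `count_parses` are hypothetical names, and the approach assumes a grammar without empty productions or unit-production cycles.

```ruby
# The tour grammar as plain data: non-terminal => list of right-hand sides.
TOUR_GRAMMAR = {
  'S'  => [%w[NP VP]],
  'NP' => [%w[Proper-Noun], %w[Determiner Noun], %w[Determiner Noun PP]],
  'VP' => [%w[Verb NP], %w[Verb NP PP]],
  'PP' => [%w[Preposition NP]]
}.freeze

# An excerpt of the tour lexicon: word => terminal symbol name.
TOUR_LEXICON = {
  'John' => 'Proper-Noun', 'Mary' => 'Proper-Noun',
  'saw' => 'Verb', 'man' => 'Noun', 'telescope' => 'Noun',
  'a' => 'Determiner', 'the' => 'Determiner', 'with' => 'Preposition'
}.freeze

# Number of distinct trees deriving tags[from...to] from the given symbol.
def count_trees(symbol, tags, from, to, memo)
  key = [symbol, from, to]
  return memo[key] if memo.key?(key)

  memo[key] =
    if TOUR_GRAMMAR.key?(symbol)
      TOUR_GRAMMAR[symbol].sum { |rhs| count_sequence(rhs, tags, from, to, memo) }
    else
      to == from + 1 && tags[from] == symbol ? 1 : 0
    end
end

# Number of ways a sequence of symbols can cover tags[from...to].
def count_sequence(rhs, tags, from, to, memo)
  return (from == to ? 1 : 0) if rhs.empty?

  head, *rest = rhs
  last_split = to - rest.size # every remaining symbol needs at least one token
  (from + 1..last_split).sum do |split|
    count_trees(head, tags, from, split, memo) *
      count_sequence(rest, tags, split, to, memo)
  end
end

def count_parses(sentence)
  tags = sentence.split.map { |word| TOUR_LEXICON.fetch(word) }
  count_trees('S', tags, 0, tags.size, {})
end

puts count_parses('John saw Mary with a telescope')    # prints 1
puts count_parses('John saw the man with a telescope') # prints 2
```

With this grammar the tour sentence has a single reading, because a proper noun cannot take a prepositional phrase. Replacing `Mary` with `the man` yields two trees: the telescope attaches either to the verb phrase or to the noun phrase. A parse forest represents such alternatives in one shared structure instead of enumerating them.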
182
+
183
+
184
+ ## Examples
185
+
186
+ The project source directory contains several example scripts that demonstrate
187
+ how grammars are to be constructed and used.
188
+
189
+
190
+ ## Other similar Ruby projects
191
+ __Rley__ isn't the sole implementation of the Earley parser algorithm in Ruby.
43
192
  Here are a few other ones:
44
193
  - [Kanocc gem](https://rubygems.org/gems/kanocc) -- Advertised as a Ruby based parsing and translation framework.
45
194
  Although the gem dates from 2009, the author still maintains it in a public repository on [GitHub](https://github.com/surlykke/Kanocc)
@@ -51,11 +200,17 @@ Here are a few other ones:
51
200
  [earley project](https://github.com/joshingly/earley) -- An Earley parser (grammar rules are specified in JSON format).
52
201
  The code doesn't seem to be maintained: latest commit dates from Nov. 2011.
53
202
  - [linguist project](https://github.com/davidkellis/linguist) -- Advertised as library for parsing context-free languages.
54
- It is a recognizer not a parser. In other words it can only tell whether a given input
203
+ It is a recognizer not a parser. In other words it can only tell whether a given input
55
204
  conforms to the grammar rules or not. As such it cannot build parse trees.
56
205
  The code doesn't seem to be maintained: latest commit dates from Oct. 2011.
57
206
 
207
+
208
+ ## Thanks to:
209
+ * Professor Keshav Pingali, one of the creators of the Grammar Flow Graph parsing approach, for his encouraging e-mail exchanges.
210
+
211
+ ---
212
+
58
213
  Copyright
59
214
  ---------
60
- Copyright (c) 2014-2016, Dimitri Geshef.
215
+ Copyright (c) 2014-2016, Dimitri Geshef.
61
216
  __Rley__ is released under the MIT License; see [LICENSE.txt](https://github.com/famished-tiger/Rley/blob/master/LICENSE.txt) for details.
data/examples/NLP/mini_en_demo.rb ADDED
@@ -0,0 +1,92 @@
1
+ require 'rley' # Load Rley library
2
+
3
+ ########################################
4
+ # Step 1. Define a grammar for a micro English-like language
5
+ # based on example from NLTK book (chapter 8 of the book).
6
+ # Bird, Steven, Edward Loper and Ewan Klein: "Natural Language Processing
7
+ # with Python"; 2009, O’Reilly Media Inc., ISBN 978-0596516499
8
+ # It defines the syntax of a sentence in a mini English-like language
9
+ # with a very simplified syntax.
10
+
11
+ # Instantiate a builder object that will build the grammar for us
12
+ builder = Rley::Syntax::GrammarBuilder.new
13
+
14
+ # The next 2 lines define the terminal symbols (= word categories in the lexicon)
15
+ builder.add_terminals('Noun', 'Proper-Noun', 'Verb')
16
+ builder.add_terminals('Determiner', 'Preposition')
17
+
18
+ # Here we define the productions (= grammar rules)
19
+ builder.add_production('S' => %w[NP VP])
20
+ builder.add_production('NP' => 'Proper-Noun')
21
+ builder.add_production('NP' => %w[Determiner Noun])
22
+ builder.add_production('NP' => %w[Determiner Noun PP])
23
+ builder.add_production('VP' => %w[Verb NP])
24
+ builder.add_production('VP' => %w[Verb NP PP])
25
+ builder.add_production('PP' => %w[Preposition NP])
26
+
27
+ # And now, let's build the grammar...
28
+ grammar = builder.grammar
29
+
30
+ ########################################
31
+ # Step 2. Creating a lexicon
32
+ # To simplify things, the lexicon is implemented as a Hash with pairs of the form:
33
+ # word => terminal symbol name
34
+ Lexicon = {
35
+ 'man' => 'Noun',
36
+ 'dog' => 'Noun',
37
+ 'cat' => 'Noun',
38
+ 'telescope' => 'Noun',
39
+ 'park' => 'Noun',
40
+ 'saw' => 'Verb',
41
+ 'ate' => 'Verb',
42
+ 'walked' => 'Verb',
43
+ 'John' => 'Proper-Noun',
44
+ 'Mary' => 'Proper-Noun',
45
+ 'Bob' => 'Proper-Noun',
46
+ 'a' => 'Determiner',
47
+ 'an' => 'Determiner',
48
+ 'the' => 'Determiner',
49
+ 'my' => 'Determiner',
50
+ 'in' => 'Preposition',
51
+ 'on' => 'Preposition',
52
+ 'by' => 'Preposition',
53
+ 'with' => 'Preposition'
54
+ }
55
+
56
+ ########################################
57
+ # Step 3. Creating a tokenizer
58
+ # A tokenizer reads the input string and converts it into a sequence of tokens
59
+ # Highly simplified tokenizer implementation.
60
+ def tokenizer(aTextToParse, aGrammar)
61
+ tokens = aTextToParse.scan(/\S+/).map do |word|
62
+ term_name = Lexicon[word]
63
+ if term_name.nil?
64
+ raise StandardError, "Word '#{word}' not found in lexicon"
65
+ end
66
+ terminal = aGrammar.name2symbol[term_name]
67
+ Rley::Parser::Token.new(word, terminal)
68
+ end
69
+
70
+ return tokens
71
+ end
72
+
73
+ # More realistic NLP applications will most probably rely on a Part-of-Speech tagger instead.
74
+
75
+ ########################################
76
+ # Step 4. Create a parser for that grammar
77
+ # Easy with Rley...
78
+ parser = Rley::Parser::GFGEarleyParser.new(grammar)
79
+
80
+ ########################################
81
+ # Step 5. Parsing the input
82
+ input_to_parse = 'John saw Mary with a telescope'
83
+ # Convert input text into a sequence of token objects...
84
+ tokens = tokenizer(input_to_parse, grammar)
85
+ result = parser.parse(tokens)
86
+
87
+ puts "Parsing successful? #{result.success?}" # => Parsing successful? true
88
+
89
+ ########################################
90
+ # Step 6. Generating the parse forest
91
+ pforest = result.parse_forest
92
+
data/lib/rley/constants.rb CHANGED
@@ -3,7 +3,7 @@
3
3
 
4
4
  module Rley # Module used as a namespace
5
5
  # The version number of the gem.
6
- Version = '0.3.08'.freeze
6
+ Version = '0.3.09'.freeze
7
7
 
8
8
  # Brief description of the gem.
9
9
  Description = "Ruby implementation of the Earley's parsing algorithm".freeze
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: rley
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.3.08
4
+ version: 0.3.09
5
5
  platform: ruby
6
6
  authors:
7
7
  - Dimitri Geshef
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2016-11-17 00:00:00.000000000 Z
11
+ date: 2016-11-27 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: rake
@@ -129,24 +129,7 @@ files:
129
129
  - LICENSE.txt
130
130
  - README.md
131
131
  - Rakefile
132
- - examples/grammars/grammar_L0.rb
133
- - examples/grammars/grammar_abc.rb
134
- - examples/parsers/demo-JSON/JSON_grammar.rb
135
- - examples/parsers/demo-JSON/JSON_lexer.rb
136
- - examples/parsers/demo-JSON/JSON_parser.rb
137
- - examples/parsers/demo-JSON/demo_json.rb
138
- - examples/parsers/parsing_L0.rb
139
- - examples/parsers/parsing_L1.rb
140
- - examples/parsers/parsing_abc.rb
141
- - examples/parsers/parsing_ambig.rb
142
- - examples/parsers/parsing_another.rb
143
- - examples/parsers/parsing_b_expr.rb
144
- - examples/parsers/parsing_err_expr.rb
145
- - examples/parsers/parsing_groucho.rb
146
- - examples/parsers/parsing_right_recursive.rb
147
- - examples/parsers/parsing_tricky.rb
148
- - examples/parsers/tracing_parser.rb
149
- - examples/recognizers/recognizer_abc.rb
132
+ - examples/NLP/mini_en_demo.rb
150
133
  - lib/rley.rb
151
134
  - lib/rley/constants.rb
152
135
  - lib/rley/formatter/base_formatter.rb
data/examples/grammars/grammar_L0.rb DELETED
@@ -1,32 +0,0 @@
1
- # Purpose: to demonstrate how to build a very simple grammar
2
- require 'rley' # Load the gem
3
-
4
- # Sample grammar for a very limited English language
5
- # based on the language L0 from Jurafsky & Martin
6
-
7
- # Let's create the grammar step-by-step with the grammar builder:
8
- builder = Rley::Syntax::GrammarBuilder.new
9
-
10
- # Enumerate the POS Part-Of-Speech...
11
- builder.add_terminals('Noun', 'Verb', 'Adjective')
12
- builder.add_terminals('Pronoun', 'Proper-Noun', 'Determiner')
13
- builder.add_terminals('Preposition', 'Conjunction')
14
-
15
- # Now the production rules...
16
- builder.add_production('S'=> ['NP', 'VP']) # e.g. I + want a morning flight
17
- builder.add_production('NP' => 'Pronoun') # e.g. I
18
- builder.add_production('NP' => 'Proper-Noun') # e.g. Los Angeles
19
- builder.add_production('NP' => ['Determiner', 'Nominal']) # e.g. a + flight
20
- builder.add_production('Nominal' => %w(Nominal Noun)) # morning + flight
21
- builder.add_production('Nominal' => 'Noun') # e.g. flights
22
- builder.add_production('VP' => 'Verb') # e.g. do
23
- builder.add_production('VP' => ['Verb', 'NP']) # e.g. want + a flight
24
- builder.add_production('VP' => ['Verb', 'NP', 'PP'])
25
- builder.add_production('VP' => ['Verb', 'PP']) # leaving + on Thursday
26
- builder.add_production('PP' => ['Preposition', 'NP']) # from + Los Angeles
27
-
28
- # And now we 're ready to build the grammar...
29
- grammar_L0 = builder.grammar
30
-
31
- # Prove that it is a grammar
32
- puts grammar_L0.class.name