rley 0.3.08 → 0.3.09

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: 38c77acd3a3831519aa9511c02ab2aaa7f99c962
4
- data.tar.gz: bd84fe47221a72d195a92e40e8c5a62fafed562b
3
+ metadata.gz: d22d11b51c4c72d3d8230c775b797afae04c4e8f
4
+ data.tar.gz: e87ba3a3beeadd40a4447281c904f8dba4c8dc57
5
5
  SHA512:
6
- metadata.gz: 7a36a172143025e47b03022dc2db7be6bcb990b21834ea51288afe9158c3fe97eaafce70cbcd36ea64841da03036a0e98e446454b4e6f8e2170e60e9fa2c0697
7
- data.tar.gz: 716aa238f7f40e07118b11eb3d7c38c5a3f2cc8583c516818d1d5bcb11267a7c52b7849e049de4ecf93729f23746b1569adfbdd1d297947e787f9c82a5a0b94e
6
+ metadata.gz: 446344010aafea29682d90bd71bdb73f1f189e694fb91907136e00cce015b098e32fb2679af3a0b6ae772d41892478b0a9270c43ebe77cee2c47cb319b7fd0aa
7
+ data.tar.gz: 9509ea071e10c002e70af379089565b3b28269fb09c1f1d04ee9a04ec78dab7ac916e4f90602bbef5bc3eff8ea0952143faa5ec1d6c54277b60e7db810dab45d
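The digests above cover the `metadata.gz` and `data.tar.gz` members of the `.gem` archive. As a minimal sketch of how such values could be recomputed locally with Ruby's standard `digest` library, the helper below hashes a file with both algorithms. The helper name `gem_member_digests` is hypothetical, and the demo hashes a throwaway temporary file rather than a real gem member.

```ruby
require 'digest/sha1'
require 'digest/sha2'
require 'tempfile'

# Hypothetical helper: return the SHA1 and SHA512 hex digests of the file at `path`.
def gem_member_digests(path)
  {
    'SHA1' => Digest::SHA1.file(path).hexdigest,
    'SHA512' => Digest::SHA512.file(path).hexdigest
  }
end

# Demo on a temporary file. For a real check, pass the path of an
# extracted metadata.gz or data.tar.gz instead.
Tempfile.create('member') do |file|
  file.write('hello')
  file.flush
  gem_member_digests(file.path).each do |algorithm, hex|
    puts "#{algorithm}: #{hex}"
  end
end
```

Comparing the printed values with the entries in `checksums.yaml` would show whether a downloaded member matches the published one.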
data/CHANGELOG.md CHANGED
@@ -1,3 +1,7 @@
1
+ ### 0.3.09 / 2016-11-27
2
+ * [CHANGE] File `README.md` fully rewritten; an example was added.
3
+ * [CHANGE] Directory `examples` completely reorganized.
4
+
1
5
  ### 0.3.08 / 2016-11-17
2
6
  * [FIX] Method `ParseWalkerFactory#select_antecedent` did not support alternative nodes creation when visiting an item entry for highly ambiguous parse.
3
7
  * [FIX] Method `ParseWalkerFactory#select_antecedent` did not properly manage the call/return stack for alternative nodes created when visiting an item entry for highly ambiguous parse.
data/README.md CHANGED
@@ -1,8 +1,3 @@
1
- Rley
2
- ===========
3
- [Homepage](https://github.com/famished-tiger/Rley)
4
-
5
-
6
1
  [![Build Status](https://travis-ci.org/famished-tiger/Rley.svg?branch=master)](https://travis-ci.org/famished-tiger/Rley)
7
2
  [![Coverage Status](https://img.shields.io/coveralls/famished-tiger/Rley.svg)](https://coveralls.io/r/famished-tiger/Rley?branch=master)
8
3
  [![Gem Version](https://badge.fury.io/rb/rley.svg)](http://badge.fury.io/rb/rley)
@@ -10,36 +5,190 @@ Rley
10
5
  [![Inline docs](http://inch-ci.org/github/famished-tiger/Rley.svg?branch=master)](http://inch-ci.org/github/famished-tiger/Rley)
11
6
  [![License](https://img.shields.io/badge/license-MIT-brightgreen.svg?style=flat)](https://github.com/famished-tiger/Rley/blob/master/LICENSE.txt)
12
7
 
13
- __Rley__ is a Ruby implementation of a parser using the [Earley](http://en.wikipedia.org/wiki/Earley_parser) algorithm.
14
- The project aims to build a parser convenient for lightweight NLP (Natural Language Processing) purposes.
15
8
 
16
- ### Highlights ###
17
- * Handles any context-free language,
9
+ [Rley](https://github.com/famished-tiger/Rley)
10
+ ======
11
+
12
+ A Ruby library for constructing general parsers for _any_ context-free languages.
13
+
14
+
15
+ What is Rley?
16
+ -------------
17
+ __Rley__ uses the [Earley](http://en.wikipedia.org/wiki/Earley_parser)
18
+ algorithm which is a general parsing algorithm that can handle any context-free
19
+ grammar. Earley parsers can literally swallow anything that can be described
20
+ by such a context-free grammar. That's why Earley parsers find their place in so
21
+ many __NLP__ (_Natural Language Processing_) libraries/toolkits.
22
+
23
+ In addition, __Rley__ goes beyond most Earley parser implementations by providing
24
+ support for ambiguous parses. Indeed, it delivers the results of a parse as a
25
+ _Shared Packed Parse Forest_ (SPPF). An SPPF is a data structure that efficiently
26
+ encodes all the possible parse trees that result from an ambiguous
27
+ grammar.
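To make "all the possible parse trees" concrete, here is a minimal sketch in plain Ruby that counts the distinct parse trees a sentence has under the mini-English grammar used in the tour. It does not use Rley's API or the Earley algorithm. It is a naive exhaustive counter, and the names `RULES`, `CATEGORY`, `count_trees`, `count_seq` and `parse_count` are made up for this illustration.

```ruby
# Productions of the tour's mini-English grammar (non-terminal => alternatives).
RULES = {
  'S'  => [%w[NP VP]],
  'NP' => [%w[Proper-Noun], %w[Determiner Noun], %w[Determiner Noun PP]],
  'VP' => [%w[Verb NP], %w[Verb NP PP]],
  'PP' => [%w[Preposition NP]]
}.freeze

# Word => terminal symbol name (a small excerpt of the tour's lexicon).
CATEGORY = {
  'John' => 'Proper-Noun', 'Mary' => 'Proper-Noun',
  'saw' => 'Verb', 'the' => 'Determiner', 'a' => 'Determiner',
  'man' => 'Noun', 'telescope' => 'Noun', 'with' => 'Preposition'
}.freeze

# Number of distinct trees in which `symbol` derives words[from...to].
def count_trees(symbol, words, from, to)
  unless RULES.key?(symbol) # terminal: must match exactly one word
    return to - from == 1 && CATEGORY[words[from]] == symbol ? 1 : 0
  end
  RULES[symbol].sum { |rhs| count_seq(rhs, words, from, to) }
end

# Number of ways the symbol sequence `rhs` derives words[from...to].
def count_seq(rhs, words, from, to)
  return from == to ? 1 : 0 if rhs.empty?

  head, *rest = rhs
  # Each symbol covers at least one word, so try every split point.
  ((from + 1)..to).sum do |mid|
    left = count_trees(head, words, from, mid)
    left.zero? ? 0 : left * count_seq(rest, words, mid, to)
  end
end

def parse_count(sentence)
  words = sentence.split
  count_trees('S', words, 0, words.size)
end

puts parse_count('John saw Mary with a telescope')    # => 1
puts parse_count('John saw the man with a telescope') # => 2
```

Under this particular grammar the second sentence has two trees because the prepositional phrase can attach either to the verb phrase or to the noun phrase `the man`. A proper noun cannot take a prepositional phrase here, so the first sentence has a single tree. A parse forest stores such alternatives in one shared structure instead of enumerating them.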
28
+
29
+ As another distinctive mark, __Rley__ is also the first Ruby implementation of a
30
+ parsing library based on the new _Grammar Flow Graph_ approach (_TODO: add details_).
31
+
32
+ ### What can it do?
33
+ Maybe parsing algorithms and internal implementation details are of lesser
34
+ interest to you and the better question to ask is "what can Rley really do?".
35
+
36
+ In a nutshell:
37
+ * Rley can parse context-free languages that other well-known libraries cannot
38
+ handle
39
+ * Built-in support for ambiguous grammars that typically occur in NLP
40
+
41
+ In short, the foundations of Rley are strong enough to be useful in a large
42
+ application range such as:
43
+ * computer languages,
44
+ * artificial intelligence and
45
+ * Natural Language Processing.
46
+
47
+ #### Features
48
+ * Simple API for context-free grammar definition,
49
+ * Allows ambiguous grammars,
50
+ * Generates shared packed parse forests,
18
51
  * Accepts left-recursive rules/productions,
19
- * Accepts ambiguous grammars,
20
- * Parse tracing facility,
21
- * Parse tree generation,
22
- * Syntax error detection and reporting.
23
-
24
-
25
- ### Yet another parser? ###
26
- Yes and no. Rley doesn't aim to replace other very good programming language parsers for Ruby.
27
- The latter are faster because they use optimized algorithms at the price of a loss of generality
28
- in the grammar/language they support.
29
- The Earley's algorithm being more general is able to parse input that conforms to any context-free grammar.
30
-
31
- This project is in "earley" stage.
32
- ####Roadmap:
33
- - Rewrite the parser using the GFG (Grammar Flow Graph) approach
34
- - Replace parse trees by shared packed parse forests
35
- - Document the parser API
36
- - Add more validation tests and sample grammars
37
- - Add a command-line interface
38
- - Provide documentation and examples
39
-
40
-
41
- ### Other similar Ruby projects ###
42
- __Rley__ isn't the sole Ruby implementation of the Earley parser algorithm.
52
+ * Provides syntax error detection and reporting.
53
+
54
+ ---
55
+
56
+ Getting Started
57
+ ---------------
58
+
59
+ ### Installation
60
+ Installing the latest stable version is simple:
61
+
62
+ $ gem install rley
63
+
64
+
65
+ ## A whirlwind tour of Rley
66
+ The purpose of this section is to show how to create a parser for a minimalistic
67
+ English language subset.
68
+ The tour is organized into the following steps:
69
+ 1. [Defining the language grammar](#defining-the-language-grammar)
70
+ 2. [Creating a lexicon](#creating-a-lexicon)
71
+ 3. [Creating a tokenizer](#creating-a-tokenizer)
72
+ 4. [Building the parser](#building-the-parser)
73
+ 5. [Parsing some input](#parsing-some-input)
74
+ 6. [Generating the parse forest](#generating-the-parse-forest)
75
+
76
+ The complete source code of the tour can be found in the
77
+ [examples](https://github.com/famished-tiger/Rley/tree/master/examples/NLP/mini_en_demo.rb)
78
+ directory.
79
+
80
+ ### Defining the language grammar
81
+ The subset of English grammar is based on an example from the NLTK book.
82
+
83
+ ```ruby
84
+ require 'rley' # Load Rley library
85
+
86
+ # Instantiate a builder object that will build the grammar for us
87
+ builder = Rley::Syntax::GrammarBuilder.new
88
+
89
+ # Next 2 lines we define the terminal symbols (=word categories in the lexicon)
90
+ builder.add_terminals('Noun', 'Proper-Noun', 'Verb')
91
+ builder.add_terminals('Determiner', 'Preposition')
92
+
93
+ # Here we define the productions (= grammar rules)
94
+ builder.add_production('S' => %w[NP VP])
95
+ builder.add_production('NP' => 'Proper-Noun')
96
+ builder.add_production('NP' => %w[Determiner Noun])
97
+ builder.add_production('NP' => %w[Determiner Noun PP])
98
+ builder.add_production('VP' => %w[Verb NP])
99
+ builder.add_production('VP' => %w[Verb NP PP])
100
+ builder.add_production('PP' => %w[Preposition NP])
101
+
102
+ # And now, let's build the grammar...
103
+ grammar = builder.grammar
104
+ ```
105
+
106
+ ### Creating a lexicon
107
+
108
+ ```ruby
109
+ # To simplify things, lexicon is implemented as a Hash with pairs of the form:
110
+ # word => terminal symbol name
111
+ Lexicon = {
112
+ 'man' => 'Noun',
113
+ 'dog' => 'Noun',
114
+ 'cat' => 'Noun',
115
+ 'telescope' => 'Noun',
116
+ 'park' => 'Noun',
117
+ 'saw' => 'Verb',
118
+ 'ate' => 'Verb',
119
+ 'walked' => 'Verb',
120
+ 'John' => 'Proper-Noun',
121
+ 'Mary' => 'Proper-Noun',
122
+ 'Bob' => 'Proper-Noun',
123
+ 'a' => 'Determiner',
124
+ 'an' => 'Determiner',
125
+ 'the' => 'Determiner',
126
+ 'my' => 'Determiner',
127
+ 'in' => 'Preposition',
128
+ 'on' => 'Preposition',
129
+ 'by' => 'Preposition',
130
+ 'with' => 'Preposition'
131
+ }
132
+ ```
133
+
134
+
135
+ ### Creating a tokenizer
136
+ ```ruby
137
+ # A tokenizer reads the input string and converts it into a sequence of tokens
138
+ # Highly simplified tokenizer implementation.
139
+ def tokenizer(aText, aGrammar)
140
+ tokens = aText.scan(/\S+/).map do |word|
141
+ term_name = Lexicon[word]
142
+ if term_name.nil?
143
+ raise StandardError, "Word '#{word}' not found in lexicon"
144
+ end
145
+ terminal = aGrammar.name2symbol[term_name]
146
+ Rley::Parser::Token.new(word, terminal)
147
+ end
148
+
149
+ return tokens
150
+ end
151
+ ```
152
+
153
+ More ambitious NLP applications will surely rely on a Part-of-Speech tagger instead of
154
+ creating a lexicon and tokenizer from scratch. Here are a few Ruby Part-of-Speech gems:
155
+ * [engtagger](https://rubygems.org/gems/engtagger)
156
+ * [rbtagger](https://rubygems.org/gems/rbtagger)
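The lexicon lookup at the heart of the tokenizer can be tried without the gem installed. The sketch below is an assumption-laden stand-in: `SketchToken` is a plain `Struct` used in place of `Rley::Parser::Token`, `SKETCH_LEXICON` is a small excerpt of the lexicon, and the category is kept as a string instead of being resolved to a grammar symbol through `name2symbol`.

```ruby
# Stand-in for Rley::Parser::Token so the sketch runs without the gem.
SketchToken = Struct.new(:lexeme, :category)

# Excerpt of the tour's lexicon: word => terminal symbol name.
SKETCH_LEXICON = {
  'John' => 'Proper-Noun',
  'saw' => 'Verb',
  'a' => 'Determiner',
  'telescope' => 'Noun',
  'with' => 'Preposition'
}.freeze

# Split the text on whitespace and tag each word with its lexicon category.
# Unknown words raise a KeyError that names the offending word.
def sketch_tokenize(text, lexicon = SKETCH_LEXICON)
  text.scan(/\S+/).map do |word|
    category = lexicon.fetch(word) do
      raise KeyError, "Word '#{word}' not found in lexicon"
    end
    SketchToken.new(word, category)
  end
end

tokens = sketch_tokenize('John saw a telescope')
puts tokens.map { |t| "#{t.lexeme}/#{t.category}" }.join(' ')
# => John/Proper-Noun saw/Verb a/Determiner telescope/Noun
```

Because the lookup is case-sensitive and splits only on whitespace, input such as `john` or `telescope.` would be rejected, which is one reason a Part-of-Speech tagger is the more robust choice.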
157
+
158
+
159
+
160
+ ### Building the parser
161
+ ```ruby
162
+ # Easy with Rley...
163
+ parser = Rley::Parser::GFGEarleyParser.new(grammar)
164
+ ```
165
+
166
+
167
+ ### Parsing some input
168
+ ```ruby
169
+ input_to_parse = 'John saw Mary with a telescope'
170
+ # Convert input text into a sequence of token objects...
171
+ tokens = tokenizer(input_to_parse, grammar)
172
+ result = parser.parse(tokens)
173
+
174
+ puts "Parsing successful? #{result.success?}" # => Parsing successful? true
175
+ ```
176
+
177
+ ### Generating the parse forest
178
+ ```ruby
179
+ pforest = result.parse_forest
180
+ ```
181
+
182
+
183
+
184
+ ## Examples
185
+
186
+ The project source directory contains several example scripts that demonstrate
187
+ how grammars are to be constructed and used.
188
+
189
+
190
+ ## Other similar Ruby projects
191
+ __Rley__ isn't the sole implementation of the Earley parser algorithm in Ruby.
43
192
  Here are a few other ones:
44
193
  - [Kanocc gem](https://rubygems.org/gems/kanocc) -- Advertised as a Ruby based parsing and translation framework.
45
194
  Although the gem dates from 2009, the author still maintains it in a public repository on [GitHub](https://github.com/surlykke/Kanocc)
@@ -51,11 +200,17 @@ Here are a few other ones:
51
200
  [earley project](https://github.com/joshingly/earley) -- An Earley parser (grammar rules are specified in JSON format).
52
201
  The code doesn't seem to be maintained: latest commit dates from Nov. 2011.
53
202
  - [linguist project](https://github.com/davidkellis/linguist) -- Advertised as library for parsing context-free languages.
54
- It is a recognizer not a parser. In other words it can only tell whether a given input
203
+ It is a recognizer not a parser. In other words it can only tell whether a given input
55
204
  conforms to the grammar rules or not. As such it cannot build parse trees.
56
205
  The code doesn't seem to be maintained: latest commit dates from Oct. 2011.
57
206
 
207
+
208
+ ## Thanks to:
209
+ * Professor Keshav Pingali, one of the creators of the Grammar Flow Graph parsing approach, for his encouraging e-mail exchanges.
210
+
211
+ ---
212
+
58
213
  Copyright
59
214
  ---------
60
- Copyright (c) 2014-2016, Dimitri Geshef.
215
+ Copyright (c) 2014-2016, Dimitri Geshef.
61
216
  __Rley__ is released under the MIT License; see [LICENSE.txt](https://github.com/famished-tiger/Rley/blob/master/LICENSE.txt) for details.
@@ -0,0 +1,92 @@
1
+ require 'rley' # Load Rley library
2
+
3
+ ########################################
4
+ # Step 1. Define a grammar for a micro English-like language
5
+ # based on example from NLTK book (chapter 8 of the book).
6
+ # Bird, Steven, Edward Loper and Ewan Klein: "Natural Language Processing
7
+ # with Python"; 2009, O’Reilly Media Inc., ISBN 978-0596516499
8
+ # It defines the syntax of a sentence in a mini English-like language
9
+ # with a very simplified syntax.
10
+
11
+ # Instantiate a builder object that will build the grammar for us
12
+ builder = Rley::Syntax::GrammarBuilder.new
13
+
14
+ # Next 2 lines we define the terminal symbols (=word categories in the lexicon)
15
+ builder.add_terminals('Noun', 'Proper-Noun', 'Verb')
16
+ builder.add_terminals('Determiner', 'Preposition')
17
+
18
+ # Here we define the productions (= grammar rules)
19
+ builder.add_production('S' => %w[NP VP])
20
+ builder.add_production('NP' => 'Proper-Noun')
21
+ builder.add_production('NP' => %w[Determiner Noun])
22
+ builder.add_production('NP' => %w[Determiner Noun PP])
23
+ builder.add_production('VP' => %w[Verb NP])
24
+ builder.add_production('VP' => %w[Verb NP PP])
25
+ builder.add_production('PP' => %w[Preposition NP])
26
+
27
+ # And now, let's build the grammar...
28
+ grammar = builder.grammar
29
+
30
+ ########################################
31
+ # Step 2. Creating a lexicon
32
+ # To simplify things, lexicon is implemented as a Hash with pairs of the form:
33
+ # word => terminal symbol name
34
+ Lexicon = {
35
+ 'man' => 'Noun',
36
+ 'dog' => 'Noun',
37
+ 'cat' => 'Noun',
38
+ 'telescope' => 'Noun',
39
+ 'park' => 'Noun',
40
+ 'saw' => 'Verb',
41
+ 'ate' => 'Verb',
42
+ 'walked' => 'Verb',
43
+ 'John' => 'Proper-Noun',
44
+ 'Mary' => 'Proper-Noun',
45
+ 'Bob' => 'Proper-Noun',
46
+ 'a' => 'Determiner',
47
+ 'an' => 'Determiner',
48
+ 'the' => 'Determiner',
49
+ 'my' => 'Determiner',
50
+ 'in' => 'Preposition',
51
+ 'on' => 'Preposition',
52
+ 'by' => 'Preposition',
53
+ 'with' => 'Preposition'
54
+ }
55
+
56
+ ########################################
57
+ # Step 3. Creating a tokenizer
58
+ # A tokenizer reads the input string and converts it into a sequence of tokens
59
+ # Highly simplified tokenizer implementation.
60
+ def tokenizer(aTextToParse, aGrammar)
61
+ tokens = aTextToParse.scan(/\S+/).map do |word|
62
+ term_name = Lexicon[word]
63
+ if term_name.nil?
64
+ raise StandardError, "Word '#{word}' not found in lexicon"
65
+ end
66
+ terminal = aGrammar.name2symbol[term_name]
67
+ Rley::Parser::Token.new(word, terminal)
68
+ end
69
+
70
+ return tokens
71
+ end
72
+
73
+ # More realistic NLP applications will most probably rely on a Part-of-Speech tagger.
74
+
75
+ ########################################
76
+ # Step 4. Create a parser for that grammar
77
+ # Easy with Rley...
78
+ parser = Rley::Parser::GFGEarleyParser.new(grammar)
79
+
80
+ ########################################
81
+ # Step 5. Parsing the input
82
+ input_to_parse = 'John saw Mary with a telescope'
83
+ # Convert input text into a sequence of token objects...
84
+ tokens = tokenizer(input_to_parse, grammar)
85
+ result = parser.parse(tokens)
86
+
87
+ puts "Parsing successful? #{result.success?}" # => Parsing successful? true
88
+
89
+ ########################################
90
+ # Step 6. Generating the parse forest
91
+ pforest = result.parse_forest
92
+
@@ -3,7 +3,7 @@
3
3
 
4
4
  module Rley # Module used as a namespace
5
5
  # The version number of the gem.
6
- Version = '0.3.08'.freeze
6
+ Version = '0.3.09'.freeze
7
7
 
8
8
  # Brief description of the gem.
9
9
  Description = "Ruby implementation of the Earley's parsing algorithm".freeze
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: rley
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.3.08
4
+ version: 0.3.09
5
5
  platform: ruby
6
6
  authors:
7
7
  - Dimitri Geshef
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2016-11-17 00:00:00.000000000 Z
11
+ date: 2016-11-27 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: rake
@@ -129,24 +129,7 @@ files:
129
129
  - LICENSE.txt
130
130
  - README.md
131
131
  - Rakefile
132
- - examples/grammars/grammar_L0.rb
133
- - examples/grammars/grammar_abc.rb
134
- - examples/parsers/demo-JSON/JSON_grammar.rb
135
- - examples/parsers/demo-JSON/JSON_lexer.rb
136
- - examples/parsers/demo-JSON/JSON_parser.rb
137
- - examples/parsers/demo-JSON/demo_json.rb
138
- - examples/parsers/parsing_L0.rb
139
- - examples/parsers/parsing_L1.rb
140
- - examples/parsers/parsing_abc.rb
141
- - examples/parsers/parsing_ambig.rb
142
- - examples/parsers/parsing_another.rb
143
- - examples/parsers/parsing_b_expr.rb
144
- - examples/parsers/parsing_err_expr.rb
145
- - examples/parsers/parsing_groucho.rb
146
- - examples/parsers/parsing_right_recursive.rb
147
- - examples/parsers/parsing_tricky.rb
148
- - examples/parsers/tracing_parser.rb
149
- - examples/recognizers/recognizer_abc.rb
132
+ - examples/NLP/mini_en_demo.rb
150
133
  - lib/rley.rb
151
134
  - lib/rley/constants.rb
152
135
  - lib/rley/formatter/base_formatter.rb
@@ -1,32 +0,0 @@
1
- # Purpose: to demonstrate how to build a very simple grammar
2
- require 'rley' # Load the gem
3
-
4
- # Sample grammar for a very limited English language
5
- # based on the language L0 from Jurafsky & Martin
6
-
7
- # Let's create the grammar step-by-step with the grammar builder:
8
- builder = Rley::Syntax::GrammarBuilder.new
9
-
10
- # Enumerate the POS Part-Of-Speech...
11
- builder.add_terminals('Noun', 'Verb', 'Adjective')
12
- builder.add_terminals('Pronoun', 'Proper-Noun', 'Determiner')
13
- builder.add_terminals('Preposition', 'Conjunction')
14
-
15
- # Now the production rules...
16
- builder.add_production('S'=> ['NP', 'VP']) # e.g. I + want a morning flight
17
- builder.add_production('NP' => 'Pronoun') # e.g. I
18
- builder.add_production('NP' => 'Proper-Noun') # e.g. Los Angeles
19
- builder.add_production('NP' => ['Determiner', 'Nominal']) # e.g. a + flight
20
- builder.add_production('Nominal' => %w(Nominal Noun)) # morning + flight
21
- builder.add_production('Nominal' => 'Noun') # e.g. flights
22
- builder.add_production('VP' => 'Verb') # e.g. do
23
- builder.add_production('VP' => ['Verb', 'NP']) # e.g. want + a flight
24
- builder.add_production('VP' => ['Verb', 'NP', 'PP'])
25
- builder.add_production('VP' => ['Verb', 'PP']) # leaving + on Thursday
26
- builder.add_production('PP' => ['Preposition', 'NP']) # from + Los Angeles
27
-
28
- # And now we 're ready to build the grammar...
29
- grammar_L0 = builder.grammar
30
-
31
- # Prove that it is a grammar
32
- puts grammar_L0.class.name