rley 0.4.04 → 0.4.05

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: 5100723b47c2572c2ac0b5996413eb4cd8e462ba
- data.tar.gz: 4d37be9ca7128a36f77505560c7214454e413eb5
+ metadata.gz: 89fbee176719b82e8e1b1676ceb89b837e9ad249
+ data.tar.gz: ff72f2224e98479729b441aa0a38b0da61fd34e4
  SHA512:
- metadata.gz: c197b1b962408e3a3ce4be5d310c227bdc638028d9a40d26b89c544232f59a8ce994228e4ae3673cbe6147dbcd520ee53c36a0620e3c7871edd3aba46762933f
- data.tar.gz: 21b59e0b9b5c942be83954b35ee15650e1bc2c2aa2f9a6f1ebba645b94e3f822df510026825354512cf7eea879431feed852f34cef9fd176c5d0b89377504a46
+ metadata.gz: a8eed1075570fac6f89f6ed16164c52026426dd0c497030fd31bbfd861e7873744be3acc14afa046da40f67ddf9d4987301dac76227ac4d0f504e40f46caa862
+ data.tar.gz: 684e37a2a236bc9e34e4bacba4775b61608be5dab58b3e4af414da222cf0c79f3db98631b904969c23b3ed882368ea553c7c25f9099cd09f17d5d5105111ba21
data/CHANGELOG.md CHANGED
@@ -1,3 +1,10 @@
+ ### 0.4.05 / 2017-05-06
+ * [CHANGE] File `README.md` Added documentation on how to build parse trees and manipulate them.
+ * [CHANGE] File `examples\NLP\mini_en_demo.rb` now emits different parse tree representations.
+ * [NEW] Directory `www`. Contains a diagram output produced from Rley and fed to online RSyntaxTree tool.
+
+
+
  ### 0.4.04 / 2017-05-01
  * [NEW] `Asciitree` formatter class. Allows parse tree output in simple printable text.
  * [CHANGE] Major enhancements in directory `examples\data_formats\JSON`. The demo command-line tool parses JSON and outputs the parse tree in one of the supported formats.
data/README.md CHANGED
@@ -1,5 +1,5 @@
  [Rley](https://github.com/famished-tiger/Rley)
-
+ ====
  [![Linux Build Status](https://img.shields.io/travis/famished-tiger/Rley/master.svg?label=Linux%20build)](https://travis-ci.org/famished-tiger/Rley)
  [![Build status](https://ci.appveyor.com/api/projects/status/l5adgcbfo128rvo9?svg=true)](https://ci.appveyor.com/project/famished-tiger/rley)
  [![Coverage Status](https://img.shields.io/coveralls/famished-tiger/Rley.svg)](https://coveralls.io/r/famished-tiger/Rley?branch=master)
@@ -10,9 +10,6 @@
 
  A Ruby library for constructing general parsers for _any_ context-free language.
 
- ====
-
-
  What is Rley?
  -------------
  __Rley__ uses the [Earley](http://en.wikipedia.org/wiki/Earley_parser)
@@ -28,7 +25,7 @@ encode efficiently all the possible parse trees that result from an ambiguous
  grammar.
 
  As another distinctive mark, __Rley__ is also the first Ruby implementation of a
- parsing library based on the new _Grammar Flow Graph_ approach [References on GFG](#references-on-gfg).
+ parsing library based on the new [Grammar Flow Graph](#grammar-flow-graph) approach.
 
  ### What it can do?
  Maybe parsing algorithms and internal implementation details are of lesser
@@ -81,7 +78,7 @@ The tour is organized as follows:
  3. [Creating a tokenizer](#creating-a-tokenizer)
  4. [Building the parser](#building-the-parser)
  5. [Parsing some input](#parsing-some-input)
- 6. [Generating the parse forest](#generating-the-parse-forest)
+ 6. [Generating the parse tree](#generating-the-parse-tree)
 
  The complete source code of the example used in this tour can be found in the
  [examples](https://github.com/famished-tiger/Rley/tree/master/examples/NLP/mini_en_demo.rb)
@@ -183,11 +180,173 @@ creating a lexicon and tokenizer from scratch. Here are a few Ruby Part-of-Speec
  puts "Parsing successful? #{result.success?}" # => Parsing successful? true
  ```
 
- ## Generating the parse forest
+ At this stage, we're done with parsing. What we need next are convenient means
+ to exploit the parse result. As it is, the `result` variable in the last code snippet
+ above is a data structure ("Earley item sets") that is tightly related to the intricate details
+ of Earley's parsing algorithm. Obviously, it contains all the necessary data to exploit
+ the parsing results but it is too low-level and inconvenient from a programming viewpoint.
+ Therefore, __Rley__ provides two convenient data structures out of the box for
+ representing the parse outcome:
+ - Parse tree (optimal when the parse is unambiguous)
+ - Parse forest (a more sophisticated data structure that copes with ambiguity)
+
+ For our whirlwind tour, we will opt for parse trees.
+
+ ## Generating the parse tree
+
+ ```ruby
+ ptree = result.parse_tree
+ ```
+ OK. Now that we have the parse tree, what can we do with it?
+ One option is to manipulate the parse tree and its nodes directly. For instance,
+ one could write code to customize and transform the parse tree. This approach gives
+ most of the flexibility needed for advanced applications. The other, more common
+ option is to rely on the parse tree visitor instantiated from the `Rley::ParseTreeVisitor` class.
+ Such a visitor walks over the parse tree nodes and generates visit events that
+ are dispatched to subscribed event listeners. All this may, at first, sound
+ complicated but the coming code snippets will show otherwise.
+
+ Let's do it by:
+ - Creating a parse tree visitor
+ - Using one of the built-in visit subscribers specifically created to render the
+   parse tree in a given output format.
+
+ ### Creating a parse tree visitor
+ Good news: creating a parse tree visitor for the parse tree `ptree` is just
+ a one-liner:
+
+ ```ruby
+ # Let's create a parse tree visitor
+ visitor = Rley::ParseTreeVisitor.new(ptree)
+ ```
+
+ ### Visiting the parse tree
+
+ Unsurprisingly, to start the parse tree visit, one calls the `#start` method:
+
+ ```ruby
+ visitor.start
+ ```
+
+ If you try the above line, no particular result will be visible and for a good reason:
+ no object was specified as a visit event subscriber. As a convenience, __Rley__
+ bundles a number of [formatter classes](https://github.com/famished-tiger/Rley/tree/master/lib/rley/formatter)
+ that were designed to listen to the visit events and then render the parse tree
+ in a specific format. To begin with, we'll use the simple
+ `Rley::Formatter::Debug` formatter class. Its purpose is just to print out the visit event
+ names.
+
+ Remove the line with the call to the `#start` method and replace it with the two
+ statements:
+ ```ruby
+ # Let's create a formatter (i.e. visit event listener)
+ renderer = Rley::Formatter::Debug.new($stdout)
+
+ # Subscribe the formatter to the visitor's event and launch the visit
+ renderer.render(visitor)
+ ```
+
+ These two lines will generate the following output:
+ ```
+ before_ptree
+   before_non_terminal
+     before_subnodes
+       before_non_terminal
+         before_subnodes
+           before_terminal
+           after_terminal
+         after_subnodes
+       after_non_terminal
+       before_non_terminal
+         before_subnodes
+           before_terminal
+           after_terminal
+           before_non_terminal
+             before_subnodes
+               before_terminal
+               after_terminal
+             after_subnodes
+           after_non_terminal
+           before_non_terminal
+             before_subnodes
+               before_terminal
+               after_terminal
+               before_non_terminal
+                 before_subnodes
+                   before_terminal
+                   after_terminal
+                   before_terminal
+                   after_terminal
+                 after_subnodes
+               after_non_terminal
+             after_subnodes
+           after_non_terminal
+         after_subnodes
+       after_non_terminal
+     after_subnodes
+   after_non_terminal
+ after_ptree
+ ```
+
+ At least something is visible: these are the parse tree visit events.
+ Note that the indentation of event names depends on the nesting level of
+ the tree node being visited.
+
+ Not really impressive? So let's use another formatter...
+
+ ### Visualizing the parse tree structure
+ If one replaces the previous formatter by an instance of
+ `Rley::Formatter::Asciitree` the output now shows the parse tree structure.
+
  ```ruby
- pforest = result.parse_forest
+ # Let's create a formatter that will render the parse tree with characters
+ renderer = Rley::Formatter::Asciitree.new($stdout)
+
+ # Subscribe the formatter to the visitor's event and launch the visit
+ renderer.render(visitor)
+ ```
+
+ The output looks like this:
+ ```
+ S
+ +-- NP
+ |   +-- Proper-Noun: 'John'
+ +-- VP
+     +-- Verb: 'saw'
+     +-- NP
+     |   +-- Proper-Noun: 'Mary'
+     +-- PP
+         +-- Preposition: 'with'
+         +-- NP
+             +-- Determiner: 'a'
+             +-- Noun: 'telescope'
  ```
 
+ If you prefer a graphical representation, then replace the last formatter
+ by yet another one:
+
+ ```ruby
+ # Let's create a formatter that will render the parse tree in labelled bracket notation
+ renderer = Rley::Formatter::BracketNotation.new($stdout)
+
+ # Subscribe the formatter to the visitor's event and launch the visit
+ renderer.render(visitor)
+ ```
+
+ This results in the strange-looking output:
+ ```
+ [S [NP [Proper-Noun John]][VP [Verb saw][NP [Proper-Noun Mary]][PP [Preposition with][NP [Determiner a][Noun telescope]]]]]
+ ```
+
+ This output is in a format that is recognized by many NLP tools.
+ The next diagram was created by copy-pasting the output above into the online tool
+ [RSyntaxTree](http://yohasebe.com/rsyntaxtree/).
+ By the way, this tool is also a Ruby gem, [rsyntaxtree](https://rubygems.org/gems/rsyntaxtree).
+
+ ![Sample parse tree diagram](www/sample_parse_tree.png)
+
+
+
  ## Error reporting
  __Rley__ is a non-violent parser, that is, it won't throw an exception when it
  detects a syntax error. Instead, the parse result will be marked as
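The README text added in this hunk describes a visitor that walks the parse tree and dispatches visit events to subscribed listeners. The sketch below illustrates that pattern without depending on Rley. Every name in it (`TerminalNode`, `NonTerminalNode`, `SketchVisitor`, `EventPrinter`, `notify`) is a hypothetical stand-in, not Rley's API, and the two-space indentation step is an assumption. Only the event names are taken from the output listed in the README.

```ruby
# Hypothetical node types for this sketch; these are NOT Rley classes.
TerminalNode = Struct.new(:symbol, :text)
NonTerminalNode = Struct.new(:symbol, :subnodes)

# Walks a tree depth-first and broadcasts visit events to its subscribers.
class SketchVisitor
  def initialize(root)
    @root = root
    @subscribers = []
  end

  def subscribe(listener)
    @subscribers << listener
  end

  def start
    broadcast(:before_ptree, @root)
    visit(@root)
    broadcast(:after_ptree, @root)
  end

  private

  def visit(node)
    if node.is_a?(NonTerminalNode)
      broadcast(:before_non_terminal, node)
      broadcast(:before_subnodes, node)
      node.subnodes.each { |child| visit(child) }
      broadcast(:after_subnodes, node)
      broadcast(:after_non_terminal, node)
    else
      broadcast(:before_terminal, node)
      broadcast(:after_terminal, node)
    end
  end

  def broadcast(event, node)
    @subscribers.each { |listener| listener.notify(event, node) }
  end
end

# Listener that prints each event name, indented by the current nesting depth.
class EventPrinter
  def initialize(io)
    @io = io
    @depth = 0
  end

  def notify(event, _node)
    opening = event.to_s.start_with?('before_')
    @depth -= 1 unless opening
    @io.puts("#{'  ' * @depth}#{event}")
    @depth += 1 if opening
  end
end

tree = NonTerminalNode.new('NP', [TerminalNode.new('Proper-Noun', 'John')])
visitor = SketchVisitor.new(tree)
visitor.subscribe(EventPrinter.new($stdout))
visitor.start
```

Keeping the traversal in the visitor and the rendering in the listener means a different output format only needs a different listener, which is the role the formatter classes play in the README.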
@@ -277,12 +436,15 @@ Here are a few other ones:
  conforms to the grammar rules or not. As such it cannot build parse trees.
  The code doesn't seem to be maintained: latest commit dates from Oct. 2011.
 
+ ## Other interesting Ruby resources
+ The extensive resource list not to miss: [Awesome NLP with Ruby](https://github.com/arbox/nlp-with-ruby)
+ actively curated by Andrei Beliankou (aka arbox).
 
  ## Thanks to:
  * Professor Keshav Pingali, one of the creators of the Grammar Flow Graph parsing approach for his encouraging e-mail exchanges.
 
- ## References on GFG
- Since the __G__rammar __F__low __G__raph parsing approach is quite new, it has not yet taken a place in
+ ## Grammar Flow Graph
+ Since the Grammar Flow Graph parsing approach is quite new, it has not yet taken a place in
  standard parser textbooks. Here are a few references (and links) of papers on GFG:
  - K. Pingali, G. Bilardi. [Parsing with Pictures](http://apps.cs.utexas.edu/tech_reports/reports/tr/TR-2102.pdf)
  - K. Pingali, G. Bilardi. [A Graphical Model for Context-Free Grammar Parsing.](https://link.springer.com/chapter/10.1007/978-3-662-46663-6_1)
@@ -90,7 +90,21 @@ unless result.success?
  end
 
  ########################################
- # Step 6. Generating the parse forest
- pforest = result.parse_forest
+ # Step 6. Generating a parse tree from parse result
+ ptree = result.parse_tree
 
- puts "Parsing ambiguous? #{pforest.ambiguous?}" # => Parsing ambiguous? false
+ # Let's create a parse tree visitor
+ visitor = Rley::ParseTreeVisitor.new(ptree)
+
+ # Let's create a formatter (i.e. visit event listener)
+ # renderer = Rley::Formatter::Debug.new($stdout)
+
+ # Let's create a formatter that will render the parse tree with characters
+ renderer = Rley::Formatter::Asciitree.new($stdout)
+
+ # Let's create a formatter that will render the parse tree in labelled bracket notation
+ # renderer = Rley::Formatter::BracketNotation.new($stdout)
+
+ # Subscribe the formatter to the visitor's event and launch the visit
+ renderer.render(visitor)
+ # End of file
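The commented-out `BracketNotation` line in the hunk above selects labelled bracket notation as the output format. As a rough, library-independent illustration of how that notation maps onto a tree, the sketch below serializes a hand-built tree for the README's sample sentence. The nested-array encoding and the `to_brackets` helper are assumptions made for this example and are not part of Rley.

```ruby
# Hypothetical tree encoding for this sketch (not Rley's data model):
#   non-terminal => [label, [child, child, ...]]
#   terminal     => [label, 'word']
def to_brackets(node)
  label, content = node
  if content.is_a?(Array)
    # Non-terminal: the label followed by its serialized children.
    "[#{label} #{content.map { |child| to_brackets(child) }.join}]"
  else
    # Terminal: the label followed by the matched word.
    "[#{label} #{content}]"
  end
end

tree = ['S', [
  ['NP', [['Proper-Noun', 'John']]],
  ['VP', [
    ['Verb', 'saw'],
    ['NP', [['Proper-Noun', 'Mary']]],
    ['PP', [
      ['Preposition', 'with'],
      ['NP', [['Determiner', 'a'], ['Noun', 'telescope']]]
    ]]
  ]]
]]

puts to_brackets(tree)
```

Each pair of brackets encloses one node, so the nesting depth of the brackets mirrors the depth of the node in the tree.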
@@ -27,6 +27,13 @@ unless result.success?
  exit(1)
  end
 
- # Generate a parse forest from the parse result
- pforest = result.parse_forest
+ # Generate a parse tree from the parse result
+ ptree = result.parse_tree
+
+ # Let's create a parse tree visitor
+ visitor = Rley::ParseTreeVisitor.new(ptree)
+
+ # Now output formatted parse tree
+ renderer = Rley::Formatter::Asciitree.new($stdout)
+ renderer.render(visitor)
  # End of file
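The hunk above renders the parse tree as printable text through the `Asciitree` formatter. To show how such a layout can be derived from a tree, here is a minimal sketch that does not use Rley. The nested-array encoding and the `ascii_lines` helper are hypothetical, and the `+-- ` and `|   ` markers are chosen only to resemble the sample output in the README.

```ruby
# Hypothetical tree encoding for this sketch (not Rley's data model):
#   non-terminal => [label, [child, child, ...]]
#   terminal     => [label, 'word']
def ascii_lines(node, indent = '', last = true, root = true)
  label, content = node
  head = root ? '' : "#{indent}+-- "
  # Children of a non-last node keep a vertical bar so siblings stay connected.
  child_indent = root ? '' : indent + (last ? '    ' : '|   ')
  if content.is_a?(Array)
    lines = ["#{head}#{label}"]
    content.each_with_index do |child, index|
      lines.concat(ascii_lines(child, child_indent, index == content.size - 1, false))
    end
    lines
  else
    ["#{head}#{label}: '#{content}'"]
  end
end

tree = ['S', [
  ['NP', [['Proper-Noun', 'John']]],
  ['VP', [
    ['Verb', 'saw'],
    ['NP', [['Proper-Noun', 'Mary']]]
  ]]
]]

puts ascii_lines(tree)
```

Returning the lines as an array instead of printing inside the recursion keeps the helper easy to test and lets the caller decide where the text goes.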
@@ -3,7 +3,7 @@
 
  module Rley # Module used as a namespace
  # The version number of the gem.
- Version = '0.4.04'.freeze
+ Version = '0.4.05'.freeze
 
  # Brief description of the gem.
  Description = "Ruby implementation of the Earley's parsing algorithm".freeze
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: rley
  version: !ruby/object:Gem::Version
- version: 0.4.04
+ version: 0.4.05
  platform: ruby
  authors:
  - Dimitri Geshef
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2017-05-01 00:00:00.000000000 Z
+ date: 2017-05-06 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: rake