rley 0.4.04 → 0.4.05

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
-   metadata.gz: 5100723b47c2572c2ac0b5996413eb4cd8e462ba
-   data.tar.gz: 4d37be9ca7128a36f77505560c7214454e413eb5
+   metadata.gz: 89fbee176719b82e8e1b1676ceb89b837e9ad249
+   data.tar.gz: ff72f2224e98479729b441aa0a38b0da61fd34e4
  SHA512:
-   metadata.gz: c197b1b962408e3a3ce4be5d310c227bdc638028d9a40d26b89c544232f59a8ce994228e4ae3673cbe6147dbcd520ee53c36a0620e3c7871edd3aba46762933f
-   data.tar.gz: 21b59e0b9b5c942be83954b35ee15650e1bc2c2aa2f9a6f1ebba645b94e3f822df510026825354512cf7eea879431feed852f34cef9fd176c5d0b89377504a46
+   metadata.gz: a8eed1075570fac6f89f6ed16164c52026426dd0c497030fd31bbfd861e7873744be3acc14afa046da40f67ddf9d4987301dac76227ac4d0f504e40f46caa862
+   data.tar.gz: 684e37a2a236bc9e34e4bacba4775b61608be5dab58b3e4af414da222cf0c79f3db98631b904969c23b3ed882368ea553c7c25f9099cd09f17d5d5105111ba21
data/CHANGELOG.md CHANGED
@@ -1,3 +1,10 @@
+ ### 0.4.05 / 2017-05-06
+ * [CHANGE] File `README.md`. Added documentation on how to build parse trees and manipulate them.
+ * [CHANGE] File `examples\NLP\mini_en_demo.rb` now emits different parse tree representations.
+ * [NEW] Directory `www`. Contains a diagram produced from Rley output that was fed to the online RSyntaxTree tool.
+
+
+
  ### 0.4.04 / 2017-05-01
  * [NEW] `Asciitree` formatter class. Allows parse tree output in simple printable text.
  * [CHANGE] Major enhancements in directory `examples\data_formats\JSON`. The demo command-line tool parses JSON and outputs the parse tree in one of the supported formats.
data/README.md CHANGED
@@ -1,5 +1,5 @@
  [Rley](https://github.com/famished-tiger/Rley)
-
+ ====
  [![Linux Build Status](https://img.shields.io/travis/famished-tiger/Rley/master.svg?label=Linux%20build)](https://travis-ci.org/famished-tiger/Rley)
  [![Build status](https://ci.appveyor.com/api/projects/status/l5adgcbfo128rvo9?svg=true)](https://ci.appveyor.com/project/famished-tiger/rley)
  [![Coverage Status](https://img.shields.io/coveralls/famished-tiger/Rley.svg)](https://coveralls.io/r/famished-tiger/Rley?branch=master)
@@ -10,9 +10,6 @@

  A Ruby library for constructing general parsers for _any_ context-free language.

- ====
-
-
  What is Rley?
  -------------
  __Rley__ uses the [Earley](http://en.wikipedia.org/wiki/Earley_parser)
@@ -28,7 +25,7 @@ encode efficiently all the possible parse trees that result from an ambiguous
  grammar.

  As another distinctive mark, __Rley__ is also the first Ruby implementation of a
- parsing library based on the new _Grammar Flow Graph_ approach [References on GFG](#references-on-gfg).
+ parsing library based on the new [Grammar Flow Graph](#grammar-flow-graph) approach.

  ### What it can do?
  Maybe parsing algorithms and internal implementation details are of lesser
@@ -81,7 +78,7 @@ The tour is organized as follows:
  3. [Creating a tokenizer](#creating-a-tokenizer)
  4. [Building the parser](#building-the-parser)
  5. [Parsing some input](#parsing-some-input)
- 6. [Generating the parse forest](#generating-the-parse-forest)
+ 6. [Generating the parse tree](#generating-the-parse-tree)

  The complete source code of the example used in this tour can be found in the
  [examples](https://github.com/famished-tiger/Rley/tree/master/examples/NLP/mini_en_demo.rb)
@@ -183,11 +180,173 @@ creating a lexicon and tokenizer from scratch. Here are a few Ruby Part-of-Speec
  puts "Parsing successful? #{result.success?}" # => Parsing successful? true
  ```

- ## Generating the parse forest
+ At this stage, we're done with parsing. What we need next are convenient means
+ to exploit the parse result. As it is, the `result` variable in the last code snippet
+ above is a data structure ("Earley item sets") that is tightly related to the intricate details
+ of Earley's parsing algorithm. It contains all the data needed to exploit
+ the parsing results, but it is too low-level and inconvenient from a programming viewpoint.
+ Therefore, __Rley__ provides two convenient data structures out of the box for
+ representing the parse outcome:
+ - Parse tree (optimal when the parse is unambiguous)
+ - Parse forest (a more sophisticated data structure that copes with ambiguity)
+
+ For our whirlwind tour, we will opt for parse trees.
+
+ ## Generating the parse tree
+
+ ```ruby
+ ptree = result.parse_tree
+ ```
+ OK. Now that we have the parse tree, what can we do with it?
+ One option is to manipulate the parse tree and its nodes directly. For instance,
+ one could write code to customize and transform the parse tree. This approach gives
+ most of the flexibility needed for advanced applications. The other, more common
+ option is to rely on a parse tree visitor instantiated from the `Rley::ParseTreeVisitor` class.
+ Such a visitor walks over the parse tree nodes and generates visit events that
+ are dispatched to subscribed event listeners. All this may, at first, sound
+ complicated, but the coming code snippets will show otherwise.
+
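To make the visitor and listener idea concrete, here is a minimal, self-contained sketch of the pattern in plain Ruby. It does not use Rley: `Node`, `SketchVisitor`, `EventTrace` and `WordCollector` are hypothetical stand-ins invented for this illustration, and only the event names are borrowed from the README text. The actual `Rley::ParseTreeVisitor` may well be organized differently.

```ruby
# Hypothetical stand-ins for illustration only; these are NOT Rley classes.
Node = Struct.new(:label, :children) do
  # A node without children plays the role of a terminal (a word).
  def terminal?
    children.empty?
  end
end

# Walks the tree depth-first and broadcasts visit events to its subscribers.
class SketchVisitor
  def initialize(root)
    @root = root
    @subscribers = []
  end

  def subscribe(listener)
    @subscribers << listener
  end

  def start
    broadcast(:before_ptree, @root)
    walk(@root)
    broadcast(:after_ptree, @root)
  end

  private

  def walk(node)
    if node.terminal?
      broadcast(:before_terminal, node)
      broadcast(:after_terminal, node)
    else
      broadcast(:before_non_terminal, node)
      broadcast(:before_subnodes, node)
      node.children.each { |child| walk(child) }
      broadcast(:after_subnodes, node)
      broadcast(:after_non_terminal, node)
    end
  end

  # A listener only receives the events it has a method for.
  def broadcast(event, node)
    @subscribers.each do |listener|
      listener.public_send(event, node) if listener.respond_to?(event)
    end
  end
end

# Listener that records the name of every event it receives.
class EventTrace
  EVENTS = %i[before_ptree after_ptree before_non_terminal after_non_terminal
              before_subnodes after_subnodes before_terminal after_terminal].freeze
  attr_reader :events

  def initialize
    @events = []
  end

  EVENTS.each do |name|
    define_method(name) { |_node| @events << name }
  end
end

# Listener that reacts to a single event and collects the words.
class WordCollector
  attr_reader :words

  def initialize
    @words = []
  end

  def before_terminal(node)
    @words << node.label
  end
end

root = Node.new('S', [
  Node.new('NP', [Node.new('John', [])]),
  Node.new('VP', [Node.new('saw', []), Node.new('NP', [Node.new('Mary', [])])])
])

visitor = SketchVisitor.new(root)
trace = EventTrace.new
collector = WordCollector.new
visitor.subscribe(trace)
visitor.subscribe(collector)
visitor.start

puts collector.words.join(' ') # => John saw Mary
```

The design choice worth noting is that the visitor knows nothing about output formats: each listener decides which events matter to it, which is why several renderers can share one tree walk.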
+ Let's do it by:
+ - Creating a parse tree visitor
+ - Using one of the built-in visit subscribers specifically created to render the
+ parse tree in a given output format.
+
+ ### Creating a parse tree visitor
+ Good news: creating a parse tree visitor for the parse tree `ptree` is just
+ a one-liner:
+
+ ```ruby
+ # Let's create a parse tree visitor
+ visitor = Rley::ParseTreeVisitor.new(ptree)
+ ```
+
+ ### Visiting the parse tree
+
+ Unsurprisingly, to start the parse tree visit, one calls the `#start` method:
+
+ ```ruby
+ visitor.start
+ ```
+
+ If you try the above line, no particular result will be visible and for a good reason:
+ no object was specified as a visit event subscriber. As a convenience, __Rley__
+ bundles a number of [formatter classes](https://github.com/famished-tiger/Rley/tree/master/lib/rley/formatter)
+ that were designed to listen to the visit events and then render the parse tree
+ in a specific format. To begin with, we'll use the simple
+ `Rley::Formatter::Debug` formatter class. Its purpose is just to print out the visit event
+ names.
+
+ Remove the line with the call to the `#start` method and replace it with the two
+ statements:
+ ```ruby
+ # Let's create a formatter (i.e. visit event listener)
+ renderer = Rley::Formatter::Debug.new($stdout)
+
+ # Subscribe the formatter to the visitor's event and launch the visit
+ renderer.render(visitor)
+ ```
+
+ These two lines will generate the following output:
+ ```
+ before_ptree
+   before_non_terminal
+     before_subnodes
+       before_non_terminal
+         before_subnodes
+           before_terminal
+           after_terminal
+         after_subnodes
+       after_non_terminal
+       before_non_terminal
+         before_subnodes
+           before_terminal
+           after_terminal
+           before_non_terminal
+             before_subnodes
+               before_terminal
+               after_terminal
+             after_subnodes
+           after_non_terminal
+           before_non_terminal
+             before_subnodes
+               before_terminal
+               after_terminal
+               before_non_terminal
+                 before_subnodes
+                   before_terminal
+                   after_terminal
+                   before_terminal
+                   after_terminal
+                 after_subnodes
+               after_non_terminal
+             after_subnodes
+           after_non_terminal
+         after_subnodes
+       after_non_terminal
+     after_subnodes
+   after_non_terminal
+ after_ptree
+ ```
+
+ At least something is visible now: these are the parse tree visit events.
+ Note that the indentation of event names depends on the nesting level of
+ the tree node being visited.
+
+ Not really impressive? So let's use another formatter...
+
+ ### Visualizing the parse tree structure
+ If one replaces the previous formatter with an instance of
+ `Rley::Formatter::Asciitree`, the output now shows the parse tree structure.
+
  ```ruby
- pforest = result.parse_forest
+ # Let's create a formatter that will render the parse tree with characters
+ renderer = Rley::Formatter::Asciitree.new($stdout)
+
+ # Subscribe the formatter to the visitor's event and launch the visit
+ renderer.render(visitor)
+ ```
+
+ The output looks like this:
+ ```
+ S
+ +-- NP
+ |   +-- Proper-Noun: 'John'
+ +-- VP
+     +-- Verb: 'saw'
+     +-- NP
+     |   +-- Proper-Noun: 'Mary'
+     +-- PP
+         +-- Preposition: 'with'
+         +-- NP
+             +-- Determiner: 'a'
+             +-- Noun: 'telescope'
  ```
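For readers curious how such a text tree can be produced, the following is a minimal sketch in plain Ruby. It is not the code of `Rley::Formatter::Asciitree`: `TreeNode`, `leaf`, `render_ascii` and `append_children` are hypothetical names made up for this example, and the connector strings (`+-- `, `|   `) are assumptions modelled on the sample output.

```ruby
# Hypothetical helper types for illustration; not part of Rley.
TreeNode = Struct.new(:label, :children)

# Convenience constructor for a node without children.
def leaf(label)
  TreeNode.new(label, [])
end

# Appends one line per descendant of `node`. `prefix` carries the vertical
# bars (or blanks) inherited from the ancestors of the current node.
def append_children(node, prefix, lines)
  node.children.each_with_index do |child, index|
    last = index == node.children.size - 1
    lines << "#{prefix}+-- #{child.label}"
    # Below a last child nothing more is drawn in this column, so pad with
    # blanks; otherwise keep a bar so later siblings stay visually connected.
    append_children(child, prefix + (last ? '    ' : '|   '), lines)
  end
end

# Renders the whole tree as an array of text lines, root first.
def render_ascii(root)
  lines = [root.label]
  append_children(root, '', lines)
  lines
end

sample = TreeNode.new('S', [
  TreeNode.new('NP', [leaf("Proper-Noun: 'John'")]),
  TreeNode.new('VP', [
    leaf("Verb: 'saw'"),
    TreeNode.new('NP', [leaf("Proper-Noun: 'Mary'")]),
    TreeNode.new('PP', [
      leaf("Preposition: 'with'"),
      TreeNode.new('NP', [leaf("Determiner: 'a'"), leaf("Noun: 'telescope'")])
    ])
  ])
])

ascii_lines = render_ascii(sample)
puts ascii_lines
```

The only state passed down the recursion is the prefix string, which is what makes the indentation grow by one step per nesting level.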

+ If you prefer a graphical representation, then replace the last formatter
+ with yet another one:
+
+ ```ruby
+ # Let's create a formatter that will render the parse tree in labelled bracket notation
+ renderer = Rley::Formatter::BracketNotation.new($stdout)
+
+ # Subscribe the formatter to the visitor's event and launch the visit
+ renderer.render(visitor)
+ ```
+
+ This results in the strange-looking output:
+ ```
+ [S [NP [Proper-Noun John]][VP [Verb saw][NP [Proper-Noun Mary]][PP [Preposition with][NP [Determiner a][Noun telescope]]]]]
+ ```
+
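Labelled bracket notation is simply a pre-order serialization of the tree: each node becomes `[label ...]` with its children nested inside. As a rough illustration, the sketch below rebuilds the string shown above from a nested-array encoding of the tree. The encoding and the `to_brackets` helper are assumptions made for this example and are unrelated to the internals of `Rley::Formatter::BracketNotation`.

```ruby
# Hypothetical encoding for illustration: [label, child, child, ...],
# where a String child is a word and an Array child is a subtree.
SAMPLE_TREE = [
  'S',
  ['NP', ['Proper-Noun', 'John']],
  ['VP',
   ['Verb', 'saw'],
   ['NP', ['Proper-Noun', 'Mary']],
   ['PP',
    ['Preposition', 'with'],
    ['NP', ['Determiner', 'a'], ['Noun', 'telescope']]]]
].freeze

# Serializes a node in labelled bracket notation.
def to_brackets(node)
  label, *children = node
  inner = children.map do |child|
    child.is_a?(String) ? child : to_brackets(child)
  end
  # Sibling subtrees are concatenated without separator, as in the sample.
  "[#{label} #{inner.join}]"
end

bracketed = to_brackets(SAMPLE_TREE)
puts bracketed
```

Because the notation is purely textual, the resulting string can be pasted into any tool that accepts this format.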
+ This output is in a format that is recognized by many NLP tools.
+ The next diagram was created by copy-pasting the output above into the online tool
+ [RSyntaxTree](http://yohasebe.com/rsyntaxtree/).
+ By the way, this tool is also a Ruby gem, [rsyntaxtree](https://rubygems.org/gems/rsyntaxtree).
+
+ ![Sample parse tree diagram](www/sample_parse_tree.png)
+
+
+
  ## Error reporting
  __Rley__ is a non-violent parser, that is, it won't throw an exception when it
  detects a syntax error. Instead, the parse result will be marked as
@@ -277,12 +436,15 @@ Here are a few other ones:
  conforms to the grammar rules or not. As such it cannot build parse trees.
  The code doesn't seem to be maintained: latest commit dates from Oct. 2011.

+ ## Other interesting Ruby resources
+ The extensive resource list not to miss: [Awesome NLP with Ruby](https://github.com/arbox/nlp-with-ruby)
+ actively curated by Andrei Beliankou (aka arbox).

  ## Thanks to:
  * Professor Keshav Pingali, one of the creators of the Grammar Flow Graph parsing approach for his encouraging e-mail exchanges.

- ## References on GFG
- Since the __G__rammar __F__low __G__raph parsing approach is quite new, it has not yet taken a place in
+ ## Grammar Flow Graph
+ Since the Grammar Flow Graph parsing approach is quite new, it has not yet taken a place in
  standard parser textbooks. Here are a few references (and links) of papers on GFG:
  - K. Pingali, G. Bilardi. [Parsing with Pictures](http://apps.cs.utexas.edu/tech_reports/reports/tr/TR-2102.pdf)
  - K. Pingali, G. Bilardi. [A Graphical Model for Context-Free Grammar Parsing.](https://link.springer.com/chapter/10.1007/978-3-662-46663-6_1)
@@ -90,7 +90,21 @@ unless result.success?
  end

  ########################################
- # Step 6. Generating the parse forest
- pforest = result.parse_forest
+ # Step 6. Generating a parse tree from parse result
+ ptree = result.parse_tree

- puts "Parsing ambiguous? #{pforest.ambiguous?}" # => Parsing ambiguous? false
+ # Let's create a parse tree visitor
+ visitor = Rley::ParseTreeVisitor.new(ptree)
+
+ # Let's create a formatter (i.e. visit event listener)
+ # renderer = Rley::Formatter::Debug.new($stdout)
+
+ # Let's create a formatter that will render the parse tree with characters
+ renderer = Rley::Formatter::Asciitree.new($stdout)
+
+ # Let's create a formatter that will render the parse tree in labelled bracket notation
+ # renderer = Rley::Formatter::BracketNotation.new($stdout)
+
+ # Subscribe the formatter to the visitor's event and launch the visit
+ renderer.render(visitor)
+ # End of file
@@ -27,6 +27,13 @@ unless result.success?
    exit(1)
  end

- # Generate a parse forest from the parse result
- pforest = result.parse_forest
+ # Generate a parse tree from the parse result
+ ptree = result.parse_tree
+
+ # Let's create a parse tree visitor
+ visitor = Rley::ParseTreeVisitor.new(ptree)
+
+ # Now output formatted parse tree
+ renderer = Rley::Formatter::Asciitree.new($stdout)
+ renderer.render(visitor)
  # End of file
@@ -3,7 +3,7 @@

  module Rley # Module used as a namespace
    # The version number of the gem.
-   Version = '0.4.04'.freeze
+   Version = '0.4.05'.freeze

    # Brief description of the gem.
    Description = "Ruby implementation of the Earley's parsing algorithm".freeze
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: rley
  version: !ruby/object:Gem::Version
-   version: 0.4.04
+   version: 0.4.05
  platform: ruby
  authors:
  - Dimitri Geshef
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2017-05-01 00:00:00.000000000 Z
+ date: 2017-05-06 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
    name: rake