rley 0.4.04 → 0.4.05
- checksums.yaml +4 -4
- data/CHANGELOG.md +7 -0
- data/README.md +172 -10
- data/examples/NLP/mini_en_demo.rb +17 -3
- data/examples/general/calc/calc_demo.rb +9 -2
- data/lib/rley/constants.rb +1 -1
- metadata +2 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA1:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: 89fbee176719b82e8e1b1676ceb89b837e9ad249
|
4
|
+
data.tar.gz: ff72f2224e98479729b441aa0a38b0da61fd34e4
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: a8eed1075570fac6f89f6ed16164c52026426dd0c497030fd31bbfd861e7873744be3acc14afa046da40f67ddf9d4987301dac76227ac4d0f504e40f46caa862
|
7
|
+
data.tar.gz: 684e37a2a236bc9e34e4bacba4775b61608be5dab58b3e4af414da222cf0c79f3db98631b904969c23b3ed882368ea553c7c25f9099cd09f17d5d5105111ba21
|
data/CHANGELOG.md
CHANGED
@@ -1,3 +1,10 @@
|
|
1
|
+
### 0.4.05 / 2017-05-06
|
2
|
+
* [CHANGE] File `README.md` Added documentation on how to build parse trees and manipulate them.
|
3
|
+
* [CHANGE] File `examples\NLP\mini_en_demo.rb` now emits different parse tree representations.
|
4
|
+
* [NEW] Directory `www`. Contains a diagram output produced from Rley and fed to online RSyntaxTree tool.
|
5
|
+
|
6
|
+
|
7
|
+
|
1
8
|
### 0.4.04 / 2017-05-01
|
2
9
|
* [NEW] `Asciitree` formatter class. Allows parse tree output in simple printable text.
|
3
10
|
* [CHANGE] Major enhancements in directory `examples\data_formats\JSON`. The demo command-line tool parses JSON and outputs the parse tree in one of the supported formats.
|
data/README.md
CHANGED
@@ -1,5 +1,5 @@
|
|
1
1
|
[Rley](https://github.com/famished-tiger/Rley)
|
2
|
-
|
2
|
+
====
|
3
3
|
[![Linux Build Status](https://img.shields.io/travis/famished-tiger/Rley/master.svg?label=Linux%20build)](https://travis-ci.org/famished-tiger/Rley)
|
4
4
|
[![Build status](https://ci.appveyor.com/api/projects/status/l5adgcbfo128rvo9?svg=true)](https://ci.appveyor.com/project/famished-tiger/rley)
|
5
5
|
[![Coverage Status](https://img.shields.io/coveralls/famished-tiger/Rley.svg)](https://coveralls.io/r/famished-tiger/Rley?branch=master)
|
@@ -10,9 +10,6 @@
|
|
10
10
|
|
11
11
|
A Ruby library for constructing general parsers for _any_ context-free language.
|
12
12
|
|
13
|
-
====
|
14
|
-
|
15
|
-
|
16
13
|
What is Rley?
|
17
14
|
-------------
|
18
15
|
__Rley__ uses the [Earley](http://en.wikipedia.org/wiki/Earley_parser)
|
@@ -28,7 +25,7 @@ encode efficiently all the possible parse trees that result from an ambiguous
|
|
28
25
|
grammar.
|
29
26
|
|
30
27
|
As another distinctive mark, __Rley__ is also the first Ruby implementation of a
|
31
|
-
parsing library based on the new
|
28
|
+
parsing library based on the new [Grammar Flow Graph](#grammar-flow-graph) approach.
|
32
29
|
|
33
30
|
### What it can do?
|
34
31
|
Maybe parsing algorithms and internal implementation details are of lesser
|
@@ -81,7 +78,7 @@ The tour is organized as follows:
|
|
81
78
|
3. [Creating a tokenizer](#creating-a-tokenizer)
|
82
79
|
4. [Building the parser](#building-the-parser)
|
83
80
|
5. [Parsing some input](#parsing-some-input)
|
84
|
-
6. [Generating the parse
|
81
|
+
6. [Generating the parse tree](#generating-the-parse-tree)
|
85
82
|
|
86
83
|
The complete source code of the example used in this tour can be found in the
|
87
84
|
[examples](https://github.com/famished-tiger/Rley/tree/master/examples/NLP/mini_en_demo.rb)
|
@@ -183,11 +180,173 @@ creating a lexicon and tokenizer from scratch. Here are a few Ruby Part-of-Speec
|
|
183
180
|
puts "Parsing successful? #{result.success?}" # => Parsing successful? true
|
184
181
|
```
|
185
182
|
|
186
|
-
|
183
|
+
At this stage, we're done with parsing. What we need next are convenient means
|
184
|
+
to exploit the parse result. As it is, the `result` variable in the last code snippet
|
185
|
+
above is a data structure ("Earley item sets") that is tightly related to the intricate details
|
186
|
+
of Earley's parsing algorithm. Obviously, it contains all the necessary data to exploit
|
187
|
+
the parsing results but it is too low-level and inconvenient from a programming viewpoint.
|
188
|
+
Therefore, __Rley__ provides out of the box two convenient data structures for
|
189
|
+
representing the parse outcome:
|
190
|
+
- Parse tree (optimal when the parse is unambiguous)
|
191
|
+
- Parse forest (a more sophisticated data structure that copes with ambiguity)
|
192
|
+
|
193
|
+
For our whirlwind tour, we will opt for parse trees.
|
194
|
+
|
195
|
+
## Generating the parse tree
|
196
|
+
|
197
|
+
```ruby
|
198
|
+
ptree = result.parse_tree
|
199
|
+
```
|
200
|
+
OK. Now that we have the parse tree, what can we do with it?
|
201
|
+
One option is to manipulate the parse tree and its nodes directly. For instance,
|
202
|
+
one could write code to customize and transform the parse tree. This approach gives
|
203
|
+
most of the flexibility needed for advanced applications. The other, more common
|
204
|
+
option is to rely on the parse tree visitor instantiated from the `Rley::ParseTreeVisitor` class.
|
205
|
+
Such a visitor walks over the parse tree nodes and generates visit events that
|
206
|
+
are dispatched to subscribed event listeners. All this may, at first, sound
|
207
|
+
complicated, but the coming code snippets will show otherwise.
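
The event-dispatch idea described here can be sketched in plain Ruby, without Rley. The classes `MiniNode`, `MiniVisitor` and `EventLogger` are hypothetical stand-ins invented for this illustration; they are not Rley's implementation, and only the event names mirror the ones that appear in this README.

```ruby
# Hypothetical stand-ins for illustration only; this is NOT Rley's code.
# A node without children is treated as a terminal.
MiniNode = Struct.new(:name, :children) do
  def terminal?
    children.empty?
  end
end

# Walks a tree depth-first and notifies the subscribed listeners.
class MiniVisitor
  def initialize(root)
    @root = root
    @subscribers = []
  end

  def subscribe(listener)
    @subscribers << listener
  end

  def start
    visit(@root)
  end

  private

  def visit(node)
    if node.terminal?
      broadcast(:before_terminal, node)
      broadcast(:after_terminal, node)
    else
      broadcast(:before_non_terminal, node)
      node.children.each { |child| visit(child) }
      broadcast(:after_non_terminal, node)
    end
  end

  # Only listeners that implement the event method are called.
  def broadcast(event, node)
    @subscribers.each do |listener|
      listener.public_send(event, node) if listener.respond_to?(event)
    end
  end
end

# A listener that records the events it chooses to handle.
class EventLogger
  attr_reader :events

  def initialize
    @events = []
  end

  def before_non_terminal(node)
    @events << "before_non_terminal #{node.name}"
  end

  def after_non_terminal(node)
    @events << "after_non_terminal #{node.name}"
  end

  def before_terminal(node)
    @events << "before_terminal #{node.name}"
  end
end

tree = MiniNode.new('NP', [MiniNode.new('Proper-Noun', [])])
visitor = MiniVisitor.new(tree)
logger = EventLogger.new
visitor.subscribe(logger)
visitor.start
puts logger.events
```

Because the sketch checks `respond_to?` before dispatching, a listener only needs to implement the events it cares about; `EventLogger` ignores `after_terminal`, for example.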
|
208
|
+
|
209
|
+
Let's do it by:
|
210
|
+
- Creating a parse tree visitor
|
211
|
+
- Using one of the built-in visit subscribers specifically created to render the
|
212
|
+
parse tree in a given output format.
|
213
|
+
|
214
|
+
### Creating a parse tree visitor
|
215
|
+
Good news: creating a parse tree visitor for the parse tree `ptree` is just
|
216
|
+
a one-liner:
|
217
|
+
|
218
|
+
```ruby
|
219
|
+
# Let's create a parse tree visitor
|
220
|
+
visitor = Rley::ParseTreeVisitor.new(ptree)
|
221
|
+
```
|
222
|
+
|
223
|
+
### Visiting the parse tree
|
224
|
+
|
225
|
+
Unsurprisingly, to start the parse tree visit, one calls the `#start` method:
|
226
|
+
|
227
|
+
```ruby
|
228
|
+
visitor.start
|
229
|
+
```
|
230
|
+
|
231
|
+
If you try the above line, no particular result will be visible and for a good reason:
|
232
|
+
no object was specified as a visit event subscriber. As a convenience, __Rley__
|
233
|
+
bundles a number of [formatter classes](https://github.com/famished-tiger/Rley/tree/master/lib/rley/formatter)
|
234
|
+
that were designed to listen to the visit event and then render the parse tree
|
235
|
+
in a specific format. To begin with, we'll use the simple formatter
|
236
|
+
`Rley::Formatter::Debug` class. Its purpose is just to print out the visit event
|
237
|
+
name.
|
238
|
+
|
239
|
+
Remove the line with the call to the `#start` method and replace it with the two
|
240
|
+
statements:
|
241
|
+
```ruby
|
242
|
+
# Let's create a formatter (i.e. visit event listener)
|
243
|
+
renderer = Rley::Formatter::Debug.new($stdout)
|
244
|
+
|
245
|
+
# Subscribe the formatter to the visitor's event and launch the visit
|
246
|
+
renderer.render(visitor)
|
247
|
+
```
|
248
|
+
|
249
|
+
These two lines will generate the following output:
|
250
|
+
```
|
251
|
+
before_ptree
|
252
|
+
before_non_terminal
|
253
|
+
before_subnodes
|
254
|
+
before_non_terminal
|
255
|
+
before_subnodes
|
256
|
+
before_terminal
|
257
|
+
after_terminal
|
258
|
+
after_subnodes
|
259
|
+
after_non_terminal
|
260
|
+
before_non_terminal
|
261
|
+
before_subnodes
|
262
|
+
before_terminal
|
263
|
+
after_terminal
|
264
|
+
before_non_terminal
|
265
|
+
before_subnodes
|
266
|
+
before_terminal
|
267
|
+
after_terminal
|
268
|
+
after_subnodes
|
269
|
+
after_non_terminal
|
270
|
+
before_non_terminal
|
271
|
+
before_subnodes
|
272
|
+
before_terminal
|
273
|
+
after_terminal
|
274
|
+
before_non_terminal
|
275
|
+
before_subnodes
|
276
|
+
before_terminal
|
277
|
+
after_terminal
|
278
|
+
before_terminal
|
279
|
+
after_terminal
|
280
|
+
after_subnodes
|
281
|
+
after_non_terminal
|
282
|
+
after_subnodes
|
283
|
+
after_non_terminal
|
284
|
+
after_subnodes
|
285
|
+
after_non_terminal
|
286
|
+
after_subnodes
|
287
|
+
after_non_terminal
|
288
|
+
after_ptree
|
289
|
+
```
|
290
|
+
|
291
|
+
At least something is visible now: these are the parse tree visit events.
|
292
|
+
Note that the indentation of event names depends on the nesting level of
|
293
|
+
the tree node being visited.
|
294
|
+
|
295
|
+
Not really impressive? So let's use another formatter...
|
296
|
+
|
297
|
+
### Visualizing the parse tree structure
|
298
|
+
If one replaces the previous formatter by an instance of
|
299
|
+
`Rley::Formatter::Asciitree` the output now shows the parse tree structure.
|
300
|
+
|
187
301
|
```ruby
|
188
|
-
|
302
|
+
# Let's create a formatter that will render the parse tree with characters
|
303
|
+
renderer = Rley::Formatter::Asciitree.new($stdout)
|
304
|
+
|
305
|
+
# Subscribe the formatter to the visitor's event and launch the visit
|
306
|
+
renderer.render(visitor)
|
307
|
+
```
|
308
|
+
|
309
|
+
The output looks like this:
|
310
|
+
```
|
311
|
+
S
|
312
|
+
+-- NP
|
313
|
+
|   +-- Proper-Noun: 'John'
|
314
|
+
+-- VP
|
315
|
+
    +-- Verb: 'saw'
|
316
|
+
    +-- NP
|
317
|
+
    |   +-- Proper-Noun: 'Mary'
|
318
|
+
    +-- PP
|
319
|
+
        +-- Preposition: 'with'
|
320
|
+
        +-- NP
|
321
|
+
            +-- Determiner: 'a'
|
322
|
+
            +-- Noun: 'telescope'
|
189
323
|
```
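
How such an indented layout can be produced is sketched below in plain Ruby. This is an illustration under stated assumptions, not the code of `Rley::Formatter::Asciitree` (which reacts to visit events). The nested-array encoding of the tree and the helper names `node_text`, `child_lines` and `ascii_tree` are invented for this example.

```ruby
# Illustrative sketch only; not Rley's Asciitree formatter.
# Assumed encoding: a non-terminal is [label, child, child, ...],
# a terminal is [label, 'word'].

# Text shown for a single node.
def node_text(node)
  label, first = node
  first.is_a?(String) ? "#{label}: '#{first}'" : label
end

# Lines for all descendants of node, each prefixed with indent.
def child_lines(node, indent)
  children = node.drop(1).reject { |item| item.is_a?(String) }
  children.each_with_index.flat_map do |child, index|
    last = (index == children.size - 1)
    # A vertical bar is kept only while further siblings follow.
    deeper = indent + (last ? '    ' : '|   ')
    ["#{indent}+-- #{node_text(child)}"] + child_lines(child, deeper)
  end
end

def ascii_tree(root)
  [node_text(root)] + child_lines(root, '')
end

sentence = ['S',
            ['NP', ['Proper-Noun', 'John']],
            ['VP',
             ['Verb', 'saw'],
             ['NP', ['Proper-Noun', 'Mary']],
             ['PP',
              ['Preposition', 'with'],
              ['NP', ['Determiner', 'a'], ['Noun', 'telescope']]]]]

puts ascii_tree(sentence)
```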
|
190
324
|
|
325
|
+
If you prefer a graphical representation, then replace the last formatter
|
326
|
+
with yet another one:
|
327
|
+
|
328
|
+
```ruby
|
329
|
+
# Let's create a formatter that will render the parse tree in labelled bracket notation
|
330
|
+
renderer = Rley::Formatter::BracketNotation.new($stdout)
|
331
|
+
|
332
|
+
# Subscribe the formatter to the visitor's event and launch the visit
|
333
|
+
renderer.render(visitor)
|
334
|
+
```
|
335
|
+
|
336
|
+
This results in the strange-looking output:
|
337
|
+
```
|
338
|
+
[S [NP [Proper-Noun John]][VP [Verb saw][NP [Proper-Noun Mary]][PP [Preposition with][NP [Determiner a][Noun telescope]]]]]
|
339
|
+
```
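
Labelled bracket notation is simply a nested tree flattened into one line. The plain-Ruby sketch below shows the idea; it is not the code of `Rley::Formatter::BracketNotation`. The nested-array encoding and the helper name `bracket_notation` are assumptions made for this illustration.

```ruby
# Illustrative sketch only; not Rley's BracketNotation formatter.
# Assumed encoding: a non-terminal is [label, child, child, ...],
# a terminal is [label, 'word'].
def bracket_notation(node)
  label, *rest = node
  # A String is the word of a terminal; anything else is a child node.
  body = rest.map { |item| item.is_a?(String) ? item : bracket_notation(item) }.join
  "[#{label} #{body}]"
end

sentence = ['S',
            ['NP', ['Proper-Noun', 'John']],
            ['VP',
             ['Verb', 'saw'],
             ['NP', ['Proper-Noun', 'Mary']],
             ['PP',
              ['Preposition', 'with'],
              ['NP', ['Determiner', 'a'], ['Noun', 'telescope']]]]]

puts bracket_notation(sentence)
```

Each opening bracket is followed by the node label, so the structure can be rebuilt by any tool that counts brackets.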
|
340
|
+
|
341
|
+
This output is in a format that is recognized by many NLP software tools.
|
342
|
+
The next diagram was created by copy-pasting the output above into the online tool
|
343
|
+
[RSyntaxTree](http://yohasebe.com/rsyntaxtree/).
|
344
|
+
By the way, this tool is also a Ruby gem, [rsyntaxtree](https://rubygems.org/gems/rsyntaxtree).
|
345
|
+
|
346
|
+
![Sample parse tree diagram](www/sample_parse_tree.png)
|
347
|
+
|
348
|
+
|
349
|
+
|
191
350
|
## Error reporting
|
192
351
|
__Rley__ is a non-violent parser, that is, it won't throw an exception when it
|
193
352
|
detects a syntax error. Instead, the parse result will be marked as
|
@@ -277,12 +436,15 @@ Here are a few other ones:
|
|
277
436
|
conforms to the grammar rules or not. As such it cannot build parse trees.
|
278
437
|
The code doesn't seem to be maintained: latest commit dates from Oct. 2011.
|
279
438
|
|
439
|
+
## Other interesting Ruby resources
|
440
|
+
An extensive resource list not to be missed: [Awesome NLP with Ruby](https://github.com/arbox/nlp-with-ruby),
|
441
|
+
actively curated by Andrei Beliankou (aka arbox).
|
280
442
|
|
281
443
|
## Thanks to:
|
282
444
|
* Professor Keshav Pingali, one of the creators of the Grammar Flow Graph parsing approach, for his encouraging e-mail exchanges.
|
283
445
|
|
284
|
-
##
|
285
|
-
Since the
|
446
|
+
## Grammar Flow Graph
|
447
|
+
Since the Grammar Flow Graph parsing approach is quite new, it has not yet taken a place in
|
286
448
|
standard parser textbooks. Here are a few references (and links) of papers on GFG:
|
287
449
|
- K. Pingali, G. Bilardi. [Parsing with Pictures](http://apps.cs.utexas.edu/tech_reports/reports/tr/TR-2102.pdf)
|
288
450
|
- K. Pingali, G. Bilardi. [A Graphical Model for Context-Free Grammar Parsing.](https://link.springer.com/chapter/10.1007/978-3-662-46663-6_1)
|
data/examples/NLP/mini_en_demo.rb
CHANGED
@@ -90,7 +90,21 @@ unless result.success?
|
|
90
90
|
end
|
91
91
|
|
92
92
|
########################################
|
93
|
-
# Step 6. Generating
|
94
|
-
|
93
|
+
# Step 6. Generating a parse tree from parse result
|
94
|
+
ptree = result.parse_tree
|
95
95
|
|
96
|
-
|
96
|
+
# Let's create a parse tree visitor
|
97
|
+
visitor = Rley::ParseTreeVisitor.new(ptree)
|
98
|
+
|
99
|
+
# Let's create a formatter (i.e. visit event listener)
|
100
|
+
# renderer = Rley::Formatter::Debug.new($stdout)
|
101
|
+
|
102
|
+
# Let's create a formatter that will render the parse tree with characters
|
103
|
+
renderer = Rley::Formatter::Asciitree.new($stdout)
|
104
|
+
|
105
|
+
# Let's create a formatter that will render the parse tree in labelled bracket notation
|
106
|
+
# renderer = Rley::Formatter::BracketNotation.new($stdout)
|
107
|
+
|
108
|
+
# Subscribe the formatter to the visitor's event and launch the visit
|
109
|
+
renderer.render(visitor)
|
110
|
+
# End of file
|
data/examples/general/calc/calc_demo.rb
CHANGED
@@ -27,6 +27,13 @@ unless result.success?
|
|
27
27
|
exit(1)
|
28
28
|
end
|
29
29
|
|
30
|
-
# Generate a parse
|
31
|
-
|
30
|
+
# Generate a parse tree from the parse result
|
31
|
+
ptree = result.parse_tree
|
32
|
+
|
33
|
+
# Let's create a parse tree visitor
|
34
|
+
visitor = Rley::ParseTreeVisitor.new(ptree)
|
35
|
+
|
36
|
+
# Now output formatted parse tree
|
37
|
+
renderer = Rley::Formatter::Asciitree.new($stdout)
|
38
|
+
renderer.render(visitor)
|
32
39
|
# End of file
|
data/lib/rley/constants.rb
CHANGED
metadata
CHANGED
@@ -1,14 +1,14 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: rley
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 0.4.
|
4
|
+
version: 0.4.05
|
5
5
|
platform: ruby
|
6
6
|
authors:
|
7
7
|
- Dimitri Geshef
|
8
8
|
autorequire:
|
9
9
|
bindir: bin
|
10
10
|
cert_chain: []
|
11
|
-
date: 2017-05-
|
11
|
+
date: 2017-05-06 00:00:00.000000000 Z
|
12
12
|
dependencies:
|
13
13
|
- !ruby/object:Gem::Dependency
|
14
14
|
name: rake
|