rley 0.4.04 → 0.4.05
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/CHANGELOG.md +7 -0
- data/README.md +172 -10
- data/examples/NLP/mini_en_demo.rb +17 -3
- data/examples/general/calc/calc_demo.rb +9 -2
- data/lib/rley/constants.rb +1 -1
- metadata +2 -2
checksums.yaml
CHANGED
````diff
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 89fbee176719b82e8e1b1676ceb89b837e9ad249
+  data.tar.gz: ff72f2224e98479729b441aa0a38b0da61fd34e4
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: a8eed1075570fac6f89f6ed16164c52026426dd0c497030fd31bbfd861e7873744be3acc14afa046da40f67ddf9d4987301dac76227ac4d0f504e40f46caa862
+  data.tar.gz: 684e37a2a236bc9e34e4bacba4775b61608be5dab58b3e4af414da222cf0c79f3db98631b904969c23b3ed882368ea553c7c25f9099cd09f17d5d5105111ba21
````
data/CHANGELOG.md
CHANGED
````diff
@@ -1,3 +1,10 @@
+### 0.4.05 / 2017-05-06
+* [CHANGE] File `README.md` Added documentation on how to build parse trees and manipulate them.
+* [CHANGE] File `examples\NLP\mini_en_demo.rb` now emits different parse tree representations.
+* [NEW] Directory `www`. Contains a diagram output produced from Rley and fed to online RSyntaxTree tool.
+
+
+
 ### 0.4.04 / 2017-05-01
 * [NEW] `Asciitree` formatter class. Allows parse tree output in simple printable text.
 * [CHANGE] Major enhancements in directory `examples\data_formats\JSON`. The demo command-line tool parses JSON and outputs the parse tree in one of the supported formats.
````
data/README.md
CHANGED
````diff
@@ -1,5 +1,5 @@
 [Rley](https://github.com/famished-tiger/Rley)
-
+====
 [](https://travis-ci.org/famished-tiger/Rley)
 [](https://ci.appveyor.com/project/famished-tiger/rley)
 [](https://coveralls.io/r/famished-tiger/Rley?branch=master)
````
````diff
@@ -10,9 +10,6 @@
 
 A Ruby library for constructing general parsers for _any_ context-free language.
 
-====
-
-
 What is Rley?
 -------------
 __Rley__ uses the [Earley](http://en.wikipedia.org/wiki/Earley_parser)
````
````diff
@@ -28,7 +25,7 @@ encode efficiently all the possible parse trees that result from an ambiguous
 grammar.
 
 As another distinctive mark, __Rley__ is also the first Ruby implementation of a
-parsing library based on the new
+parsing library based on the new [Grammar Flow Graph](#grammar-flow-graph) approach .
 
 ### What it can do?
 Maybe parsing algorithms and internal implementation details are of lesser
````
````diff
@@ -81,7 +78,7 @@ The tour is organized as follows:
 3. [Creating a tokenizer](#creating-a-tokenizer)
 4. [Building the parser](#building-the-parser)
 5. [Parsing some input](#parsing-some-input)
-6. [Generating the parse
+6. [Generating the parse tree](#generating-the-parse-tree)
 
 The complete source code of the example used in this tour can be found in the
 [examples](https://github.com/famished-tiger/Rley/tree/master/examples/NLP/mini_en_demo.rb)
````
````diff
@@ -183,11 +180,173 @@ creating a lexicon and tokenizer from scratch. Here are a few Ruby Part-of-Speec
 puts "Parsing successful? #{result.success?}" # => Parsing successful? true
 ```
 
-
+At this stage, we're done with parsing. What we need next are convenient means
+to exploit the parse result. As it is, the `result` variable in the last code snippet
+above is a data structure ("Earley item sets") that is tightly related to the intricate details
+of the Earley's parsing algorithm. Obviously, it contains all the necessary data to exploit
+the parsing results but it is too low-level and inconvenient from a programming viewpoint.
+Therefore, __Rley__ provides out of the box two convenient data structures for
+representing the parse outcome:
+- Parse tree (optimal when the parse is unambiguous)
+- Parse forest (a more sophisticated data structure that copes with ambiguity)
+
+For our whirlwind tour, we will opt for parse trees.
+
+## Generating the parse tree
+
+```ruby
+ptree = result.parse_tree
+```
+OK. Now that we have the parse tree, what we can do with it?
+One option is to manipulate the parse tree and its node directly. For instance,
+one could write code to customize and transform the parse tree. This approach gives
+most the of flexibility needed for advanced applications. The other, more common
+option is to rely on the parse tree visitor instantiated from the `Rley::ParseTreeVisitor` class.
+Such a visitor walks over the parse tree nodes and generates visit events that
+are dispatched to subscribed event listeners. All this may, at first, sound
+complicate but the coming code snippets will show it otherwise.
+
+Let's do it by:
+- Creating a parse tree visitor
+- Using one of the built-in visit subscribers specifically created to render the
+parse tree in a given output format.
+
+### Creating a parse tree visitor
+Good news: creating a parse tree visitor for the parse tree `ptree` is just
+an one-liner:
+
+```ruby
+# Let's create a parse tree visitor
+visitor = Rley::ParseTreeVisitor.new(ptree)
+```
+
+### Visiting the parse tree
+
+Unsurprisingly, to start the parse tree visit, one calls the `#start` method:
+
+```ruby
+visitor.start
+```
+
+If you try the above line, no particular result will be visible and for a good reason:
+no object was specified as a visit event subscriber. As a convenience, __Rley__
+bundles a number of [formatter classes](https://github.com/famished-tiger/Rley/tree/master/lib/rley/formatter)
+that were designed to listen to the visit event and then render the parse tree
+in a specific format. To begin with, we'll use the simple formatter
+`Rley::Formatter::Debug` class. Its purpose is just to print out the visit event
+name.
+
+Remove the line with the call to the `#start` method and replace it with the two
+statements:
+```ruby
+# Let's create a formatter (i.e. visit event listener)
+renderer = Rley::Formatter::Debug.new($stdout)
+
+# Subscribe the formatter to the visitor's event and launch the visit
+renderer.render(visitor)
+```
+
+These two lines will generate the following output:
+```
+before_ptree
+before_non_terminal
+before_subnodes
+before_non_terminal
+before_subnodes
+before_terminal
+after_terminal
+after_subnodes
+after_non_terminal
+before_non_terminal
+before_subnodes
+before_terminal
+after_terminal
+before_non_terminal
+before_subnodes
+before_terminal
+after_terminal
+after_subnodes
+after_non_terminal
+before_non_terminal
+before_subnodes
+before_terminal
+after_terminal
+before_non_terminal
+before_subnodes
+before_terminal
+after_terminal
+before_terminal
+after_terminal
+after_subnodes
+after_non_terminal
+after_subnodes
+after_non_terminal
+after_subnodes
+after_non_terminal
+after_subnodes
+after_non_terminal
+after_ptree
+```
+
+At least is something visible: these are the parse tree visit events.
+Note that the indentation of event names depends on the nesting level of
+the tree node being visited.
+
+Not really impressive? So let's use another formatter...
+
+### Visualizing the parse tree structure
+If one replaces the previous formatter by an instance of
+`Rley::Formatter::Asciitree` the output now shows the parse tree structure.
+
 ```ruby
-
+# Let's create a formatter that will render the parse tree with characters
+renderer = Rley::Formatter::Asciitree.new($stdout)
+
+# Subscribe the formatter to the visitor's event and launch the visit
+renderer.render(visitor)
+```
+
+The outputs looks like this:
+```
+S
++-- NP
+|   +-- Proper-Noun: 'John'
++-- VP
+    +-- Verb: 'saw'
+    +-- NP
+    |   +-- Proper-Noun: 'Mary'
+    +-- PP
+        +-- Preposition: 'with'
+        +-- NP
+            +-- Determiner: 'a'
+            +-- Noun: 'telescope'
 ```
 
+If you are more inclined for graphical representation, then replace the last formatter
+by yet another one:
+
+```ruby
+# Let's create a formatter that will render the parse tree in labelled bracket notation
+renderer = Rley::Formatter::BracketNotation .new($stdout)
+
+# Subscribe the formatter to the visitor's event and launch the visit
+renderer.render(visitor)
+```
+
+This results in the strange-looking output:
+```
+[S [NP [Proper-Noun John]][VP [Verb saw][NP [Proper-Noun Mary]][PP [Preposition with][NP [Determiner a][Noun telescope]]]]]
+```
+
+This output is in a format that is recognized by many NLP software.
+The next diagram was created by copy-pasting the output above in the online tool
+[RSyntaxTree](http://yohasebe.com/rsyntaxtree/).
+By the way, this tool is also a Ruby gem, [rsyntaxtree](https://rubygems.org/gems/rsyntaxtree).
+
+
+
+
+
 ## Error reporting
 __Rley__ is a non-violent parser, that is, it won't throw an exception when it
 detects a syntax error. Instead, the parse result will be marked as
````
````diff
@@ -277,12 +436,15 @@ Here are a few other ones:
 conforms to the grammar rules or not. As such it cannot build parse trees.
 The code doesn't seem to be maintained: latest commit dates from Oct. 2011.
 
+## Other interesting Ruby resources
+The extensive resource list not to miss: [Awesome NLP with Ruby](https://github.com/arbox/nlp-with-ruby)
+actively curated by Andrei Beliankou (aka arbox).
 
 ## Thanks to:
 * Professor Keshav Pingali, one of the creators of the Grammar Flow Graph parsing approach for his encouraging e-mail exchanges.
 
-##
-Since the
+## Grammar Flow Graph
+Since the Grammar Flow Graph parsing approach is quite new, it has not yet taken a place in
 standard parser textbooks. Here are a few references (and links) of papers on GFG:
 - K. Pingali, G. Bilardi. [Parsing with Pictures](http://apps.cs.utexas.edu/tech_reports/reports/tr/TR-2102.pdf)
 - K. Pingali, G. Bilardi. [A Graphical Model for Context-Free Grammar Parsing.](https://link.springer.com/chapter/10.1007/978-3-662-46663-6_1)
````
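The README additions above describe a parse tree visitor that walks the tree and broadcasts `before_*`/`after_*` events to subscribed formatters. The following plain-Ruby sketch illustrates that observer-style walk under stated assumptions: `NonTerminalNode`, `TerminalNode`, `TreeVisitor` and `EventLogger` are hypothetical stand-ins, not Rley's classes, and only the event names are borrowed from the `Rley::Formatter::Debug` output shown in the README diff. Rley's own visitor may dispatch and indent events differently.

```ruby
# Hypothetical node types (not Rley's classes): a non-terminal holds child
# nodes, a terminal carries the matched word.
NonTerminalNode = Struct.new(:symbol, :subnodes)
TerminalNode = Struct.new(:symbol, :token)

# Minimal visitor sketch: a depth-first walk that broadcasts before_*/after_*
# events (names borrowed from the Debug formatter output) to its subscribers.
class TreeVisitor
  def initialize(root)
    @root = root
    @subscribers = []
  end

  def subscribe(listener)
    @subscribers << listener
  end

  # Launch the visit; nothing observable happens without subscribers.
  def start
    broadcast(:before_ptree, @root)
    traverse(@root)
    broadcast(:after_ptree, @root)
  end

  private

  def traverse(node)
    if node.is_a?(TerminalNode)
      broadcast(:before_terminal, node)
      broadcast(:after_terminal, node)
    else
      broadcast(:before_non_terminal, node)
      broadcast(:before_subnodes, node)
      node.subnodes.each { |child| traverse(child) }
      broadcast(:after_subnodes, node)
      broadcast(:after_non_terminal, node)
    end
  end

  # A listener only receives the events it defines a public method for.
  def broadcast(event, node)
    @subscribers.each do |listener|
      listener.public_send(event, node) if listener.respond_to?(event)
    end
  end
end

# Debug-like listener (hypothetical): records each event name,
# indented two spaces per nesting level.
class EventLogger
  attr_reader :lines

  def initialize
    @lines = []
    @depth = 0
  end

  %i[before_ptree before_non_terminal before_subnodes before_terminal].each do |event|
    define_method(event) do |_node|
      @lines << "#{'  ' * @depth}#{event}"
      @depth += 1
    end
  end

  %i[after_terminal after_subnodes after_non_terminal after_ptree].each do |event|
    define_method(event) do |_node|
      @depth -= 1
      @lines << "#{'  ' * @depth}#{event}"
    end
  end
end

# "John saw Mary", shaped like the README's example tree.
tree = NonTerminalNode.new('S', [
  NonTerminalNode.new('NP', [TerminalNode.new('Proper-Noun', 'John')]),
  NonTerminalNode.new('VP', [
    TerminalNode.new('Verb', 'saw'),
    NonTerminalNode.new('NP', [TerminalNode.new('Proper-Noun', 'Mary')])
  ])
])

visitor = TreeVisitor.new(tree)
logger = EventLogger.new
visitor.subscribe(logger)
visitor.start
puts logger.lines
```

Keeping the walk and the rendering in separate objects is what lets one visitor feed any number of output formats: swapping the subscriber changes the output without touching the traversal.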
data/examples/NLP/mini_en_demo.rb
CHANGED
````diff
@@ -90,7 +90,21 @@ unless result.success?
 end
 
 ########################################
-# Step 6. Generating
-
+# Step 6. Generating a parse tree from parse result
+ptree = result.parse_tree
 
-
+# Let's create a parse tree visitor
+visitor = Rley::ParseTreeVisitor.new(ptree)
+
+# Let's create a formatter (i.e. visit event listener)
+# renderer = Rley::Formatter::Debug.new($stdout)
+
+# Let's create a formatter that will render the parse tree with characters
+renderer = Rley::Formatter::Asciitree.new($stdout)
+
+# Let's create a formatter that will render the parse tree in labelled bracket notation
+# renderer = Rley::Formatter::BracketNotation .new($stdout)
+
+# Subscribe the formatter to the visitor's event and launch the visit
+renderer.render(visitor)
+# End of file
````
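The updated demo keeps `Rley::Formatter::BracketNotation` as a commented-out alternative renderer. To show what labelled bracket notation encodes, here is a minimal recursive sketch in plain Ruby. It is not Rley's formatter (which, per the README, is driven by visitor events), and `NonTerminalNode`, `TerminalNode` and `to_bracket` are hypothetical names; only the output layout is modelled on the sample output in the README diff.

```ruby
# Hypothetical node types (not Rley's classes).
NonTerminalNode = Struct.new(:symbol, :subnodes)
TerminalNode = Struct.new(:symbol, :token)

# Labelled bracket notation, built recursively:
# a terminal becomes "[Category word]"; a non-terminal becomes "[Label "
# followed by its rendered children (no separator) and a closing "]".
def to_bracket(node)
  if node.is_a?(TerminalNode)
    "[#{node.symbol} #{node.token}]"
  else
    "[#{node.symbol} #{node.subnodes.map { |child| to_bracket(child) }.join}]"
  end
end

tree = NonTerminalNode.new('S', [
  NonTerminalNode.new('NP', [TerminalNode.new('Proper-Noun', 'John')]),
  NonTerminalNode.new('VP', [
    TerminalNode.new('Verb', 'saw'),
    NonTerminalNode.new('NP', [TerminalNode.new('Proper-Noun', 'Mary')])
  ])
])

puts to_bracket(tree)
# prints: [S [NP [Proper-Noun John]][VP [Verb saw][NP [Proper-Noun Mary]]]]
```

Because each bracket pair encloses exactly one subtree, the string can be pasted into tools such as RSyntaxTree, as the README describes, to draw the tree.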
data/examples/general/calc/calc_demo.rb
CHANGED
````diff
@@ -27,6 +27,13 @@ unless result.success?
   exit(1)
 end
 
-# Generate a parse
-
+# Generate a parse tree from the parse result
+ptree = result.parse_tree
+
+# Let's create a parse tree visitor
+visitor = Rley::ParseTreeVisitor.new(ptree)
+
+# Now output formatted parse tree
+renderer = Rley::Formatter::Asciitree.new($stdout)
+renderer.render(visitor)
 # End of file
````
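Both demos now print their parse tree through `Rley::Formatter::Asciitree`. The sketch below shows one way to draw such an ASCII tree recursively in plain Ruby. It is an illustration under assumptions, not Rley's implementation: the node structs and the `label`, `draw_children` and `ascii_tree` helpers are hypothetical, and the four-character `+-- ` / `|   ` layout is modelled on the README sample output rather than taken from Rley's source.

```ruby
# Hypothetical node types (not Rley's classes).
NonTerminalNode = Struct.new(:symbol, :subnodes)
TerminalNode = Struct.new(:symbol, :token)

# Node label: the category alone for a non-terminal,
# "Category: 'word'" for a terminal.
def label(node)
  node.is_a?(TerminalNode) ? "#{node.symbol}: '#{node.token}'" : node.symbol
end

# Append one "+-- " line per child. Descendants of a non-last child get a
# "|   " continuation bar; descendants of the last child get plain spaces.
def draw_children(node, prefix, lines)
  return lines if node.is_a?(TerminalNode)

  last_index = node.subnodes.size - 1
  node.subnodes.each_with_index do |child, index|
    lines << "#{prefix}+-- #{label(child)}"
    draw_children(child, prefix + (index == last_index ? '    ' : '|   '), lines)
  end
  lines
end

# Root label on the first line, then the indented subtree.
def ascii_tree(root)
  draw_children(root, '', [label(root)]).join("\n")
end

tree = NonTerminalNode.new('S', [
  NonTerminalNode.new('NP', [TerminalNode.new('Proper-Noun', 'John')]),
  NonTerminalNode.new('VP', [
    TerminalNode.new('Verb', 'saw'),
    NonTerminalNode.new('NP', [TerminalNode.new('Proper-Noun', 'Mary')])
  ])
])

puts ascii_tree(tree)
```

The only state carried down the recursion is the prefix string, which records for each ancestor level whether a vertical bar is still needed.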
data/lib/rley/constants.rb
CHANGED
metadata
CHANGED
````diff
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: rley
 version: !ruby/object:Gem::Version
-  version: 0.4.
+  version: 0.4.05
 platform: ruby
 authors:
 - Dimitri Geshef
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2017-05-
+date: 2017-05-06 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: rake
````