sequitur 0.1.01 → 0.1.02
- checksums.yaml +8 -8
- data/CHANGELOG.md +5 -1
- data/README.md +84 -2
- data/lib/sequitur/constants.rb +1 -1
- data/lib/sequitur/formatter/base_formatter.rb +9 -7
- data/lib/sequitur/grammar_visitor.rb +2 -1
- data/spec/sequitur/digram_spec.rb +0 -1
- data/spec/sequitur/dynamic_grammar_spec.rb +1 -1
- data/spec/sequitur/formatter/base_text_spec.rb +5 -5
- data/spec/sequitur/formatter/debug_spec.rb +4 -4
- data/spec/sequitur/grammar_visitor_spec.rb +1 -1
- data/spec/sequitur/production_ref_spec.rb +2 -2
- metadata +7 -3
checksums.yaml
CHANGED
@@ -1,15 +1,15 @@
|
|
1
1
|
---
|
2
2
|
!binary "U0hBMQ==":
|
3
3
|
metadata.gz: !binary |-
|
4
|
-
|
4
|
+
YTdkNzZiNTc1NjBkM2M0MDlhZDI1M2MyNTFhODJhZGI1MjFlYWI2MQ==
|
5
5
|
data.tar.gz: !binary |-
|
6
|
-
|
6
|
+
MmEzZGRlNTI2M2U3ZmQwYmY3MTA1MmE0MDkzMGQ0ZjBmZDJlYTRmMQ==
|
7
7
|
!binary "U0hBNTEy":
|
8
8
|
metadata.gz: !binary |-
|
9
|
-
|
10
|
-
|
11
|
-
|
9
|
+
MjUzMmU0YTQ4MzQ2NmVmMWU2YWQzMTkwZDNiZjM3MjgyOTFlMmRmZDJmMmJi
|
10
|
+
NmM5YjMxMjA0YzM5OGFiOGRiYjBmYTc2M2YyN2NiNjJiMGRlYjJkMmMxMThk
|
11
|
+
ZGU1MDlhYTBkZDc3YTEwMDAwNmQ0YTZlOTQyZGM5YmFmNTRjNmM=
|
12
12
|
data.tar.gz: !binary |-
|
13
|
-
|
14
|
-
|
15
|
-
|
13
|
+
NGU2NjY2Yzc2ZmQ4NDFlN2E4MGVlYTUwMDg4NjgwYzBiYjk0ZjM5NGY4MTg4
|
14
|
+
NTI5NTExOTQzMWY1YzhiNWM4ZjM1OWQ5YjM1MjViZWVlYWRlMWU5NjcyNDNk
|
15
|
+
MzAwMzZhM2NlZGE1M2MzYTYyOGZmODkyMWE4YjA0NTE3MTk4NjA=
|
data/CHANGELOG.md
CHANGED
@@ -1,6 +1,10 @@
|
|
1
|
+
### 0.1.02 / 2014-09-18
|
2
|
+
* [CHANGE] File `README.md`: expanded introductory text.
|
3
|
+
* [CHANGE] File `sequitur.gemspec` : expanded gem description in the specification.
|
4
|
+
|
1
5
|
### 0.1.01 / 2014-09-17
|
2
6
|
* [NEW] Added new `BaseFormatter` superclass. Sample formatters are inheriting from this one.
|
3
|
-
* [CHANGE] File `README.
|
7
|
+
* [CHANGE] File `README.md`: added a brief intro to the Sequitur algorithm, expanded the Ruby examples
|
4
8
|
* [CHANGE] Private method `BaseText#prod_name` production name doesn't contain an underscore.
|
5
9
|
* [CHANGE] Formatter class `BaseText` now inherits from `BaseFormatter`
|
6
10
|
* [CHANGE] Formatter class `Debug` now inherits from `BaseFormatter`
|
data/README.md
CHANGED
@@ -15,7 +15,7 @@ The following are good entry points to learn about the algorithm:
|
|
15
15
|
|
16
16
|
### The theory in a nutshell ###
|
17
17
|
Given a sequence of input tokens (say, characters), the Sequitur algorithm
|
18
|
-
will represent that input sequence as a set of rules. As the algorithm detects
|
18
|
+
will represent that input sequence as a set of rules. As the algorithm detects
|
19
19
|
automatically repeated token patterns, the resulting rule set can encode repetitions in the input
|
20
20
|
in a very compact way.
|
21
21
|
Of interest is the fact that the algorithm runs in time linear in the length of the input sequence.
|
@@ -46,7 +46,7 @@ P3 : P2 d.
|
|
46
46
|
```
|
47
47
|
|
48
48
|
Translated in plain English:
|
49
|
-
- Rule (start) tells that the input consists of the sequence of
|
49
|
+
- Rule (start) tells that the input consists of the sequence of P1 P2 P3 patterns followed by the letter e.
|
50
50
|
- Rule (P1) represents the sequence 'ab'.
|
51
51
|
- Rule (P2) represents the pattern encoded by P1 (thus 'ab') then 'c'.
|
52
52
|
In other words, it represents the string 'abc'.
|
@@ -78,6 +78,7 @@ The following Ruby snippet show how to apply Sequitur on the input string from t
|
|
78
78
|
The demo illustrates how easy it is to run the algorithm on a string. However, the next question is how
|
79
79
|
you can make good use of the algorithm's result.
|
80
80
|
|
81
|
+
**Printing the resulting rules**
|
81
82
|
The very first natural step is to be able to print out the (grammar) rules.
|
82
83
|
Here's how:
|
83
84
|
|
@@ -106,6 +107,87 @@ Here's how:
|
|
106
107
|
# P3 : P2 d.
|
107
108
|
```
|
108
109
|
|
110
|
+
## Understanding the algorithm's results
|
111
|
+
The Sequitur algorithm generates a -simplified- context-free grammar, therefore we dedicate this section
|
112
|
+
to the terminology about context-free grammars. As the Internet provides tons of information
|
113
|
+
on the subject, we limit ourselves to the minimal terminology of interest when using the sequitur gem.
|
114
|
+
|
115
|
+
First of all, what is a **grammar**? To simplify the matter, one can see a grammar as a set of
|
116
|
+
grammar rules. These rules are called production rules or more briefly **productions**.
|
117
|
+
|
118
|
+
In a context-free grammar, productions have the form:
|
119
|
+
```
|
120
|
+
P : body.
|
121
|
+
```
|
122
|
+
|
123
|
+
Where:
|
124
|
+
- The colon ':' character separates the head (= left-hand side) and the body (right-hand side, *rhs* in short)
|
125
|
+
of the rule.
|
126
|
+
- The left-hand side consists of just one symbol, P. P is categorized as a *nonterminal symbol* and for our purposes
|
127
|
+
a nonterminal symbol can be seen as the "name" of the production. By contrast, a terminal symbol is just one element
|
128
|
+
from the input sequence (symbols as defined in formal grammar theory shouldn't be confused with Ruby's `Symbol` class).
|
129
|
+
- The body is a sequence (possibly empty) of *symbols* (terminal or nonterminal).
|
130
|
+
|
131
|
+
Basically, a production rule tells that P is equivalent to the sequence of symbols found in the
|
132
|
+
right-hand side of the production. A nonterminal symbol that appears in the rhs of a production can be
|
133
|
+
seen as a reference to the production with the same name.
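
To make the terminology concrete, here is a minimal sketch that does not use the gem at all. It models a grammar as a plain Hash (a hypothetical stand-in, not the gem's data structure) in which Ruby Symbols play the role of nonterminals and Strings the role of terminals. The body of the `start` rule is an assumption, chosen so that its expansion reproduces the input string used in this README.

```ruby
# Hypothetical stand-in for a grammar: each production name maps to its body.
# Symbols denote nonterminals, Strings denote terminals.
RULES = {
  start: [:P1, :P2, :P3, :P3, 'e'], # assumed body of the start rule
  P1: %w[a b],
  P2: [:P1, 'c'],
  P3: [:P2, 'd']
}.freeze

# Replace every nonterminal by the expansion of the production it names.
def expand(name, rules = RULES)
  rules.fetch(name).map do |symbol|
    symbol.is_a?(Symbol) ? expand(symbol, rules) : symbol
  end.join
end

puts expand(:P2)    # prints "abc"
puts expand(:start) # prints "ababcabcdabcde"
```

Expanding a nonterminal recursively is exactly what "P is equivalent to the sequence of symbols in its right-hand side" means in practice.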
|
134
|
+
|
135
|
+
|
136
|
+
## The Sequitur API
|
137
|
+
|
138
|
+
Recall the above example: a single call to the `Sequitur.build_from` factory method
|
139
|
+
suffices to construct a grammar object.
|
140
|
+
|
141
|
+
```ruby
|
142
|
+
require 'sequitur'
|
143
|
+
|
144
|
+
input_sequence = 'ababcabcdabcde'
|
145
|
+
grammar = Sequitur.build_from(input_sequence)
|
146
|
+
```
|
147
|
+
|
148
|
+
The return value `grammar` is a `Sequitur::SequiturGrammar` instance.
|
149
|
+
|
150
|
+
Unsurprisingly, the `Sequitur::SequiturGrammar` class defines an accessor method called `productions`
|
151
|
+
that returns the productions of the grammar as an array of `Sequitur::Production` objects.
|
152
|
+
|
153
|
+
```ruby
|
154
|
+
# Count the number of productions in the grammar
|
155
|
+
puts grammar.productions.size # => 4
|
156
|
+
|
157
|
+
# Retrieve all productions of the grammar
|
158
|
+
all_prods = grammar.productions
|
159
|
+
|
160
|
+
# Retrieve the start production
|
161
|
+
start_prod = grammar.productions[0]
|
162
|
+
```
|
163
|
+
|
164
|
+
Once we have a grip on a production, it is easy to access its right-hand side through the `Production#rhs` method.
|
165
|
+
It returns an array of symbols.
|
166
|
+
|
167
|
+
```ruby
|
168
|
+
# ...Continuing the same example
|
169
|
+
# Retrieve the right-hand side of the production
|
170
|
+
prod_body = start_prod.rhs # Return an Array object
|
171
|
+
```
|
172
|
+
|
173
|
+
The RHS of a production is a sequence (i.e. Array) of symbols.
|
174
|
+
How are the grammar symbols implemented?
|
175
|
+
- Terminal symbols originate directly from the input sequence. They are inserted "as is" in the
|
176
|
+
RHS. For instance, if the input sequence consists of integer values (i.e. Fixnum instances), then they
|
177
|
+
will be inserted in the RHS of productions.
|
178
|
+
- Non-terminal symbols are implemented as `Sequitur::ProductionRef` objects.
|
179
|
+
|
180
|
+
A `ProductionRef` is a reference to a `Production` object. The latter can be accessed through the `ProductionRef#production` method.
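
The distinction between the two kinds of symbols can be sketched as follows. The `Production` and `ProductionRef` structs below are hypothetical stand-ins that mimic only the `rhs` and `production` accessors named above (the `name` member is an addition for illustration); they are not the gem's classes.

```ruby
# Hypothetical stand-ins, NOT the gem's classes.
Production = Struct.new(:name, :rhs)
ProductionRef = Struct.new(:production)

p1 = Production.new('P1', %w[a b])
p2 = Production.new('P2', [ProductionRef.new(p1), 'c'])

# Split a right-hand side into terminals and referenced production names.
def classify(rhs)
  refs, terminals = rhs.partition { |symbol| symbol.is_a?(ProductionRef) }
  { terminals: terminals, references: refs.map { |ref| ref.production.name } }
end

result = classify(p2.rhs)
p result[:terminals]  # prints ["c"]
p result[:references] # prints ["P1"]
```

With the real gem, the same type check on the elements of `Production#rhs` should tell terminals and nonterminals apart, since terminals are stored "as is".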
|
181
|
+
|
182
|
+
|
183
|
+
### Installation ###
|
184
|
+
The sequitur gem installation is fairly standard.
|
185
|
+
If your project has a `Gemfile` file, add `sequitur` to it. Otherwise, install the gem like this:
|
186
|
+
|
187
|
+
```bash
|
188
|
+
$ [sudo] gem install sequitur
|
189
|
+
```
|
190
|
+
|
109
191
|
|
110
192
|
|
111
193
|
### TODO: Add more documentation ###
|
data/lib/sequitur/constants.rb
CHANGED
data/lib/sequitur/formatter/base_formatter.rb
CHANGED
@@ -17,17 +17,19 @@ module Sequitur
|
|
17
17
|
# Given a grammar or a grammar visitor, perform the visit
|
18
18
|
# and render the visit events in the output stream.
|
19
19
|
def render(aGrmOrVisitor)
|
20
|
-
|
21
|
-
aGrmOrVisitor
|
20
|
+
if aGrmOrVisitor.kind_of?(GrammarVisitor)
|
21
|
+
a_visitor = aGrmOrVisitor
|
22
22
|
else
|
23
|
-
aGrmOrVisitor.visitor
|
23
|
+
a_visitor = aGrmOrVisitor.visitor
|
24
24
|
end
|
25
25
|
|
26
|
-
|
27
|
-
|
28
|
-
|
26
|
+
a_visitor.subscribe(self)
|
27
|
+
a_visitor.start
|
28
|
+
a_visitor.unsubscribe(self)
|
29
29
|
end
|
30
30
|
|
31
31
|
end # class
|
32
32
|
end # module
|
33
|
-
end # module
|
33
|
+
end # module
|
34
|
+
|
35
|
+
# End of file
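
The `render` method in this hunk subscribes the formatter to a visitor, starts the visit, then unsubscribes. The sketch below illustrates that publish/subscribe round trip in isolation. `SketchVisitor`, `SketchFormatter` and the `on_rule` / `after_rule` event names are hypothetical and heavily simplified; the gem's own visitor emits different events and passes different arguments.

```ruby
require 'stringio'

# Hypothetical visitor: walks a Hash of rules and broadcasts visit events.
class SketchVisitor
  def initialize(rules)
    @rules = rules
    @subscribers = []
  end

  def subscribe(a_subscriber)
    @subscribers << a_subscriber
  end

  def unsubscribe(a_subscriber)
    @subscribers.delete_if { |entry| entry == a_subscriber }
  end

  # The signal to start the visit.
  def start
    @rules.each do |name, body|
      broadcast(:on_rule, name, body) # hypothetical event name
      broadcast(:after_rule, name)    # no subscriber handles this one
    end
  end

  private

  # Send the event only to subscribers that implement a handler for it.
  def broadcast(msg, *args)
    @subscribers.each do |a_subscriber|
      next unless a_subscriber.respond_to?(msg)

      a_subscriber.send(msg, *args)
    end
  end
end

# Hypothetical formatter: stays subscribed for the duration of one visit only.
class SketchFormatter
  attr_reader :output

  def initialize(an_io)
    @output = an_io
  end

  def render(a_visitor)
    a_visitor.subscribe(self)
    a_visitor.start
    a_visitor.unsubscribe(self)
  end

  def on_rule(name, body)
    output.puts "#{name} : #{body.join(' ')}."
  end
end

destination = StringIO.new
visitor = SketchVisitor.new('P1' => %w[a b], 'P2' => %w[P1 c])
SketchFormatter.new(destination).render(visitor)
puts destination.string
```

Because the visitor checks `respond_to?` before dispatching, a formatter only needs to implement handlers for the events it cares about.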
|
data/lib/sequitur/grammar_visitor.rb
CHANGED
@@ -22,7 +22,7 @@ class GrammarVisitor
|
|
22
22
|
end
|
23
23
|
|
24
24
|
def unsubscribe(aSubscriber)
|
25
|
-
subscribers.delete_if { |entry| entry == aSubscriber}
|
25
|
+
subscribers.delete_if { |entry| entry == aSubscriber }
|
26
26
|
end
|
27
27
|
|
28
28
|
# The signal to start the visit.
|
@@ -66,6 +66,7 @@ class GrammarVisitor
|
|
66
66
|
end
|
67
67
|
|
68
68
|
private
|
69
|
+
|
69
70
|
def broadcast(msg, *args)
|
70
71
|
subscribers.each do |a_subscriber|
|
71
72
|
next unless a_subscriber.respond_to?(msg)
|
data/spec/sequitur/dynamic_grammar_spec.rb
CHANGED
@@ -117,7 +117,7 @@ describe DynamicGrammar do
|
|
117
117
|
a_visitor.subscribe(fake_formatter)
|
118
118
|
|
119
119
|
expect(fake_formatter).to receive(:before_grammar).with(subject).ordered
|
120
|
-
expect(fake_formatter).to receive(:before_production).with(subject.root)
|
120
|
+
expect(fake_formatter).to receive(:before_production).with(subject.root)
|
121
121
|
expect(fake_formatter).to receive(:before_rhs).with([]).ordered
|
122
122
|
expect(fake_formatter).to receive(:after_rhs).with([]).ordered
|
123
123
|
expect(fake_formatter).to receive(:after_production).with(subject.root)
|
data/spec/sequitur/formatter/base_text_spec.rb
CHANGED
@@ -41,7 +41,7 @@ describe BaseText do
|
|
41
41
|
expect { BaseText.new(StringIO.new('', 'w')) }.not_to raise_error
|
42
42
|
end
|
43
43
|
|
44
|
-
it
|
44
|
+
it 'should know its output destination' do
|
45
45
|
instance = BaseText.new(destination)
|
46
46
|
expect(instance.output).to eq(destination)
|
47
47
|
end
|
@@ -54,7 +54,7 @@ describe BaseText do
|
|
54
54
|
instance = BaseText.new(destination)
|
55
55
|
a_visitor = empty_grammar.visitor
|
56
56
|
instance.render(a_visitor)
|
57
|
-
expectations
|
57
|
+
expectations = <<-SNIPPET
|
58
58
|
start :.
|
59
59
|
SNIPPET
|
60
60
|
expect(destination.string).to eq(expectations)
|
@@ -64,7 +64,7 @@ SNIPPET
|
|
64
64
|
instance = BaseText.new(destination)
|
65
65
|
a_visitor = sample_grammar.visitor # Use visitor explicitly
|
66
66
|
instance.render(a_visitor)
|
67
|
-
expectations
|
67
|
+
expectations = <<-SNIPPET
|
68
68
|
start :.
|
69
69
|
P1 : a.
|
70
70
|
P2 : b.
|
@@ -77,7 +77,7 @@ SNIPPET
|
|
77
77
|
it 'should support visit events without an explicit visitor' do
|
78
78
|
instance = BaseText.new(destination)
|
79
79
|
instance.render(sample_grammar)
|
80
|
-
expectations
|
80
|
+
expectations = <<-SNIPPET
|
81
81
|
start :.
|
82
82
|
P1 : a.
|
83
83
|
P2 : b.
|
@@ -92,4 +92,4 @@ end # describe
|
|
92
92
|
end # module
|
93
93
|
end # module
|
94
94
|
|
95
|
-
# End of file
|
95
|
+
# End of file
|
data/spec/sequitur/formatter/debug_spec.rb
CHANGED
@@ -41,7 +41,7 @@ describe Debug do
|
|
41
41
|
expect { Debug.new(StringIO.new('', 'w')) }.not_to raise_error
|
42
42
|
end
|
43
43
|
|
44
|
-
it
|
44
|
+
it 'should know its output destination' do
|
45
45
|
instance = Debug.new(destination)
|
46
46
|
expect(instance.output).to eq(destination)
|
47
47
|
end
|
@@ -54,7 +54,7 @@ describe Debug do
|
|
54
54
|
instance = Debug.new(destination)
|
55
55
|
a_visitor = empty_grammar.visitor
|
56
56
|
instance.render(a_visitor)
|
57
|
-
expectations
|
57
|
+
expectations = <<-SNIPPET
|
58
58
|
before_grammar
|
59
59
|
before_production
|
60
60
|
before_rhs
|
@@ -69,7 +69,7 @@ SNIPPET
|
|
69
69
|
instance = Debug.new(destination)
|
70
70
|
a_visitor = sample_grammar.visitor
|
71
71
|
instance.render(a_visitor)
|
72
|
-
expectations
|
72
|
+
expectations = <<-SNIPPET
|
73
73
|
before_grammar
|
74
74
|
before_production
|
75
75
|
before_rhs
|
@@ -111,4 +111,4 @@ end # describe
|
|
111
111
|
end # module
|
112
112
|
end # module
|
113
113
|
|
114
|
-
# End of file
|
114
|
+
# End of file
|
data/spec/sequitur/production_ref_spec.rb
CHANGED
@@ -72,8 +72,8 @@ describe ProductionRef do
|
|
72
72
|
|
73
73
|
it 'should complain when binding to something else than production' do
|
74
74
|
subject.bind_to(target)
|
75
|
-
msg =
|
76
|
-
expect {subject.bind_to('WRONG') }.to raise_error(StandardError, msg)
|
75
|
+
msg = 'Illegal production type String'
|
76
|
+
expect { subject.bind_to('WRONG') }.to raise_error(StandardError, msg)
|
77
77
|
end
|
78
78
|
|
79
79
|
it 'should compare to other production (reference)' do
|
metadata
CHANGED
@@ -1,14 +1,14 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: sequitur
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 0.1.
|
4
|
+
version: 0.1.02
|
5
5
|
platform: ruby
|
6
6
|
authors:
|
7
7
|
- Dimitri Geshef
|
8
8
|
autorequire:
|
9
9
|
bindir: bin
|
10
10
|
cert_chain: []
|
11
|
-
date: 2014-09-
|
11
|
+
date: 2014-09-18 00:00:00.000000000 Z
|
12
12
|
dependencies:
|
13
13
|
- !ruby/object:Gem::Dependency
|
14
14
|
name: rake
|
@@ -66,7 +66,11 @@ dependencies:
|
|
66
66
|
- - ! '>='
|
67
67
|
- !ruby/object:Gem::Version
|
68
68
|
version: 2.0.0
|
69
|
-
description: Ruby implementation of the Sequitur algorithm.
|
69
|
+
description: ! "Ruby implementation of the Sequitur algorithm. This algorithm automatically
|
70
|
+
\nfinds repetitions and hierarchical structures in a given sequence of input \ntokens.
|
71
|
+
It encodes the input into a context-free grammar. \nThe Sequitur algorithm can be
|
72
|
+
used to \na) compress a sequence of items,\nb) discover patterns in a sequence,
|
73
|
+
\nc) generate grammar rules that can represent a given input.\n"
|
70
74
|
email: famished.tiger@yahoo.com
|
71
75
|
executables: []
|
72
76
|
extensions: []
|