sequitur 0.1.01 → 0.1.02
- checksums.yaml +8 -8
- data/CHANGELOG.md +5 -1
- data/README.md +84 -2
- data/lib/sequitur/constants.rb +1 -1
- data/lib/sequitur/formatter/base_formatter.rb +9 -7
- data/lib/sequitur/grammar_visitor.rb +2 -1
- data/spec/sequitur/digram_spec.rb +0 -1
- data/spec/sequitur/dynamic_grammar_spec.rb +1 -1
- data/spec/sequitur/formatter/base_text_spec.rb +5 -5
- data/spec/sequitur/formatter/debug_spec.rb +4 -4
- data/spec/sequitur/grammar_visitor_spec.rb +1 -1
- data/spec/sequitur/production_ref_spec.rb +2 -2
- metadata +7 -3
checksums.yaml
CHANGED
@@ -1,15 +1,15 @@
|
|
1
1
|
---
|
2
2
|
!binary "U0hBMQ==":
|
3
3
|
metadata.gz: !binary |-
|
4
|
-
|
4
|
+
YTdkNzZiNTc1NjBkM2M0MDlhZDI1M2MyNTFhODJhZGI1MjFlYWI2MQ==
|
5
5
|
data.tar.gz: !binary |-
|
6
|
-
|
6
|
+
MmEzZGRlNTI2M2U3ZmQwYmY3MTA1MmE0MDkzMGQ0ZjBmZDJlYTRmMQ==
|
7
7
|
!binary "U0hBNTEy":
|
8
8
|
metadata.gz: !binary |-
|
9
|
-
|
10
|
-
|
11
|
-
|
9
|
+
MjUzMmU0YTQ4MzQ2NmVmMWU2YWQzMTkwZDNiZjM3MjgyOTFlMmRmZDJmMmJi
|
10
|
+
NmM5YjMxMjA0YzM5OGFiOGRiYjBmYTc2M2YyN2NiNjJiMGRlYjJkMmMxMThk
|
11
|
+
ZGU1MDlhYTBkZDc3YTEwMDAwNmQ0YTZlOTQyZGM5YmFmNTRjNmM=
|
12
12
|
data.tar.gz: !binary |-
|
13
|
-
|
14
|
-
|
15
|
-
|
13
|
+
NGU2NjY2Yzc2ZmQ4NDFlN2E4MGVlYTUwMDg4NjgwYzBiYjk0ZjM5NGY4MTg4
|
14
|
+
NTI5NTExOTQzMWY1YzhiNWM4ZjM1OWQ5YjM1MjViZWVlYWRlMWU5NjcyNDNk
|
15
|
+
MzAwMzZhM2NlZGE1M2MzYTYyOGZmODkyMWE4YjA0NTE3MTk4NjA=
|
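The keys and values in `checksums.yaml` are Base64-encoded strings. As a reading aid, the sketch below decodes the first key and the new `metadata.gz` value from the hunk above with Ruby's core `String#unpack1`. The variable names are illustrative only and not part of the gem.

```ruby
# Values copied verbatim from the new checksums.yaml shown in this diff.
algorithm_key  = 'U0hBMQ=='
metadata_value = 'YTdkNzZiNTc1NjBkM2M0MDlhZDI1M2MyNTFhODJhZGI1MjFlYWI2MQ=='

# The 'm' directive decodes Base64 without requiring any library.
algorithm = algorithm_key.unpack1('m')
digest    = metadata_value.unpack1('m')

puts "#{algorithm}: #{digest} (#{digest.length} hex characters)"
```

The decoded key names the hash algorithm, and the decoded value is the hexadecimal digest of the corresponding archive member.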
data/CHANGELOG.md
CHANGED
@@ -1,6 +1,10 @@
|
|
1
|
+
### 0.1.02 / 2014-09-18
|
2
|
+
* [CHANGE] File `README.md`: expanded introductory text.
|
3
|
+
* [CHANGE] File `sequitur.gemspec`: expanded gem description in the specification.
|
4
|
+
|
1
5
|
### 0.1.01 / 2014-09-17
|
2
6
|
* [NEW] Added new `BaseFormatter` superclass. Sample formatters are inheriting from this one.
|
3
|
-
* [CHANGE] File `README.
|
7
|
+
* [CHANGE] File `README.md`: added a brief intro to the Sequitur algorithm, expanded the Ruby examples
|
4
8
|
* [CHANGE] Private method `BaseText#prod_name` production name doesn't contain an underscore.
|
5
9
|
* [CHANGE] Formatter class `BaseText` now inherits from `BaseFormatter`
|
6
10
|
* [CHANGE] Formatter class `Debug` now inherits from `BaseFormatter`
|
data/README.md
CHANGED
@@ -15,7 +15,7 @@ The following are good entry points to learn about the algorithm:
|
|
15
15
|
|
16
16
|
### The theory in a nutshell ###
|
17
17
|
Given a sequence of input tokens (say, characters), the Sequitur algorithm
|
18
|
-
will represent that input sequence as a set of rules. As the algorithm detects
|
18
|
+
will represent that input sequence as a set of rules. As the algorithm detects
|
19
19
|
automatically repeated token patterns, the resulting rule set can encode repetitions in the input
|
20
20
|
in a very compact way.
|
21
21
|
Of interest is the fact that the algorithm runs in time linear in the length of the input sequence.
|
@@ -46,7 +46,7 @@ P3 : P2 d.
|
|
46
46
|
```
|
47
47
|
|
48
48
|
Translated in plain English:
|
49
|
-
- Rule (start) tells that the input consists of the sequence of
|
49
|
+
- Rule (start) tells that the input consists of the sequence of P1 P2 P3 patterns followed by the letter e.
|
50
50
|
- Rule (P1) represents the sequence 'ab'.
|
51
51
|
- Rule (P2) represents the pattern encoded by P1 (thus 'ab') then 'c'.
|
52
52
|
In other words, it represents the string 'abc'.
|
@@ -78,6 +78,7 @@ The following Ruby snippet show how to apply Sequitur on the input string from t
|
|
78
78
|
The demo illustrates how easy it is to run the algorithm on a string. However, the next question is how
|
79
79
|
can you make good use of the algorithm's result.
|
80
80
|
|
81
|
+
**Printing the resulting rules**
|
81
82
|
The very first natural step is to be able to print out the (grammar) rules.
|
82
83
|
Here's how:
|
83
84
|
|
@@ -106,6 +107,87 @@ Here's how:
|
|
106
107
|
# P3 : P2 d.
|
107
108
|
```
|
108
109
|
|
110
|
+
## Understanding the algorithm's results
|
111
|
+
The Sequitur algorithm generates a -simplified- context-free grammar, therefore we dedicate this section
|
112
|
+
to the terminology of context-free grammars. As the Internet provides tons of information
|
113
|
+
on the subject, we limit ourselves to the minimal terminology of interest when using the sequitur gem.
|
114
|
+
|
115
|
+
First of all, what is a **grammar**? To simplify the matter, one can see a grammar as a set of
|
116
|
+
grammar rules. These rules are called production rules or more briefly **productions**.
|
117
|
+
|
118
|
+
In a context-free grammar, productions have the form:
|
119
|
+
```
|
120
|
+
P : body.
|
121
|
+
```
|
122
|
+
|
123
|
+
Where:
|
124
|
+
- The colon ':' character separates the head (= left-hand side) and the body (right-hand side, *rhs* in short)
|
125
|
+
of the rule.
|
126
|
+
- The left-hand side consists of just one symbol, P. P is categorized as a *nonterminal symbol* and, for our purposes,
|
127
|
+
a nonterminal symbol can be seen as the "name" of the production. By contrast, a terminal symbol is just one element
|
128
|
+
from the input sequence (symbols as defined in formal grammar theory shouldn't be confused with Ruby's `Symbol` class).
|
129
|
+
- The body is a sequence, possibly empty, of *symbols* (terminal or nonterminal).
|
130
|
+
|
131
|
+
Basically, a production rule tells that P is equivalent to the sequence of symbols found in the
|
132
|
+
right-hand side of the production. A nonterminal symbol that appears in the rhs of a production can be
|
133
|
+
seen as a reference to the production with the same name.
|
134
|
+
|
135
|
+
|
136
|
+
## The Sequitur API
|
137
|
+
|
138
|
+
Recall the above example: a single call to the `Sequitur.build_from` factory method
|
139
|
+
suffices to construct a grammar object.
|
140
|
+
|
141
|
+
```ruby
|
142
|
+
require 'sequitur'
|
143
|
+
|
144
|
+
input_sequence = 'ababcabcdabcde'
|
145
|
+
grammar = Sequitur.build_from(input_sequence)
|
146
|
+
```
|
147
|
+
|
148
|
+
The return value `grammar` is a `Sequitur::SequiturGrammar` instance.
|
149
|
+
|
150
|
+
Unsurprisingly, the `Sequitur::SequiturGrammar` class defines an accessor method called 'productions'
|
151
|
+
that returns the productions of the grammar as an array of `Sequitur::Production` objects.
|
152
|
+
|
153
|
+
```ruby
|
154
|
+
# Count the number of productions in the grammar
|
155
|
+
puts grammar.productions.size # => 4
|
156
|
+
|
157
|
+
# Retrieve all productions of the grammar
|
158
|
+
all_prods = grammar.productions
|
159
|
+
|
160
|
+
# Retrieve the start production
|
161
|
+
start_prod = grammar.productions[0]
|
162
|
+
```
|
163
|
+
|
164
|
+
Once we have a grip on a production, it is easy to access its right-hand side through the `Production#rhs` method.
|
165
|
+
It returns an array of symbols.
|
166
|
+
|
167
|
+
```ruby
|
168
|
+
# ...Continuing the same example
|
169
|
+
# Retrieve the right-hand side of the production
|
170
|
+
prod_body = start_prod.rhs # Return an Array object
|
171
|
+
```
|
172
|
+
|
173
|
+
The RHS of a production is a sequence (i.e. Array) of symbols.
|
174
|
+
How are the grammar symbols implemented?
|
175
|
+
- Terminal symbols originate directly from the input sequence. They are inserted "as is" in the
|
176
|
+
RHS. For instance, if the input sequence consists of integer values (i.e. Fixnum instances), then they
|
177
|
+
will be inserted in the RHS of productions.
|
178
|
+
- Nonterminal symbols are implemented as `Sequitur::ProductionRef` objects.
|
179
|
+
|
180
|
+
A `ProductionRef` is a reference to a `Production` object. The latter can be accessed through the `ProductionRef#production` method.
|
181
|
+
|
182
|
+
|
183
|
+
### Installation ###
|
184
|
+
The sequitur gem installation is fairly standard.
|
185
|
+
If your project has a `Gemfile` file, add `sequitur` to it. Otherwise, install the gem like this:
|
186
|
+
|
187
|
+
```bash
|
188
|
+
$ [sudo] gem install sequitur
|
189
|
+
```
|
190
|
+
|
109
191
|
|
110
192
|
|
111
193
|
### TODO: Add more documentation ###
|
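The README text added above says that a right-hand side mixes terminals (inserted "as is") with `ProductionRef` objects that point to other productions. The sketch below illustrates that structure without loading the gem: `Production` and `ProductionRef` here are stand-in `Struct`s that only mimic the `rhs` and `production` accessors named in the README, and the rule set (in particular the body of the start rule) is an assumed reconstruction of the README's example for `'ababcabcdabcde'`.

```ruby
# Stand-ins only: these are NOT the gem's classes, just look-alikes
# exposing the two accessors the README mentions (rhs, production).
Production    = Struct.new(:name, :rhs)
ProductionRef = Struct.new(:production)

# Assumed rule set for the README example:
#   start : P1 P2 P3 P3 e.   P1 : a b.   P2 : P1 c.   P3 : P2 d.
p1 = Production.new('P1', %w[a b])
p2 = Production.new('P2', [ProductionRef.new(p1), 'c'])
p3 = Production.new('P3', [ProductionRef.new(p2), 'd'])
start = Production.new(
  'start',
  [ProductionRef.new(p1), ProductionRef.new(p2),
   ProductionRef.new(p3), ProductionRef.new(p3), 'e']
)

# Recursively replace every reference by the body of the production
# it points to; terminals are emitted unchanged.
def expand(production)
  production.rhs.map do |symbol|
    symbol.is_a?(ProductionRef) ? expand(symbol.production) : symbol.to_s
  end.join
end

puts expand(start)
```

Expanding the start production yields the original input again, which is a quick way to check that a rule set really encodes the sequence it was built from. With the real gem, the same walk would presumably start from `grammar.productions[0]`.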
data/lib/sequitur/constants.rb
CHANGED
data/lib/sequitur/formatter/base_formatter.rb
CHANGED
@@ -17,17 +17,19 @@ module Sequitur
|
|
17
17
|
# Given a grammar or a grammar visitor, perform the visit
|
18
18
|
# and render the visit events in the output stream.
|
19
19
|
def render(aGrmOrVisitor)
|
20
|
-
|
21
|
-
aGrmOrVisitor
|
20
|
+
if aGrmOrVisitor.kind_of?(GrammarVisitor)
|
21
|
+
a_visitor = aGrmOrVisitor
|
22
22
|
else
|
23
|
-
aGrmOrVisitor.visitor
|
23
|
+
a_visitor = aGrmOrVisitor.visitor
|
24
24
|
end
|
25
25
|
|
26
|
-
|
27
|
-
|
28
|
-
|
26
|
+
a_visitor.subscribe(self)
|
27
|
+
a_visitor.start
|
28
|
+
a_visitor.unsubscribe(self)
|
29
29
|
end
|
30
30
|
|
31
31
|
end # class
|
32
32
|
end # module
|
33
|
-
end # module
|
33
|
+
end # module
|
34
|
+
|
35
|
+
# End of file
|
data/lib/sequitur/grammar_visitor.rb
CHANGED
@@ -22,7 +22,7 @@ class GrammarVisitor
|
|
22
22
|
end
|
23
23
|
|
24
24
|
def unsubscribe(aSubscriber)
|
25
|
-
subscribers.delete_if { |entry| entry == aSubscriber}
|
25
|
+
subscribers.delete_if { |entry| entry == aSubscriber }
|
26
26
|
end
|
27
27
|
|
28
28
|
# The signal to start the visit.
|
@@ -66,6 +66,7 @@ class GrammarVisitor
|
|
66
66
|
end
|
67
67
|
|
68
68
|
private
|
69
|
+
|
69
70
|
def broadcast(msg, *args)
|
70
71
|
subscribers.each do |a_subscriber|
|
71
72
|
next unless a_subscriber.respond_to?(msg)
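The `render` method and the visitor's `unsubscribe`/`broadcast` methods shown in the hunks above form a small publish/subscribe loop: the formatter subscribes, the visitor emits events, and the formatter unsubscribes. The sketch below reproduces that loop in isolation. `TinyVisitor` and `TinyFormatter` are hypothetical stand-ins, the rules are passed as a plain Hash, and the `send` dispatch inside `broadcast` is an assumption, since the diff cuts off before that line.

```ruby
require 'stringio'

# Hypothetical stand-in for the gem's GrammarVisitor.
class TinyVisitor
  attr_reader :subscribers

  def initialize(rules)
    @rules = rules # Hash: production name => array of symbols
    @subscribers = []
  end

  def subscribe(a_subscriber)
    subscribers << a_subscriber
  end

  def unsubscribe(a_subscriber)
    subscribers.delete_if { |entry| entry == a_subscriber }
  end

  # Walk the rules and notify subscribers of each visit event.
  def start
    @rules.each do |name, rhs|
      broadcast(:before_production, name)
      broadcast(:before_rhs, rhs)
      broadcast(:after_production, name)
    end
  end

  private

  def broadcast(msg, *args)
    subscribers.each do |a_subscriber|
      next unless a_subscriber.respond_to?(msg)
      a_subscriber.send(msg, *args) # assumed dispatch, not shown in the diff
    end
  end
end

# Hypothetical stand-in for a text formatter.
class TinyFormatter
  attr_reader :output

  def initialize(an_io)
    @output = an_io
  end

  def render(a_visitor)
    a_visitor.subscribe(self)
    a_visitor.start
    a_visitor.unsubscribe(self)
  end

  def before_production(name)
    output.print "#{name} :"
  end

  def before_rhs(rhs)
    output.print " #{rhs.join(' ')}" unless rhs.empty?
  end

  def after_production(_name)
    output.print ".\n"
  end
end

destination = StringIO.new
visitor = TinyVisitor.new('P1' => %w[a b], 'P2' => %w[P1 c])
TinyFormatter.new(destination).render(visitor)
puts destination.string
```

Because `broadcast` skips subscribers that do not respond to a message, a formatter only needs to implement the events it cares about. After `render` returns, the visitor's subscriber list is empty again.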
|
data/spec/sequitur/dynamic_grammar_spec.rb
CHANGED
@@ -117,7 +117,7 @@ describe DynamicGrammar do
|
|
117
117
|
a_visitor.subscribe(fake_formatter)
|
118
118
|
|
119
119
|
expect(fake_formatter).to receive(:before_grammar).with(subject).ordered
|
120
|
-
expect(fake_formatter).to receive(:before_production).with(subject.root)
|
120
|
+
expect(fake_formatter).to receive(:before_production).with(subject.root)
|
121
121
|
expect(fake_formatter).to receive(:before_rhs).with([]).ordered
|
122
122
|
expect(fake_formatter).to receive(:after_rhs).with([]).ordered
|
123
123
|
expect(fake_formatter).to receive(:after_production).with(subject.root)
|
data/spec/sequitur/formatter/base_text_spec.rb
CHANGED
@@ -41,7 +41,7 @@ describe BaseText do
|
|
41
41
|
expect { BaseText.new(StringIO.new('', 'w')) }.not_to raise_error
|
42
42
|
end
|
43
43
|
|
44
|
-
it
|
44
|
+
it 'should know its output destination' do
|
45
45
|
instance = BaseText.new(destination)
|
46
46
|
expect(instance.output).to eq(destination)
|
47
47
|
end
|
@@ -54,7 +54,7 @@ describe BaseText do
|
|
54
54
|
instance = BaseText.new(destination)
|
55
55
|
a_visitor = empty_grammar.visitor
|
56
56
|
instance.render(a_visitor)
|
57
|
-
expectations
|
57
|
+
expectations = <<-SNIPPET
|
58
58
|
start :.
|
59
59
|
SNIPPET
|
60
60
|
expect(destination.string).to eq(expectations)
|
@@ -64,7 +64,7 @@ SNIPPET
|
|
64
64
|
instance = BaseText.new(destination)
|
65
65
|
a_visitor = sample_grammar.visitor # Use visitor explicitly
|
66
66
|
instance.render(a_visitor)
|
67
|
-
expectations
|
67
|
+
expectations = <<-SNIPPET
|
68
68
|
start :.
|
69
69
|
P1 : a.
|
70
70
|
P2 : b.
|
@@ -77,7 +77,7 @@ SNIPPET
|
|
77
77
|
it 'should support visit events without an explicit visitor' do
|
78
78
|
instance = BaseText.new(destination)
|
79
79
|
instance.render(sample_grammar)
|
80
|
-
expectations
|
80
|
+
expectations = <<-SNIPPET
|
81
81
|
start :.
|
82
82
|
P1 : a.
|
83
83
|
P2 : b.
|
@@ -92,4 +92,4 @@ end # describe
|
|
92
92
|
end # module
|
93
93
|
end # module
|
94
94
|
|
95
|
-
# End of file
|
95
|
+
# End of file
|
data/spec/sequitur/formatter/debug_spec.rb
CHANGED
@@ -41,7 +41,7 @@ describe Debug do
|
|
41
41
|
expect { Debug.new(StringIO.new('', 'w')) }.not_to raise_error
|
42
42
|
end
|
43
43
|
|
44
|
-
it
|
44
|
+
it 'should know its output destination' do
|
45
45
|
instance = Debug.new(destination)
|
46
46
|
expect(instance.output).to eq(destination)
|
47
47
|
end
|
@@ -54,7 +54,7 @@ describe Debug do
|
|
54
54
|
instance = Debug.new(destination)
|
55
55
|
a_visitor = empty_grammar.visitor
|
56
56
|
instance.render(a_visitor)
|
57
|
-
expectations
|
57
|
+
expectations = <<-SNIPPET
|
58
58
|
before_grammar
|
59
59
|
before_production
|
60
60
|
before_rhs
|
@@ -69,7 +69,7 @@ SNIPPET
|
|
69
69
|
instance = Debug.new(destination)
|
70
70
|
a_visitor = sample_grammar.visitor
|
71
71
|
instance.render(a_visitor)
|
72
|
-
expectations
|
72
|
+
expectations = <<-SNIPPET
|
73
73
|
before_grammar
|
74
74
|
before_production
|
75
75
|
before_rhs
|
@@ -111,4 +111,4 @@ end # describe
|
|
111
111
|
end # module
|
112
112
|
end # module
|
113
113
|
|
114
|
-
# End of file
|
114
|
+
# End of file
|
data/spec/sequitur/production_ref_spec.rb
CHANGED
@@ -72,8 +72,8 @@ describe ProductionRef do
|
|
72
72
|
|
73
73
|
it 'should complain when binding to something else than production' do
|
74
74
|
subject.bind_to(target)
|
75
|
-
msg =
|
76
|
-
expect {subject.bind_to('WRONG') }.to raise_error(StandardError, msg)
|
75
|
+
msg = 'Illegal production type String'
|
76
|
+
expect { subject.bind_to('WRONG') }.to raise_error(StandardError, msg)
|
77
77
|
end
|
78
78
|
|
79
79
|
it 'should compare to other production (reference)' do
|
metadata
CHANGED
@@ -1,14 +1,14 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: sequitur
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 0.1.
|
4
|
+
version: 0.1.02
|
5
5
|
platform: ruby
|
6
6
|
authors:
|
7
7
|
- Dimitri Geshef
|
8
8
|
autorequire:
|
9
9
|
bindir: bin
|
10
10
|
cert_chain: []
|
11
|
-
date: 2014-09-
|
11
|
+
date: 2014-09-18 00:00:00.000000000 Z
|
12
12
|
dependencies:
|
13
13
|
- !ruby/object:Gem::Dependency
|
14
14
|
name: rake
|
@@ -66,7 +66,11 @@ dependencies:
|
|
66
66
|
- - ! '>='
|
67
67
|
- !ruby/object:Gem::Version
|
68
68
|
version: 2.0.0
|
69
|
-
description: Ruby implementation of the Sequitur algorithm.
|
69
|
+
description: ! "Ruby implementation of the Sequitur algorithm. This algorithm automatically
|
70
|
+
\nfinds repetitions and hierarchical structures in a given sequence of input \ntokens.
|
71
|
+
It encodes the input into a context-free grammar. \nThe Sequitur algorithm can be
|
72
|
+
used to \na) compress a sequence of items,\nb) discover patterns in a sequence,
|
73
|
+
\nc) generate grammar rules that can represent a given input.\n"
|
70
74
|
email: famished.tiger@yahoo.com
|
71
75
|
executables: []
|
72
76
|
extensions: []
|