sequitur 0.1.00 → 0.1.01
- checksums.yaml +8 -8
- data/CHANGELOG.md +8 -0
- data/README.md +73 -11
- data/lib/sequitur/constants.rb +1 -1
- data/lib/sequitur/formatter/base_formatter.rb +33 -0
- data/lib/sequitur/formatter/base_text.rb +5 -10
- data/lib/sequitur/formatter/debug.rb +5 -9
- data/spec/sequitur/formatter/base_text_spec.rb +19 -6
- metadata +3 -2
checksums.yaml
CHANGED
@@ -1,15 +1,15 @@
|
|
1
1
|
---
|
2
2
|
!binary "U0hBMQ==":
|
3
3
|
metadata.gz: !binary |-
|
4
|
-
|
4
|
+
NTQ5ODkzODJlYjNmZDBiODNiZTdiZTE4ZDFlNTljYWFhMTg5MzExYw==
|
5
5
|
data.tar.gz: !binary |-
|
6
|
-
|
6
|
+
YzZkNWE1MTdhZTBiMWZmOTI1ZDhhMDJkM2QxYTU3ZDAxZDExZjk5MQ==
|
7
7
|
!binary "U0hBNTEy":
|
8
8
|
metadata.gz: !binary |-
|
9
|
-
|
10
|
-
|
11
|
-
|
9
|
+
YWVmZTE0YWQ4MmEyMTI4YTFjYmU2MTZkYTZhOGUwZTM4YTZmMDQ2OTliMDky
|
10
|
+
NjVmMjdkYmVkMGY3MGEzMDBlM2RjMzNlODMzZDQ3NGY4NzRmNzNkNjNlOTI2
|
11
|
+
ZGFlMTkxMTNmZjA0MTI2MjM0MzhkMjM4MjUwMzg0ZDk5NDU0YTU=
|
12
12
|
data.tar.gz: !binary |-
|
13
|
-
|
14
|
-
|
15
|
-
|
13
|
+
MTNhOGQyOWNlNTQ4Y2RiOTcyYjU3ZDM3MGZmZGEzYjY2NjFkM2RiZmI3ZTZl
|
14
|
+
ZmRiM2JiNzU0MGVkMDUzYTVjM2M4MmM3MWQwYTgyNmQzYjA0MTdhNWI4Mjg2
|
15
|
+
ODAwYTc4M2JjZGQyOTUzOGM4MjA5MzZhYThlMmM5MWNjYjJhZmQ=
|
data/CHANGELOG.md
CHANGED
@@ -1,3 +1,11 @@
|
|
1
|
+
### 0.1.01 / 2014-09-17
|
2
|
+
* [NEW] Added new `BaseFormatter` superclass. The sample formatters now inherit from it.
|
3
|
+
* [CHANGE] File `README.md`: added a brief intro to the Sequitur algorithm, expanded the Ruby examples
|
4
|
+
* [CHANGE] Private method `BaseText#prod_name`: the generated production name no longer contains an underscore.
|
5
|
+
* [CHANGE] Formatter class `BaseText` now inherits from `BaseFormatter`
|
6
|
+
* [CHANGE] Formatter class `Debug` now inherits from `BaseFormatter`
|
7
|
+
|
8
|
+
|
1
9
|
### 0.1.00 / 2014-09-16
|
2
10
|
* [CHANGE] Version number bumped. Added grammar rendering through specialized formatters.
|
3
11
|
|
data/README.md
CHANGED
@@ -9,43 +9,105 @@ _Ruby gem implementing the Sequitur algorithm_
|
|
9
9
|
|
10
10
|
|
11
11
|
### What is the Sequitur algorithm? ###
|
12
|
-
|
12
|
+
The following are good entry points to learn about the algorithm:
|
13
|
+
[Sequitur algorithm home](http://sequitur.info/)
|
13
14
|
[Wikipedia](http://en.wikipedia.org/wiki/Sequitur_algorithm)
|
14
15
|
|
15
|
-
|
16
|
-
|
16
|
+
### The theory in a nutshell ###
|
17
|
+
Given a sequence of input tokens (say, characters), the Sequitur algorithm
|
18
|
+
will represent that input sequence as a set of rules. As the algorithm automatically
|
18
|
+
detects repeated token patterns, the resulting rule set can encode repetitions in the input
|
20
|
+
in a very compact way.
|
21
|
+
Of interest is the fact that the algorithm runs in time linear in the length of the input sequence.
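To make "repeated token patterns" concrete, the sketch below counts adjacent token pairs (digrams) that occur more than once in an input string. It is only an illustration: it is not the Sequitur algorithm and not part of the gem's API, and `repeated_digrams` is a hypothetical helper name. The count is naive, so overlapping pairs are counted as well.

```ruby
# Illustrative sketch only: this is NOT the Sequitur algorithm itself,
# just a naive count of adjacent token pairs (digrams) in a string.
# `repeated_digrams` is a hypothetical helper, not a method of the gem.
def repeated_digrams(input)
  counts = Hash.new(0)
  # Slide a window of two characters over the input and tally each pair.
  input.each_char.each_cons(2) { |pair| counts[pair.join] += 1 }
  # Keep only the pairs seen more than once.
  counts.select { |_digram, count| count > 1 }
end

repeats = repeated_digrams('abcabcabc')
repeats.each { |digram, count| puts "#{digram}: #{count}" }
```

Pairs that repeat are the candidates a grammar-based approach can factor out into a rule of their own.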
|
22
|
+
|
23
|
+
**Can you give a simple example?**
|
24
|
+
Sure. Let's begin with a very basic case. Assume that 'abcabcabc' is our input string.
|
25
|
+
Notice that it is the same as the text 'abc' repeated three times. The Sequitur algorithm captures
|
26
|
+
this repetition and will generate the two following rules:
|
27
|
+
|
28
|
+
```
|
29
|
+
start : P1 P1 P1.
|
30
|
+
P1 : a b c.
|
31
|
+
```
|
32
|
+
|
33
|
+
In plain English:
|
34
|
+
- The first rule (named start) always represents the whole input. Here, it indicates that the input
|
35
|
+
is three times the pattern encoded by the rule called P1.
|
36
|
+
- The second rule (named P1) represents the sequence a b c.
|
37
|
+
|
38
|
+
**Can you give another example?**
|
39
|
+
Yep. Assume this time that the input is 'ababcabcdabcde'.
|
40
|
+
Then the Sequitur algorithm will generate the rule set:
|
41
|
+
```
|
42
|
+
start : P1 P2 P3 P3 e.
|
43
|
+
P1 : a b.
|
44
|
+
P2 : P1 c.
|
45
|
+
P3 : P2 d.
|
46
|
+
```
|
47
|
+
|
48
|
+
Translated into plain English:
|
49
|
+
- Rule (start) states that the input consists of the sequence P1 P2 P3 P3 followed by the letter e.
|
50
|
+
- Rule (P1) represents the sequence 'ab'.
|
51
|
+
- Rule (P2) represents the pattern encoded by P1 (thus 'ab') then 'c'.
|
52
|
+
In other words, it represents the string 'abc'.
|
53
|
+
- Rule (P3) represents the pattern encoded by P2 then d. It is thus equivalent to 'abcd'.
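One way to check such a rule set by hand is to expand it back into the original input. The sketch below does this with a plain Hash standing in for the grammar; this representation and the `expand` helper are assumptions made for illustration, since the gem models grammars with its own classes. It also assumes that no terminal symbol shares a name with a rule.

```ruby
# Hypothetical plain-Hash representation of the rule set shown above.
# The gem itself models grammars with its own classes.
RULES = {
  'start' => %w[P1 P2 P3 P3 e],
  'P1'    => %w[a b],
  'P2'    => %w[P1 c],
  'P3'    => %w[P2 d]
}.freeze

# Recursively replace rule names by their right-hand side until only
# terminal symbols (plain characters) remain.
def expand(symbol, rules)
  return symbol unless rules.key?(symbol) # terminal symbol: keep as is

  rules[symbol].map { |sym| expand(sym, rules) }.join
end

puts expand('start', RULES) # prints "ababcabcdabcde"
```

Expanding `start` yields the input string again, which is what makes the encoding lossless.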
|
54
|
+
|
55
|
+
**What is it used for?**
|
56
|
+
Sequitur can be used:
|
57
|
+
- As a lossless data compression algorithm (especially for structured text containing
|
58
|
+
repeated elements)
|
59
|
+
- To detect hierarchical structure in sequences (e.g. traces in program execution)
|
17
60
|
|
18
61
|
|
19
62
|
## Synopsis
|
20
63
|
|
64
|
+
**Time for a quick demo**
|
65
|
+
|
66
|
+
The following Ruby snippet shows how to apply Sequitur to the input string from the last example above.
|
67
|
+
|
21
68
|
```ruby
|
22
69
|
|
23
70
|
require 'sequitur' # Load the Sequitur library
|
24
71
|
|
25
|
-
input_sequence = '
|
72
|
+
input_sequence = 'ababcabcdabcde' # Let's analyze this string
|
26
73
|
|
27
|
-
#
|
28
|
-
|
74
|
+
# Run the Sequitur algorithm which will result in a grammar (=rule set)
|
75
|
+
grammar = Sequitur.build_from(input_sequence)
|
76
|
+
````
|
77
|
+
|
78
|
+
The demo illustrates how easy it is to run the algorithm on a string. However, the next question is how
|
79
|
+
you can make good use of the algorithm's result.
|
80
|
+
|
81
|
+
The very first natural step is to be able to print out the (grammar) rules.
|
82
|
+
Here's how:
|
83
|
+
|
84
|
+
|
85
|
+
```ruby
|
86
|
+
require 'sequitur'
|
87
|
+
input_sequence = 'ababcabcdabcde'
|
29
88
|
grammar = Sequitur.build_from(input_sequence)
|
30
89
|
|
31
90
|
# To display the grammar rules on the console output
|
32
|
-
# We use a formatter
|
91
|
+
# We use a grammar formatter
|
33
92
|
formatter = Sequitur::Formatter::BaseText.new(STDOUT)
|
34
93
|
|
35
94
|
# Now render the rules. Each rule is displayed with the format:
|
36
95
|
# rule_id : a_sequence_grammar_symbols.
|
37
96
|
# Where:
|
38
|
-
# - rule_id is either 'start' or a name like '
|
97
|
+
# - rule_id is either 'start' or a name like 'Pxxxx' (xxxx is a sequential number)
|
39
98
|
# - a grammar symbol is either a terminal symbol
|
40
99
|
# (i.e. a character from the input) or a rule id
|
41
100
|
formatter.render(grammar.visitor)
|
42
101
|
|
43
102
|
# Rendered output is:
|
44
|
-
# start :
|
45
|
-
#
|
46
|
-
#
|
103
|
+
# start : P1 P2 P3 P3 e.
|
104
|
+
# P1 : a b.
|
105
|
+
# P2 : P1 c.
|
106
|
+
# P3 : P2 d.
|
47
107
|
```
|
48
108
|
|
109
|
+
|
110
|
+
|
49
111
|
### TODO: Add more documentation ###
|
50
112
|
|
51
113
|
|
data/lib/sequitur/constants.rb
CHANGED
data/lib/sequitur/formatter/base_formatter.rb
ADDED
@@ -0,0 +1,33 @@
|
|
1
|
+
module Sequitur
|
2
|
+
module Formatter
|
3
|
+
|
4
|
+
# Superclass for grammar formatters.
|
5
|
+
class BaseFormatter
|
6
|
+
# The IO output stream in which the formatter's result will be sent.
|
7
|
+
attr(:output)
|
8
|
+
|
9
|
+
# Constructor.
|
10
|
+
# [anIO] an output IO where the formatter's result will be placed.
|
11
|
+
def initialize(anIO)
|
12
|
+
@output = anIO
|
13
|
+
end
|
14
|
+
|
15
|
+
public
|
16
|
+
|
17
|
+
# Given a grammar or a grammar visitor, perform the visit
|
18
|
+
# and render the visit events in the output stream.
|
19
|
+
def render(aGrmOrVisitor)
|
20
|
+
aVisitor = if aGrmOrVisitor.kind_of?(GrammarVisitor)
|
21
|
+
aGrmOrVisitor
|
22
|
+
else
|
23
|
+
aGrmOrVisitor.visitor
|
24
|
+
end
|
25
|
+
|
26
|
+
aVisitor.subscribe(self)
|
27
|
+
aVisitor.start()
|
28
|
+
aVisitor.unsubscribe(self)
|
29
|
+
end
|
30
|
+
|
31
|
+
end # class
|
32
|
+
end # module
|
33
|
+
end # module
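The `render` method above follows a subscribe, start, unsubscribe sequence and accepts either a grammar or a visitor. The following self-contained sketch mimics that dispatch without loading the gem. `FakeVisitor`, `FakeGrammar` and `TinyFormatter` are hypothetical stand-ins, and the way `FakeVisitor` notifies its subscribers is an assumption; only the event name `before_grammar` is taken from the formatters in this diff.

```ruby
require 'stringio'

# Hypothetical stand-in for a grammar visitor: when started, it sends
# a single before_grammar event to every subscriber.
class FakeVisitor
  def initialize
    @subscribers = []
  end

  def subscribe(listener)
    @subscribers << listener
  end

  def unsubscribe(listener)
    @subscribers.delete(listener)
  end

  def start
    @subscribers.each { |listener| listener.before_grammar(:some_grammar) }
  end
end

# Hypothetical stand-in for a grammar: it only hands out a visitor.
class FakeGrammar
  def visitor
    FakeVisitor.new
  end
end

# Mirrors the dispatch of BaseFormatter#render, minus the gem's classes.
class TinyFormatter
  attr_reader :output

  def initialize(an_io)
    @output = an_io
  end

  def render(grammar_or_visitor)
    visitor = if grammar_or_visitor.is_a?(FakeVisitor)
                grammar_or_visitor
              else
                grammar_or_visitor.visitor
              end
    visitor.subscribe(self)
    visitor.start
    visitor.unsubscribe(self)
  end

  # Event handler invoked by the visitor during the visit.
  def before_grammar(_grammar)
    output.puts 'before_grammar'
  end
end

io = StringIO.new
formatter = TinyFormatter.new(io)
formatter.render(FakeGrammar.new)         # grammar given: visitor obtained implicitly
formatter.render(FakeGrammar.new.visitor) # explicit visitor
puts io.string
```

Hoisting this logic into a superclass means each concrete formatter only has to implement the event handlers it cares about.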
|
data/lib/sequitur/formatter/base_text.rb
CHANGED
@@ -1,3 +1,5 @@
|
|
1
|
+
require_relative 'base_formatter'
|
2
|
+
|
1
3
|
module Sequitur
|
2
4
|
module Formatter
|
3
5
|
|
@@ -8,23 +10,16 @@ module Sequitur
|
|
8
10
|
# formatter = Sequitur::Formatter::BaseText.new(STDOUT)
|
9
11
|
# Render the grammar (through a visitor)
|
10
12
|
# formatter.run(some_grammar.visitor)
|
11
|
-
class BaseText
|
12
|
-
attr(:output)
|
13
|
+
class BaseText < BaseFormatter
|
13
14
|
|
14
15
|
# Constructor.
|
15
16
|
# [anIO]
|
16
17
|
def initialize(anIO)
|
17
|
-
|
18
|
+
super(anIO)
|
18
19
|
@prod_lookup = {}
|
19
20
|
end
|
20
21
|
|
21
22
|
public
|
22
|
-
|
23
|
-
def render(aVisitor)
|
24
|
-
aVisitor.subscribe(self)
|
25
|
-
aVisitor.start()
|
26
|
-
aVisitor.unsubscribe(self)
|
27
|
-
end
|
28
23
|
|
29
24
|
def before_grammar(aGrammar)
|
30
25
|
aGrammar.productions.each_with_index do |a_prod, index|
|
@@ -63,7 +58,7 @@ module Sequitur
|
|
63
58
|
|
64
59
|
def prod_name(aProduction)
|
65
60
|
prod_index = prod_lookup[aProduction]
|
66
|
-
name = (prod_index == 0) ? 'start' : "
|
61
|
+
name = (prod_index == 0) ? 'start' : "P#{prod_index}"
|
67
62
|
return name
|
68
63
|
end
|
69
64
|
|
data/lib/sequitur/formatter/debug.rb
CHANGED
@@ -1,24 +1,20 @@
|
|
1
|
+
require_relative 'base_formatter'
|
2
|
+
|
3
|
+
|
1
4
|
module Sequitur
|
2
5
|
module Formatter
|
3
|
-
class Debug
|
6
|
+
class Debug < BaseFormatter
|
4
7
|
attr(:indentation)
|
5
|
-
attr(:output)
|
6
8
|
|
7
9
|
# Constructor.
|
8
10
|
# [anIO]
|
9
11
|
def initialize(anIO)
|
12
|
+
super(anIO)
|
10
13
|
@indentation = 0
|
11
|
-
@output = anIO
|
12
14
|
end
|
13
15
|
|
14
16
|
public
|
15
17
|
|
16
|
-
def render(aVisitor)
|
17
|
-
aVisitor.subscribe(self)
|
18
|
-
aVisitor.start()
|
19
|
-
aVisitor.unsubscribe(self)
|
20
|
-
end
|
21
|
-
|
22
18
|
def before_grammar(_)
|
23
19
|
output_event(__method__, indentation)
|
24
20
|
indent
|
data/spec/sequitur/formatter/base_text_spec.rb
CHANGED
@@ -60,16 +60,29 @@ SNIPPET
|
|
60
60
|
expect(destination.string).to eq(expectations)
|
61
61
|
end
|
62
62
|
|
63
|
-
it 'should support events
|
63
|
+
it 'should support visit events with an explicit visitor' do
|
64
64
|
instance = BaseText.new(destination)
|
65
|
-
a_visitor = sample_grammar.visitor
|
65
|
+
a_visitor = sample_grammar.visitor # Use visitor explicitly
|
66
66
|
instance.render(a_visitor)
|
67
67
|
expectations =<<-SNIPPET
|
68
68
|
start :.
|
69
|
-
|
70
|
-
|
71
|
-
|
72
|
-
|
69
|
+
P1 : a.
|
70
|
+
P2 : b.
|
71
|
+
P3 : c.
|
72
|
+
P4 : P2 P3.
|
73
|
+
SNIPPET
|
74
|
+
expect(destination.string).to eq(expectations)
|
75
|
+
end
|
76
|
+
|
77
|
+
it 'should support visit events without an explicit visitor' do
|
78
|
+
instance = BaseText.new(destination)
|
79
|
+
instance.render(sample_grammar)
|
80
|
+
expectations =<<-SNIPPET
|
81
|
+
start :.
|
82
|
+
P1 : a.
|
83
|
+
P2 : b.
|
84
|
+
P3 : c.
|
85
|
+
P4 : P2 P3.
|
73
86
|
SNIPPET
|
74
87
|
expect(destination.string).to eq(expectations)
|
75
88
|
end
|
metadata
CHANGED
@@ -1,14 +1,14 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: sequitur
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 0.1.
|
4
|
+
version: 0.1.01
|
5
5
|
platform: ruby
|
6
6
|
authors:
|
7
7
|
- Dimitri Geshef
|
8
8
|
autorequire:
|
9
9
|
bindir: bin
|
10
10
|
cert_chain: []
|
11
|
-
date: 2014-09-
|
11
|
+
date: 2014-09-17 00:00:00.000000000 Z
|
12
12
|
dependencies:
|
13
13
|
- !ruby/object:Gem::Dependency
|
14
14
|
name: rake
|
@@ -89,6 +89,7 @@ files:
|
|
89
89
|
- lib/sequitur/constants.rb
|
90
90
|
- lib/sequitur/digram.rb
|
91
91
|
- lib/sequitur/dynamic_grammar.rb
|
92
|
+
- lib/sequitur/formatter/base_formatter.rb
|
92
93
|
- lib/sequitur/formatter/base_text.rb
|
93
94
|
- lib/sequitur/formatter/debug.rb
|
94
95
|
- lib/sequitur/grammar_visitor.rb
|