sequitur 0.1.00 → 0.1.01
- checksums.yaml +8 -8
- data/CHANGELOG.md +8 -0
- data/README.md +73 -11
- data/lib/sequitur/constants.rb +1 -1
- data/lib/sequitur/formatter/base_formatter.rb +33 -0
- data/lib/sequitur/formatter/base_text.rb +5 -10
- data/lib/sequitur/formatter/debug.rb +5 -9
- data/spec/sequitur/formatter/base_text_spec.rb +19 -6
- metadata +3 -2
checksums.yaml
CHANGED
@@ -1,15 +1,15 @@
 ---
 !binary "U0hBMQ==":
   metadata.gz: !binary |-
-
+    NTQ5ODkzODJlYjNmZDBiODNiZTdiZTE4ZDFlNTljYWFhMTg5MzExYw==
   data.tar.gz: !binary |-
-
+    YzZkNWE1MTdhZTBiMWZmOTI1ZDhhMDJkM2QxYTU3ZDAxZDExZjk5MQ==
 !binary "U0hBNTEy":
   metadata.gz: !binary |-
-
-
-
+    YWVmZTE0YWQ4MmEyMTI4YTFjYmU2MTZkYTZhOGUwZTM4YTZmMDQ2OTliMDky
+    NjVmMjdkYmVkMGY3MGEzMDBlM2RjMzNlODMzZDQ3NGY4NzRmNzNkNjNlOTI2
+    ZGFlMTkxMTNmZjA0MTI2MjM0MzhkMjM4MjUwMzg0ZDk5NDU0YTU=
   data.tar.gz: !binary |-
-
-
-
+    MTNhOGQyOWNlNTQ4Y2RiOTcyYjU3ZDM3MGZmZGEzYjY2NjFkM2RiZmI3ZTZl
+    ZmRiM2JiNzU0MGVkMDUzYTVjM2M4MmM3MWQwYTgyNmQzYjA0MTdhNWI4Mjg2
+    ODAwYTc4M2JjZGQyOTUzOGM4MjA5MzZhYThlMmM5MWNjYjJhZmQ=
data/CHANGELOG.md
CHANGED
@@ -1,3 +1,11 @@
+### 0.1.01 / 2014-09-17
+* [NEW] Added new `BaseFormatter` superclass. The sample formatters now inherit from it.
+* [CHANGE] File `README.md`: added a brief intro to the Sequitur algorithm, expanded the Ruby examples
+* [CHANGE] Private method `BaseText#prod_name`: production names no longer contain an underscore.
+* [CHANGE] Formatter class `BaseText` now inherits from `BaseFormatter`
+* [CHANGE] Formatter class `Debug` now inherits from `BaseFormatter`
+
+
 ### 0.1.00 / 2014-09-16
 * [CHANGE] Version number bumped. Added grammar rendering through specialized formatters.
 
data/README.md
CHANGED
@@ -9,43 +9,105 @@ _Ruby gem implementing the Sequitur algorithm_
 
 
 ### What is the Sequitur algorithm? ###
-
+The following are good entry points to learn about the algorithm:
+[Sequitur algorithm home](http://sequitur.info/)
 [Wikipedia](http://en.wikipedia.org/wiki/Sequitur_algorithm)
 
-
-
+### The theory in a nutshell ###
+Given a sequence of input tokens (say, characters), the Sequitur algorithm
+represents that input sequence as a set of rules. Because the algorithm automatically
+detects repeated token patterns, the resulting rule set can encode repetitions in the input
+in a very compact way.
+Notably, the algorithm runs in time linear in the length of the input sequence.
+
+**Can you give a simple example?**
+Sure. Let's begin with a very basic case. Assume that 'abcabcabc' is our input string.
+Notice that it is the same as the text 'abc' repeated three times. The Sequitur algorithm captures
+this repetition and will generate the two following rules:
+
+```
+start : P1 P1 P1.
+P1 : a b c.
+```
+
+In plain English:
+- The first rule (named start) always represents the whole input. Here, it indicates that the input
+is three times the pattern encoded by the rule called P1.
+- The second rule (named P1) represents the sequence a b c.
+
+**Can you give another example?**
+Yep. Assume this time that the input is 'ababcabcdabcde'.
+Then the Sequitur algorithm will generate the rule set:
+```
+start : P1 P2 P3 P3 e.
+P1 : a b.
+P2 : P1 c.
+P3 : P2 d.
+```
+
+Translated into plain English:
+- Rule (start) indicates that the input consists of the sequence P1 P2 P3 P3 followed by the letter e.
+- Rule (P1) represents the sequence 'ab'.
+- Rule (P2) represents the pattern encoded by P1 (thus 'ab') then 'c'.
+In other words, it represents the string 'abc'.
+- Rule (P3) represents the pattern encoded by P2 then 'd'. It is thus equivalent to 'abcd'.
+
+**What is it used for?**
+Sequitur can be used:
+- As a lossless data compression algorithm (especially for structured text containing
+repeated elements)
+- To detect hierarchical structure in sequences (e.g. traces in program execution)
 
 
 ## Synopsis
 
+**Time for a quick demo**
+
+The following Ruby snippet shows how to apply Sequitur to the input string from the last example above.
+
 ```ruby
 
 require 'sequitur' # Load the Sequitur library
 
-input_sequence = '
+input_sequence = 'ababcabcdabcde' # Let's analyze this string
 
-#
-
+# Run the Sequitur algorithm which will result in a grammar (=rule set)
+grammar = Sequitur.build_from(input_sequence)
+```
+
+The demo illustrates how easy it is to run the algorithm on a string. The next question is how
+to make good use of the algorithm's result.
+
+The natural first step is to print out the (grammar) rules.
+Here's how:
+
+
+```ruby
+require 'sequitur'
+input_sequence = 'ababcabcdabcde'
 grammar = Sequitur.build_from(input_sequence)
 
 # To display the grammar rules on the console output
-# We use a formatter
+# We use a grammar formatter
 formatter = Sequitur::Formatter::BaseText.new(STDOUT)
 
 # Now render the rules. Each rule is displayed with the format:
 # rule_id : a_sequence_grammar_symbols.
 # Where:
-# - rule_id is either 'start' or a name like '
+# - rule_id is either 'start' or a name like 'Pxxxx' (xxxx is a sequential number)
 # - a grammar symbol is either a terminal symbol
 # (i.e. a character from the input) or a rule id
 formatter.render(grammar.visitor)
 
 # Rendered output is:
-# start :
-#
-#
+# start : P1 P2 P3 P3 e.
+# P1 : a b.
+# P2 : P1 c.
+# P3 : P2 d.
 ```
 
+
+
 ### TODO: Add more documentation ###
 
 
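Reviewer note on the README examples: a rule set of this kind can be checked by expanding it back into the string it encodes. The sketch below is a minimal illustration in plain Ruby. The `RULES` hash and the `expand` helper are hypothetical names that are not part of the sequitur gem; they only mirror the second rule set shown in the README.

```ruby
# Hypothetical helper, NOT part of the sequitur gem: it expands a rule set
# (written by hand as a Hash) back into the string it encodes.
RULES = {
  'start' => %w[P1 P2 P3 P3 e],
  'P1'    => %w[a b],
  'P2'    => %w[P1 c],
  'P3'    => %w[P2 d]
}.freeze

# Recursively replace rule ids by their right-hand sides.
# Any symbol that is not a key of the rule set is treated as a terminal.
def expand(symbol, rules)
  return symbol unless rules.key?(symbol)

  rules[symbol].map { |sym| expand(sym, rules) }.join
end

puts expand('start', RULES) # prints "ababcabcdabcde"
```

Expanding `start` yields the original input, which is what makes the representation lossless.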
data/lib/sequitur/constants.rb
CHANGED
data/lib/sequitur/formatter/base_formatter.rb
ADDED
@@ -0,0 +1,33 @@
+module Sequitur
+  module Formatter
+
+    # Superclass for grammar formatters.
+    class BaseFormatter
+      # The IO output stream in which the formatter's result will be sent.
+      attr(:output)
+
+      # Constructor.
+      # [anIO] an output IO where the formatter's result will be placed.
+      def initialize(anIO)
+        @output = anIO
+      end
+
+      public
+
+      # Given a grammar or a grammar visitor, perform the visit
+      # and render the visit events in the output stream.
+      def render(aGrmOrVisitor)
+        aVisitor = if aGrmOrVisitor.kind_of?(GrammarVisitor)
+          aGrmOrVisitor
+        else
+          aGrmOrVisitor.visitor
+        end
+
+        aVisitor.subscribe(self)
+        aVisitor.start()
+        aVisitor.unsubscribe(self)
+      end
+
+    end # class
+  end # module
+end # module
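Reviewer note on `BaseFormatter#render`: the new method accepts either a grammar or a visitor and normalizes the argument before subscribing. The sketch below reproduces that dispatch idea with stand-in classes so it runs without the gem. `StubVisitor`, `StubGrammar`, `StubFormatter` and the `visit_symbol` event are hypothetical names invented for illustration; the gem's real visitor events are not shown in this diff.

```ruby
require 'stringio'

# Hypothetical stand-in for the gem's GrammarVisitor (illustration only).
class StubVisitor
  def initialize(symbols)
    @symbols = symbols
    @subscribers = []
  end

  def subscribe(listener)
    @subscribers << listener
  end

  def unsubscribe(listener)
    @subscribers.delete(listener)
  end

  # Walk the symbols and notify every subscriber of each one.
  def start
    @symbols.each { |sym| @subscribers.each { |s| s.visit_symbol(sym) } }
  end
end

# Hypothetical stand-in for a grammar that can hand out a visitor.
class StubGrammar
  def initialize(symbols)
    @symbols = symbols
  end

  def visitor
    StubVisitor.new(@symbols)
  end
end

# Mirrors the dispatch in BaseFormatter#render: accept a grammar or a visitor.
class StubFormatter
  attr_reader :output

  def initialize(io)
    @output = io
  end

  def render(grammar_or_visitor)
    visitor = if grammar_or_visitor.is_a?(StubVisitor)
                grammar_or_visitor
              else
                grammar_or_visitor.visitor
              end
    visitor.subscribe(self)
    visitor.start
    visitor.unsubscribe(self)
  end

  # Assumed event name; writes each visited symbol to the output stream.
  def visit_symbol(sym)
    output.print(sym)
  end
end

grammar = StubGrammar.new(%w[a b c])
via_grammar = StringIO.new
via_visitor = StringIO.new
StubFormatter.new(via_grammar).render(grammar)         # pass the grammar itself
StubFormatter.new(via_visitor).render(grammar.visitor) # pass an explicit visitor
puts via_grammar.string == via_visitor.string          # prints "true"
```

Both call styles produce the same output, which is the behavior the two new spec examples in this release check for `BaseText`.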
data/lib/sequitur/formatter/base_text.rb
CHANGED
@@ -1,3 +1,5 @@
+require_relative 'base_formatter'
+
 module Sequitur
   module Formatter
 
@@ -8,23 +10,16 @@ module Sequitur
     # formatter = Sequitur::Formatter::BaseText.new(STDOUT)
     # Render the grammar (through a visitor)
     # formatter.run(some_grammar.visitor)
-    class BaseText
-      attr(:output)
+    class BaseText < BaseFormatter
 
       # Constructor.
       # [anIO]
       def initialize(anIO)
-
+        super(anIO)
         @prod_lookup = {}
       end
 
       public
-
-      def render(aVisitor)
-        aVisitor.subscribe(self)
-        aVisitor.start()
-        aVisitor.unsubscribe(self)
-      end
 
       def before_grammar(aGrammar)
         aGrammar.productions.each_with_index do |a_prod, index|
@@ -63,7 +58,7 @@ module Sequitur
 
       def prod_name(aProduction)
         prod_index = prod_lookup[aProduction]
-        name = (prod_index == 0) ? 'start' : "
+        name = (prod_index == 0) ? 'start' : "P#{prod_index}"
         return name
       end
 
data/lib/sequitur/formatter/debug.rb
CHANGED
@@ -1,24 +1,20 @@
+require_relative 'base_formatter'
+
+
 module Sequitur
   module Formatter
-    class Debug
+    class Debug < BaseFormatter
       attr(:indentation)
-      attr(:output)
 
       # Constructor.
       # [anIO]
       def initialize(anIO)
+        super(anIO)
         @indentation = 0
-        @output = anIO
       end
 
       public
 
-      def render(aVisitor)
-        aVisitor.subscribe(self)
-        aVisitor.start()
-        aVisitor.unsubscribe(self)
-      end
-
       def before_grammar(_)
         output_event(__method__, indentation)
         indent
data/spec/sequitur/formatter/base_text_spec.rb
CHANGED
@@ -60,16 +60,29 @@ SNIPPET
       expect(destination.string).to eq(expectations)
     end
 
-    it 'should support events
+    it 'should support visit events with an explicit visitor' do
       instance = BaseText.new(destination)
-      a_visitor = sample_grammar.visitor
+      a_visitor = sample_grammar.visitor # Use visitor explicitly
       instance.render(a_visitor)
       expectations =<<-SNIPPET
 start :.
-
-
-
-
+P1 : a.
+P2 : b.
+P3 : c.
+P4 : P2 P3.
+SNIPPET
+      expect(destination.string).to eq(expectations)
+    end
+
+    it 'should support visit events without an explicit visitor' do
+      instance = BaseText.new(destination)
+      instance.render(sample_grammar)
+      expectations =<<-SNIPPET
+start :.
+P1 : a.
+P2 : b.
+P3 : c.
+P4 : P2 P3.
 SNIPPET
       expect(destination.string).to eq(expectations)
     end
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: sequitur
 version: !ruby/object:Gem::Version
-  version: 0.1.
+  version: 0.1.01
 platform: ruby
 authors:
 - Dimitri Geshef
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2014-09-
+date: 2014-09-17 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: rake
@@ -89,6 +89,7 @@ files:
 - lib/sequitur/constants.rb
 - lib/sequitur/digram.rb
 - lib/sequitur/dynamic_grammar.rb
+- lib/sequitur/formatter/base_formatter.rb
 - lib/sequitur/formatter/base_text.rb
 - lib/sequitur/formatter/debug.rb
 - lib/sequitur/grammar_visitor.rb