sequitur 0.1.00 → 0.1.01

checksums.yaml CHANGED
@@ -1,15 +1,15 @@
1
1
  ---
2
2
  !binary "U0hBMQ==":
3
3
  metadata.gz: !binary |-
4
- ZDAyNmJiYjE4YTQzNWMxZDI0ZWU2NTJhMGZiMTBiMmI0ZDg5MTBkOA==
4
+ NTQ5ODkzODJlYjNmZDBiODNiZTdiZTE4ZDFlNTljYWFhMTg5MzExYw==
5
5
  data.tar.gz: !binary |-
6
- MzI2YzIxMmZkYjExYTk1Yjc4ZTk0MDg0ZjczYjUxNDI4YWNkMWMxZg==
6
+ YzZkNWE1MTdhZTBiMWZmOTI1ZDhhMDJkM2QxYTU3ZDAxZDExZjk5MQ==
7
7
  !binary "U0hBNTEy":
8
8
  metadata.gz: !binary |-
9
- NzEwYTk3ZGFhOGU5ZDZkMWRlMWVmZWU5MDQxOTQ0YTk5ZDIzMGM4YjY1ODAw
10
- NjQyYmU0ODEwZTRhYzM4ZjVmMWQ3ZDgyNjdiNDkzNjMyZmExZjFmN2Q0MTk3
11
- ODE4NTQwYjBiZGQ5M2QyY2JlZTczMzIzMWI0YjUwYzVmNzkyNjg=
9
+ YWVmZTE0YWQ4MmEyMTI4YTFjYmU2MTZkYTZhOGUwZTM4YTZmMDQ2OTliMDky
10
+ NjVmMjdkYmVkMGY3MGEzMDBlM2RjMzNlODMzZDQ3NGY4NzRmNzNkNjNlOTI2
11
+ ZGFlMTkxMTNmZjA0MTI2MjM0MzhkMjM4MjUwMzg0ZDk5NDU0YTU=
12
12
  data.tar.gz: !binary |-
13
- MTYxOWNiMjg5MzRhY2JlOTdmYjJjZmI0YjVkYzA1MmIxMmY5NjQ4NDY2Zjc1
14
- NWJlY2MyY2Q4ZWE2OTFmZGFiZWU1YjZkMjNmMzYzY2ZmYmM2MTE3MWVjODNj
15
- ZDBlNzA3M2VjNmI3NTJhNTk5NGY0OGY4ODFlOGJhZDY1MDMyZmU=
13
+ MTNhOGQyOWNlNTQ4Y2RiOTcyYjU3ZDM3MGZmZGEzYjY2NjFkM2RiZmI3ZTZl
14
+ ZmRiM2JiNzU0MGVkMDUzYTVjM2M4MmM3MWQwYTgyNmQzYjA0MTdhNWI4Mjg2
15
+ ODAwYTc4M2JjZGQyOTUzOGM4MjA5MzZhYThlMmM5MWNjYjJhZmQ=
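The `!binary` keys and values in the checksums.yaml hunk above appear to be Base64-encoded text. The following is a minimal sketch for inspecting them, assuming plain Base64 and using only core Ruby (`String#unpack1`). The variable names are illustrative and not part of the gem.

```ruby
# Values copied from the new (+) side of the checksums.yaml hunk above.
algorithm_keys = %w[U0hBMQ== U0hBNTEy]
sha1_metadata  = 'NTQ5ODkzODJlYjNmZDBiODNiZTdiZTE4ZDFlNTljYWFhMTg5MzExYw=='

# String#unpack1('m') decodes Base64 without requiring any library.
decoded_keys   = algorithm_keys.map { |key| key.unpack1('m') }
decoded_digest = sha1_metadata.unpack1('m')

puts decoded_keys.inspect   # names of the digest algorithms
puts decoded_digest         # hexadecimal digest of metadata.gz
puts decoded_digest.length  # a SHA-1 digest has 40 hex characters
```

Decoding this way shows that only the digests differ between the two versions, which is expected whenever the packaged files change.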
data/CHANGELOG.md CHANGED
@@ -1,3 +1,11 @@
1
+ ### 0.1.01 / 2014-09-17
2
+ * [NEW] Added new `BaseFormatter` superclass. The sample formatters now inherit from it.
3
+ * [CHANGE] File `README.md`: added a brief intro to the Sequitur algorithm, expanded the Ruby examples
4
+ * [CHANGE] Private method `BaseText#prod_name`: the production name no longer contains an underscore.
5
+ * [CHANGE] Formatter class `BaseText` now inherits from `BaseFormatter`
6
+ * [CHANGE] Formatter class `Debug` now inherits from `BaseFormatter`
7
+
8
+
1
9
  ### 0.1.00 / 2014-09-16
2
10
  * [CHANGE] Version number bumped. Added grammar rendering through specialized formatters.
3
11
 
data/README.md CHANGED
@@ -9,43 +9,105 @@ _Ruby gem implementing the Sequitur algorithm_
9
9
 
10
10
 
11
11
  ### What is the Sequitur algorithm? ###
12
- [Sequitur home](http://sequitur.info/)
12
+ The following are good entry points to learn about the algorithm:
13
+ [Sequitur algorithm home](http://sequitur.info/)
13
14
  [Wikipedia](http://en.wikipedia.org/wiki/Sequitur_algorithm)
14
15
 
15
- Sequitur is an algorithm that generates a set of rules representing a sequence of input tokens.
16
- It detects repeated token patterns and can represent them in a compact way.
16
+ ### The theory in a nutshell ###
17
+ Given a sequence of input tokens (say, characters), the Sequitur algorithm
18
+ will represent that input sequence as a set of rules. As the algorithm automatically
19
+ detects repeated token patterns, the resulting rule set can encode repetitions in the input
20
+ in a very compact way.
21
+ Of interest is the fact that the algorithm runs in time linear in the length of the input sequence.
22
+
23
+ **Can you give a simple example?**
24
+ Sure. Let's begin with a very basic case. Assume that 'abcabcabc' is our input string.
25
+ Notice that it is the same as the text 'abc' repeated three times. The Sequitur algorithm captures
26
+ this repetition and will generate the two following rules:
27
+
28
+ ```
29
+ start : P1 P1 P1.
30
+ P1 : a b c.
31
+ ```
32
+
33
+ In plain English:
34
+ - The first rule (named start) always represents the whole input. Here, it indicates that the input
35
+ is three times the pattern encoded by the rule called P1.
36
+ - The second rule (named P1) represents the sequence a b c.
37
+
38
+ **Can you give another example?**
39
+ Yep. Assume this time that the input is 'ababcabcdabcde'.
40
+ Then the Sequitur algorithm will generate the rule set:
41
+ ```
42
+ start : P1 P2 P3 P3 e.
43
+ P1 : a b.
44
+ P2 : P1 c.
45
+ P3 : P2 d.
46
+ ```
47
+
48
+ Translated into plain English:
49
+ - Rule (start) states that the input consists of the patterns P1, P2, P3 and P3 again, followed by the letter e.
50
+ - Rule (P1) represents the sequence 'ab'.
51
+ - Rule (P2) represents the pattern encoded by P1 (thus 'ab') then 'c'.
52
+ In other words, it represents the string 'abc'.
53
+ - Rule (P3) represents the pattern encoded by P2 then d. It is thus equivalent to 'abcd'.
54
+
55
+ **What is it used for?**
56
+ Sequitur can be used:
57
+ - As a lossless data compression algorithm (especially for structured text containing
58
+ repeated elements)
59
+ - To detect hierarchical structure in sequences (e.g. traces in program execution)
17
60
 
18
61
 
19
62
  ## Synopsis
20
63
 
64
+ **Time for a quick demo**
65
+
66
+ The following Ruby snippet shows how to apply Sequitur to the input string from the last example above.
67
+
21
68
  ```ruby
22
69
 
23
70
  require 'sequitur' # Load the Sequitur library
24
71
 
25
- input_sequence = 'abcabdabcabd' # Let's analyze this string
72
+ input_sequence = 'ababcabcdabcde' # Let's analyze this string
26
73
 
27
- # The SEQUITUR algorithm will detect the repeated 'ab' pattern
28
- # and will generate a context-free grammar that represents the input string
74
+ # Run the Sequitur algorithm which will result in a grammar (=rule set)
75
+ grammar = Sequitur.build_from(input_sequence)
76
+ ```
77
+
78
+ The demo illustrates how easy it is to run the algorithm on a string. The next question is how
79
+ you can make good use of the algorithm's result.
80
+
81
+ The very first natural step is to be able to print out the (grammar) rules.
82
+ Here's how:
83
+
84
+
85
+ ```ruby
86
+ require 'sequitur'
87
+ input_sequence = 'ababcabcdabcde'
29
88
  grammar = Sequitur.build_from(input_sequence)
30
89
 
31
90
  # To display the grammar rules on the console output
32
- # We use a formatter
91
+ # We use a grammar formatter
33
92
  formatter = Sequitur::Formatter::BaseText.new(STDOUT)
34
93
 
35
94
  # Now render the rules. Each rule is displayed with the format:
36
95
  # rule_id : a_sequence_grammar_symbols.
37
96
  # Where:
38
- # - rule_id is either 'start' or a name like 'P_xxxx' (xxxx is a sequential number)
97
+ # - rule_id is either 'start' or a name like 'Pxxxx' (xxxx is a sequential number)
39
98
  # - a grammar symbol is either a terminal symbol
40
99
  # (i.e. a character from the input) or a rule id
41
100
  formatter.render(grammar.visitor)
42
101
 
43
102
  # Rendered output is:
44
- # start : P_2 P_2.
45
- # P_1 : a b.
46
- # P_2 : P_1 c P_1 d.
103
+ # start : P1 P2 P3 P3 e.
104
+ # P1 : a b.
105
+ # P2 : P1 c.
106
+ # P3 : P2 d.
47
107
  ```
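As a sanity check on the rendered output above, a rule set can be expanded back into the string it encodes. The sketch below is self-contained and does not use the gem. The `expand` helper and the hash layout are hypothetical, chosen only for illustration. Any symbol that is not a rule name is treated as a terminal.

```ruby
# Rule set copied from the rendered output above.
RULES = {
  'start' => %w[P1 P2 P3 P3 e],
  'P1'    => %w[a b],
  'P2'    => %w[P1 c],
  'P3'    => %w[P2 d]
}.freeze

# Hypothetical helper (not part of the gem): recursively replace each
# rule name by the expansion of its right-hand side.
def expand(symbol, rules)
  rhs = rules[symbol]
  return symbol if rhs.nil? # terminal symbol: keep it as is

  rhs.map { |sym| expand(sym, rules) }.join
end

puts expand('start', RULES) # prints the reconstructed input string
```

Expanding `start` should give back the original input, which is what makes the grammar a lossless encoding of the sequence.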
48
108
 
109
+
110
+
49
111
  ### TODO: Add more documentation ###
50
112
 
51
113
 
@@ -3,7 +3,7 @@
3
3
 
4
4
  module Sequitur # Module used as a namespace
5
5
  # The version number of the gem.
6
- Version = '0.1.00'
6
+ Version = '0.1.01'
7
7
 
8
8
  # Brief description of the gem.
9
9
  Description = 'Ruby implementation of the Sequitur algorithm'
@@ -0,0 +1,33 @@
1
+ module Sequitur
2
+ module Formatter
3
+
4
+ # Superclass for grammar formatters.
5
+ class BaseFormatter
6
+ # The IO output stream in which the formatter's result will be sent.
7
+ attr(:output)
8
+
9
+ # Constructor.
10
+ # [anIO] an output IO where the formatter's result will be placed.
11
+ def initialize(anIO)
12
+ @output = anIO
13
+ end
14
+
15
+ public
16
+
17
+ # Given a grammar or a grammar visitor, perform the visit
18
+ # and render the visit events in the output stream.
19
+ def render(aGrmOrVisitor)
20
+ aVisitor = if aGrmOrVisitor.kind_of?(GrammarVisitor)
21
+ aGrmOrVisitor
22
+ else
23
+ aGrmOrVisitor.visitor
24
+ end
25
+
26
+ aVisitor.subscribe(self)
27
+ aVisitor.start()
28
+ aVisitor.unsubscribe(self)
29
+ end
30
+
31
+ end # class
32
+ end # module
33
+ end # module
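The `render` method added in `BaseFormatter` above accepts either a grammar or a visitor, and the subscribe/start/unsubscribe sequence is the same in both cases. The following sketch reproduces that dispatch with stand-in classes so it runs without the gem. `FakeVisitor`, `FakeGrammar`, `SketchFormatter` and the `visit_production` event are all hypothetical simplifications. The real formatters respond to other events, such as `before_grammar`.

```ruby
require 'stringio'

# Stand-in for Sequitur::GrammarVisitor (hypothetical, simplified):
# it notifies each subscriber once per production.
class FakeVisitor
  def initialize(grammar)
    @grammar = grammar
    @subscribers = []
  end

  def subscribe(listener)
    @subscribers << listener
  end

  def unsubscribe(listener)
    @subscribers.delete(listener)
  end

  def start
    @grammar.productions.each do |prod|
      @subscribers.each { |listener| listener.visit_production(prod) }
    end
  end
end

# Stand-in grammar: it only holds productions and can build a visitor.
FakeGrammar = Struct.new(:productions) do
  def visitor
    FakeVisitor.new(self)
  end
end

# Same dispatch logic as BaseFormatter#render, against the stand-ins.
class SketchFormatter
  attr_reader :output

  def initialize(an_io)
    @output = an_io
  end

  def render(grm_or_visitor)
    visitor = grm_or_visitor.is_a?(FakeVisitor) ? grm_or_visitor : grm_or_visitor.visitor
    visitor.subscribe(self)
    visitor.start
    visitor.unsubscribe(self)
  end

  # Hypothetical event hook used only by this sketch.
  def visit_production(prod)
    output.puts(prod)
  end
end

grammar = FakeGrammar.new(['start : P1 P1.', 'P1 : a b.'])

with_grammar = StringIO.new
SketchFormatter.new(with_grammar).render(grammar)

with_visitor = StringIO.new
SketchFormatter.new(with_visitor).render(grammar.visitor)

puts with_grammar.string == with_visitor.string # both call styles agree
```

Hoisting this logic into a superclass means a subclass only has to implement the event hooks, which matches how `BaseText` and `Debug` drop their own `render` methods in this release.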
@@ -1,3 +1,5 @@
1
+ require_relative 'base_formatter'
2
+
1
3
  module Sequitur
2
4
  module Formatter
3
5
 
@@ -8,23 +10,16 @@ module Sequitur
8
10
  # formatter = Sequitur::Formatter::BaseText.new(STDOUT)
9
11
  # Render the grammar (through a visitor)
10
12
  # formatter.render(some_grammar.visitor)
11
- class BaseText
12
- attr(:output)
13
+ class BaseText < BaseFormatter
13
14
 
14
15
  # Constructor.
15
16
  # [anIO]
16
17
  def initialize(anIO)
17
- @output = anIO
18
+ super(anIO)
18
19
  @prod_lookup = {}
19
20
  end
20
21
 
21
22
  public
22
-
23
- def render(aVisitor)
24
- aVisitor.subscribe(self)
25
- aVisitor.start()
26
- aVisitor.unsubscribe(self)
27
- end
28
23
 
29
24
  def before_grammar(aGrammar)
30
25
  aGrammar.productions.each_with_index do |a_prod, index|
@@ -63,7 +58,7 @@ module Sequitur
63
58
 
64
59
  def prod_name(aProduction)
65
60
  prod_index = prod_lookup[aProduction]
66
- name = (prod_index == 0) ? 'start' : "P_#{prod_index}"
61
+ name = (prod_index == 0) ? 'start' : "P#{prod_index}"
67
62
  return name
68
63
  end
69
64
 
@@ -1,24 +1,20 @@
1
+ require_relative 'base_formatter'
2
+
3
+
1
4
  module Sequitur
2
5
  module Formatter
3
- class Debug
6
+ class Debug < BaseFormatter
4
7
  attr(:indentation)
5
- attr(:output)
6
8
 
7
9
  # Constructor.
8
10
  # [anIO]
9
11
  def initialize(anIO)
12
+ super(anIO)
10
13
  @indentation = 0
11
- @output = anIO
12
14
  end
13
15
 
14
16
  public
15
17
 
16
- def render(aVisitor)
17
- aVisitor.subscribe(self)
18
- aVisitor.start()
19
- aVisitor.unsubscribe(self)
20
- end
21
-
22
18
  def before_grammar(_)
23
19
  output_event(__method__, indentation)
24
20
  indent
@@ -60,16 +60,29 @@ SNIPPET
60
60
  expect(destination.string).to eq(expectations)
61
61
  end
62
62
 
63
- it 'should support events of a non-empty grammar' do
63
+ it 'should support visit events with an explicit visitor' do
64
64
  instance = BaseText.new(destination)
65
- a_visitor = sample_grammar.visitor
65
+ a_visitor = sample_grammar.visitor # Use visitor explicitly
66
66
  instance.render(a_visitor)
67
67
  expectations =<<-SNIPPET
68
68
  start :.
69
- P_1 : a.
70
- P_2 : b.
71
- P_3 : c.
72
- P_4 : P_2 P_3.
69
+ P1 : a.
70
+ P2 : b.
71
+ P3 : c.
72
+ P4 : P2 P3.
73
+ SNIPPET
74
+ expect(destination.string).to eq(expectations)
75
+ end
76
+
77
+ it 'should support visit events without an explicit visitor' do
78
+ instance = BaseText.new(destination)
79
+ instance.render(sample_grammar)
80
+ expectations =<<-SNIPPET
81
+ start :.
82
+ P1 : a.
83
+ P2 : b.
84
+ P3 : c.
85
+ P4 : P2 P3.
73
86
  SNIPPET
74
87
  expect(destination.string).to eq(expectations)
75
88
  end
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: sequitur
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.1.00
4
+ version: 0.1.01
5
5
  platform: ruby
6
6
  authors:
7
7
  - Dimitri Geshef
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2014-09-16 00:00:00.000000000 Z
11
+ date: 2014-09-17 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: rake
@@ -89,6 +89,7 @@ files:
89
89
  - lib/sequitur/constants.rb
90
90
  - lib/sequitur/digram.rb
91
91
  - lib/sequitur/dynamic_grammar.rb
92
+ - lib/sequitur/formatter/base_formatter.rb
92
93
  - lib/sequitur/formatter/base_text.rb
93
94
  - lib/sequitur/formatter/debug.rb
94
95
  - lib/sequitur/grammar_visitor.rb