sequitur 0.1.00 → 0.1.01

checksums.yaml CHANGED
@@ -1,15 +1,15 @@
1
1
  ---
2
2
  !binary "U0hBMQ==":
3
3
  metadata.gz: !binary |-
4
- ZDAyNmJiYjE4YTQzNWMxZDI0ZWU2NTJhMGZiMTBiMmI0ZDg5MTBkOA==
4
+ NTQ5ODkzODJlYjNmZDBiODNiZTdiZTE4ZDFlNTljYWFhMTg5MzExYw==
5
5
  data.tar.gz: !binary |-
6
- MzI2YzIxMmZkYjExYTk1Yjc4ZTk0MDg0ZjczYjUxNDI4YWNkMWMxZg==
6
+ YzZkNWE1MTdhZTBiMWZmOTI1ZDhhMDJkM2QxYTU3ZDAxZDExZjk5MQ==
7
7
  !binary "U0hBNTEy":
8
8
  metadata.gz: !binary |-
9
- NzEwYTk3ZGFhOGU5ZDZkMWRlMWVmZWU5MDQxOTQ0YTk5ZDIzMGM4YjY1ODAw
10
- NjQyYmU0ODEwZTRhYzM4ZjVmMWQ3ZDgyNjdiNDkzNjMyZmExZjFmN2Q0MTk3
11
- ODE4NTQwYjBiZGQ5M2QyY2JlZTczMzIzMWI0YjUwYzVmNzkyNjg=
9
+ YWVmZTE0YWQ4MmEyMTI4YTFjYmU2MTZkYTZhOGUwZTM4YTZmMDQ2OTliMDky
10
+ NjVmMjdkYmVkMGY3MGEzMDBlM2RjMzNlODMzZDQ3NGY4NzRmNzNkNjNlOTI2
11
+ ZGFlMTkxMTNmZjA0MTI2MjM0MzhkMjM4MjUwMzg0ZDk5NDU0YTU=
12
12
  data.tar.gz: !binary |-
13
- MTYxOWNiMjg5MzRhY2JlOTdmYjJjZmI0YjVkYzA1MmIxMmY5NjQ4NDY2Zjc1
14
- NWJlY2MyY2Q4ZWE2OTFmZGFiZWU1YjZkMjNmMzYzY2ZmYmM2MTE3MWVjODNj
15
- ZDBlNzA3M2VjNmI3NTJhNTk5NGY0OGY4ODFlOGJhZDY1MDMyZmU=
13
+ MTNhOGQyOWNlNTQ4Y2RiOTcyYjU3ZDM3MGZmZGEzYjY2NjFkM2RiZmI3ZTZl
14
+ ZmRiM2JiNzU0MGVkMDUzYTVjM2M4MmM3MWQwYTgyNmQzYjA0MTdhNWI4Mjg2
15
+ ODAwYTc4M2JjZGQyOTUzOGM4MjA5MzZhYThlMmM5MWNjYjJhZmQ=
data/CHANGELOG.md CHANGED
@@ -1,3 +1,11 @@
1
+ ### 0.1.01 / 2014-09-17
2
+ * [NEW] Added new `BaseFormatter` superclass. The sample formatters now inherit from it.
3
+ * [CHANGE] File `README.md`: added a brief intro to the Sequitur algorithm, expanded the Ruby examples
4
+ * [CHANGE] Private method `BaseText#prod_name`: production names no longer contain an underscore.
5
+ * [CHANGE] Formatter class `BaseText` now inherits from `BaseFormatter`
6
+ * [CHANGE] Formatter class `Debug` now inherits from `BaseFormatter`
7
+
8
+
1
9
  ### 0.1.00 / 2014-09-16
2
10
  * [CHANGE] Version number bumped. Added grammar rendering through specialized formatters.
3
11
 
data/README.md CHANGED
@@ -9,43 +9,105 @@ _Ruby gem implementing the Sequitur algorithm_
9
9
 
10
10
 
11
11
  ### What is the Sequitur algorithm? ###
12
- [Sequitur home](http://sequitur.info/)
12
+ The following are good entry points to learn about the algorithm:
13
+ [Sequitur algorithm home](http://sequitur.info/)
13
14
  [Wikipedia](http://en.wikipedia.org/wiki/Sequitur_algorithm)
14
15
 
15
- Sequitur is an algorithm that generates a set of rules representing a sequence of input tokens.
16
- It detects repeated token patterns and can represent them in a compact way.
16
+ ### The theory in a nutshell ###
17
+ Given a sequence of input tokens (say, characters), the Sequitur algorithm
18
+ will represent that input sequence as a set of rules. As the algorithm automatically
19
+ detects repeated token patterns, the resulting rule set can encode repetitions in the input
20
+ in a very compact way.
21
+ Of interest is the fact that the algorithm runs in time linear in the length of the input sequence.
22
+
23
+ **Can you give a simple example?**
24
+ Sure. Let's begin with a very basic case. Assume that 'abcabcabc' is our input string.
25
+ Notice that it is the same as the text 'abc' repeated three times. The Sequitur algorithm captures
26
+ this repetition and will generate the two following rules:
27
+
28
+ ```
29
+ start : P1 P1 P1.
30
+ P1 : a b c.
31
+ ```
32
+
33
+ In plain English:
34
+ - The first rule (named start) always represents the whole input. Here, it indicates that the input
35
+ is three times the pattern encoded by the rule called P1.
36
+ - The second rule (named P1) represents the sequence a b c.
37
+
38
+ **Can you give another example?**
39
+ Yep. Assume this time that the input is 'ababcabcdabcde'.
40
+ Then the Sequitur algorithm will generate the rule set:
41
+ ```
42
+ start : P1 P2 P3 P3 e.
43
+ P1 : a b.
44
+ P2 : P1 c.
45
+ P3 : P2 d.
46
+ ```
47
+
48
+ Translated in plain English:
49
+ - Rule (start) tells that the input consists of the sequence of patterns P1 P2 P3 P3 followed by the letter e.
50
+ - Rule (P1) represents the sequence 'ab'.
51
+ - Rule (P2) represents the pattern encoded by P1 (thus 'ab') then 'c'.
52
+ In other words, it represents the string 'abc'.
53
+ - Rule (P3) represents the pattern encoded by P2 then d. It is thus equivalent to 'abcd'.
54
+
55
+ **What is it used for?**
56
+ Sequitur can be used:
57
+ - As a lossless data compression algorithm (especially for structured text containing
58
+ repeated elements)
59
+ - To detect hierarchical structure in sequences (e.g. traces in program execution)
17
60
 
18
61
 
19
62
  ## Synopsis
20
63
 
64
+ **Time for a quick demo**
65
+
66
+ The following Ruby snippet shows how to apply Sequitur to the input string from the last example above.
67
+
21
68
  ```ruby
22
69
 
23
70
  require 'sequitur' # Load the Sequitur library
24
71
 
25
- input_sequence = 'abcabdabcabd' # Let's analyze this string
72
+ input_sequence = 'ababcabcdabcde' # Let's analyze this string
26
73
 
27
- # The SEQUITUR algorithm will detect the repeated 'ab' pattern
28
- # and will generate a context-free grammar that represents the input string
74
+ # Run the Sequitur algorithm which will result in a grammar (=rule set)
75
+ grammar = Sequitur.build_from(input_sequence)
76
+ ````
77
+
78
+ The demo illustrates how easy it is to run the algorithm on a string. However, the next question is how
79
+ you can make good use of the algorithm's result.
80
+
81
+ The very first natural step is to be able to print out the (grammar) rules.
82
+ Here's how:
83
+
84
+
85
+ ```ruby
86
+ require 'sequitur'
87
+ input_sequence = 'ababcabcdabcde'
29
88
  grammar = Sequitur.build_from(input_sequence)
30
89
 
31
90
  # To display the grammar rules on the console output
32
- # We use a formatter
91
+ # We use a grammar formatter
33
92
  formatter = Sequitur::Formatter::BaseText.new(STDOUT)
34
93
 
35
94
  # Now render the rules. Each rule is displayed with the format:
36
95
  # rule_id : a_sequence_grammar_symbols.
37
96
  # Where:
38
- # - rule_id is either 'start' or a name like 'P_xxxx' (xxxx is a sequential number)
97
+ # - rule_id is either 'start' or a name like 'Pxxxx' (xxxx is a sequential number)
39
98
  # - a grammar symbol is either a terminal symbol
40
99
  # (i.e. a character from the input) or a rule id
41
100
  formatter.render(grammar.visitor)
42
101
 
43
102
  # Rendered output is:
44
- # start : P_2 P_2.
45
- # P_1 : a b.
46
- # P_2 : P_1 c P_1 d.
103
+ # start : P1 P2 P3 P3 e.
104
+ # P1 : a b.
105
+ # P2 : P1 c.
106
+ # P3 : P2 d.
47
107
  ```
48
108
 
109
+
110
+
49
111
  ### TODO: Add more documentation ###
50
112
 
51
113
 
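The rule set shown in the README's second example can be checked by expanding it back into the input string. The snippet below is a minimal sketch of that check and does not use the gem: the `RULES` hash and the `expand` helper are hypothetical names introduced here for illustration, with Ruby symbols standing for rule references and strings for terminal characters.

```ruby
# Hypothetical, hand-written copy of the rule set from the README example.
# Symbols (e.g. :P1) refer to other rules; strings are terminal characters.
RULES = {
  start: [:P1, :P2, :P3, :P3, 'e'],
  P1: ['a', 'b'],
  P2: [:P1, 'c'],
  P3: [:P2, 'd']
}.freeze

# Recursively replace every rule reference by the text it stands for.
def expand(rule_name, rules = RULES)
  rules.fetch(rule_name).map do |symbol|
    symbol.is_a?(Symbol) ? expand(symbol, rules) : symbol
  end.join
end

puts expand(:start) # prints "ababcabcdabcde"
```

Expanding `start` reproduces the original 14-character input, which is what makes the rule set a lossless encoding of it.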
data/lib/sequitur/constants.rb CHANGED
@@ -3,7 +3,7 @@
3
3
 
4
4
  module Sequitur # Module used as a namespace
5
5
  # The version number of the gem.
6
- Version = '0.1.00'
6
+ Version = '0.1.01'
7
7
 
8
8
  # Brief description of the gem.
9
9
  Description = 'Ruby implementation of the Sequitur algorithm'
data/lib/sequitur/formatter/base_formatter.rb ADDED
@@ -0,0 +1,33 @@
1
+ module Sequitur
2
+ module Formatter
3
+
4
+ # Superclass for grammar formatters.
5
+ class BaseFormatter
6
+ # The IO output stream in which the formatter's result will be sent.
7
+ attr(:output)
8
+
9
+ # Constructor.
10
+ # [anIO] an output IO where the formatter's result will be placed.
11
+ def initialize(anIO)
12
+ @output = anIO
13
+ end
14
+
15
+ public
16
+
17
+ # Given a grammar or a grammar visitor, perform the visit
18
+ # and render the visit events in the output stream.
19
+ def render(aGrmOrVisitor)
20
+ aVisitor = if aGrmOrVisitor.kind_of?(GrammarVisitor)
21
+ aGrmOrVisitor
22
+ else
23
+ aGrmOrVisitor.visitor
24
+ end
25
+
26
+ aVisitor.subscribe(self)
27
+ aVisitor.start()
28
+ aVisitor.unsubscribe(self)
29
+ end
30
+
31
+ end # class
32
+ end # module
33
+ end # module
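The effect of the new `render` method is that callers may pass either a grammar or a visitor. The following is a minimal sketch of that dispatch, written against stand-in classes so that it runs without the gem: `FakeVisitor`, `FakeGrammar` and `SketchFormatter` are hypothetical names, and the single `before_grammar` event is only there to make the callback visible.

```ruby
require 'stringio'

# Hypothetical stand-in for a grammar visitor: it notifies its
# subscribers of one event when started.
class FakeVisitor
  def initialize
    @subscribers = []
  end

  def subscribe(listener)
    @subscribers << listener
  end

  def unsubscribe(listener)
    @subscribers.delete(listener)
  end

  def start
    @subscribers.each { |s| s.before_grammar(:fake_grammar) }
  end
end

# Hypothetical stand-in for a grammar: all it can do is hand out a visitor.
class FakeGrammar
  def visitor
    FakeVisitor.new
  end
end

# Same dispatch idea as BaseFormatter#render, written against the stand-ins.
class SketchFormatter
  attr_reader :output

  def initialize(an_io)
    @output = an_io
  end

  def render(grammar_or_visitor)
    visitor = if grammar_or_visitor.is_a?(FakeVisitor)
                grammar_or_visitor
              else
                grammar_or_visitor.visitor
              end
    visitor.subscribe(self)
    visitor.start
    visitor.unsubscribe(self)
  end

  # Event handler called back by the visitor.
  def before_grammar(_grammar)
    output.puts 'before_grammar'
  end
end

io = StringIO.new
formatter = SketchFormatter.new(io)
formatter.render(FakeGrammar.new)         # a grammar: the visitor is obtained internally
formatter.render(FakeGrammar.new.visitor) # an explicit visitor works as well
puts io.string
```

Both calls produce the same event, so subclasses only need to implement the event handlers and inherit `render` unchanged.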
data/lib/sequitur/formatter/base_text.rb CHANGED
@@ -1,3 +1,5 @@
1
+ require_relative 'base_formatter'
2
+
1
3
  module Sequitur
2
4
  module Formatter
3
5
 
@@ -8,23 +10,16 @@ module Sequitur
8
10
  # formatter = Sequitur::Formatter::BaseText.new(STDOUT)
9
11
  # Render the grammar (through a visitor)
10
12
  # formatter.render(some_grammar.visitor)
11
- class BaseText
12
- attr(:output)
13
+ class BaseText < BaseFormatter
13
14
 
14
15
  # Constructor.
15
16
  # [anIO]
16
17
  def initialize(anIO)
17
- @output = anIO
18
+ super(anIO)
18
19
  @prod_lookup = {}
19
20
  end
20
21
 
21
22
  public
22
-
23
- def render(aVisitor)
24
- aVisitor.subscribe(self)
25
- aVisitor.start()
26
- aVisitor.unsubscribe(self)
27
- end
28
23
 
29
24
  def before_grammar(aGrammar)
30
25
  aGrammar.productions.each_with_index do |a_prod, index|
@@ -63,7 +58,7 @@ module Sequitur
63
58
 
64
59
  def prod_name(aProduction)
65
60
  prod_index = prod_lookup[aProduction]
66
- name = (prod_index == 0) ? 'start' : "P_#{prod_index}"
61
+ name = (prod_index == 0) ? 'start' : "P#{prod_index}"
67
62
  return name
68
63
  end
69
64
 
data/lib/sequitur/formatter/debug.rb CHANGED
@@ -1,24 +1,20 @@
1
+ require_relative 'base_formatter'
2
+
3
+
1
4
  module Sequitur
2
5
  module Formatter
3
- class Debug
6
+ class Debug < BaseFormatter
4
7
  attr(:indentation)
5
- attr(:output)
6
8
 
7
9
  # Constructor.
8
10
  # [anIO]
9
11
  def initialize(anIO)
12
+ super(anIO)
10
13
  @indentation = 0
11
- @output = anIO
12
14
  end
13
15
 
14
16
  public
15
17
 
16
- def render(aVisitor)
17
- aVisitor.subscribe(self)
18
- aVisitor.start()
19
- aVisitor.unsubscribe(self)
20
- end
21
-
22
18
  def before_grammar(_)
23
19
  output_event(__method__, indentation)
24
20
  indent
@@ -60,16 +60,29 @@ SNIPPET
60
60
  expect(destination.string).to eq(expectations)
61
61
  end
62
62
 
63
- it 'should support events of a non-empty grammar' do
63
+ it 'should support visit events with an explicit visitor' do
64
64
  instance = BaseText.new(destination)
65
- a_visitor = sample_grammar.visitor
65
+ a_visitor = sample_grammar.visitor # Use visitor explicitly
66
66
  instance.render(a_visitor)
67
67
  expectations =<<-SNIPPET
68
68
  start :.
69
- P_1 : a.
70
- P_2 : b.
71
- P_3 : c.
72
- P_4 : P_2 P_3.
69
+ P1 : a.
70
+ P2 : b.
71
+ P3 : c.
72
+ P4 : P2 P3.
73
+ SNIPPET
74
+ expect(destination.string).to eq(expectations)
75
+ end
76
+
77
+ it 'should support visit events without an explicit visitor' do
78
+ instance = BaseText.new(destination)
79
+ instance.render(sample_grammar)
80
+ expectations =<<-SNIPPET
81
+ start :.
82
+ P1 : a.
83
+ P2 : b.
84
+ P3 : c.
85
+ P4 : P2 P3.
73
86
  SNIPPET
74
87
  expect(destination.string).to eq(expectations)
75
88
  end
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: sequitur
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.1.00
4
+ version: 0.1.01
5
5
  platform: ruby
6
6
  authors:
7
7
  - Dimitri Geshef
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2014-09-16 00:00:00.000000000 Z
11
+ date: 2014-09-17 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: rake
@@ -89,6 +89,7 @@ files:
89
89
  - lib/sequitur/constants.rb
90
90
  - lib/sequitur/digram.rb
91
91
  - lib/sequitur/dynamic_grammar.rb
92
+ - lib/sequitur/formatter/base_formatter.rb
92
93
  - lib/sequitur/formatter/base_text.rb
93
94
  - lib/sequitur/formatter/debug.rb
94
95
  - lib/sequitur/grammar_visitor.rb