sequitur 0.1.01 → 0.1.02

checksums.yaml CHANGED
@@ -1,15 +1,15 @@
1
1
  ---
2
2
  !binary "U0hBMQ==":
3
3
  metadata.gz: !binary |-
4
- NTQ5ODkzODJlYjNmZDBiODNiZTdiZTE4ZDFlNTljYWFhMTg5MzExYw==
4
+ YTdkNzZiNTc1NjBkM2M0MDlhZDI1M2MyNTFhODJhZGI1MjFlYWI2MQ==
5
5
  data.tar.gz: !binary |-
6
- YzZkNWE1MTdhZTBiMWZmOTI1ZDhhMDJkM2QxYTU3ZDAxZDExZjk5MQ==
6
+ MmEzZGRlNTI2M2U3ZmQwYmY3MTA1MmE0MDkzMGQ0ZjBmZDJlYTRmMQ==
7
7
  !binary "U0hBNTEy":
8
8
  metadata.gz: !binary |-
9
- YWVmZTE0YWQ4MmEyMTI4YTFjYmU2MTZkYTZhOGUwZTM4YTZmMDQ2OTliMDky
10
- NjVmMjdkYmVkMGY3MGEzMDBlM2RjMzNlODMzZDQ3NGY4NzRmNzNkNjNlOTI2
11
- ZGFlMTkxMTNmZjA0MTI2MjM0MzhkMjM4MjUwMzg0ZDk5NDU0YTU=
9
+ MjUzMmU0YTQ4MzQ2NmVmMWU2YWQzMTkwZDNiZjM3MjgyOTFlMmRmZDJmMmJi
10
+ NmM5YjMxMjA0YzM5OGFiOGRiYjBmYTc2M2YyN2NiNjJiMGRlYjJkMmMxMThk
11
+ ZGU1MDlhYTBkZDc3YTEwMDAwNmQ0YTZlOTQyZGM5YmFmNTRjNmM=
12
12
  data.tar.gz: !binary |-
13
- MTNhOGQyOWNlNTQ4Y2RiOTcyYjU3ZDM3MGZmZGEzYjY2NjFkM2RiZmI3ZTZl
14
- ZmRiM2JiNzU0MGVkMDUzYTVjM2M4MmM3MWQwYTgyNmQzYjA0MTdhNWI4Mjg2
15
- ODAwYTc4M2JjZGQyOTUzOGM4MjA5MzZhYThlMmM5MWNjYjJhZmQ=
13
+ NGU2NjY2Yzc2ZmQ4NDFlN2E4MGVlYTUwMDg4NjgwYzBiYjk0ZjM5NGY4MTg4
14
+ NTI5NTExOTQzMWY1YzhiNWM4ZjM1OWQ5YjM1MjViZWVlYWRlMWU5NjcyNDNk
15
+ MzAwMzZhM2NlZGE1M2MzYTYyOGZmODkyMWE4YjA0NTE3MTk4NjA=
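
Note on the checksum entries above: the keys and values in `checksums.yaml` appear to be Base64-encoded strings. The following is a minimal sketch, assuming plain Base64 encoding (the variable names are illustrative only), that decodes the two key names and the new SHA1 entry for `metadata.gz` with Ruby's core `String#unpack1`:

```ruby
# Decode Base64 with the core 'm' unpack directive (no extra require needed).
algo_keys = %w[U0hBMQ== U0hBNTEy].map { |key| key.unpack1('m') }
puts algo_keys.inspect # => ["SHA1", "SHA512"]

# New metadata.gz entry taken from the SHA1 section of the diff above.
encoded_digest = 'YTdkNzZiNTc1NjBkM2M0MDlhZDI1M2MyNTFhODJhZGI1MjFlYWI2MQ=='
digest = encoded_digest.unpack1('m')
puts digest

# A SHA1 digest written in hexadecimal is 40 hex characters long.
puts digest.match?(/\A\h{40}\z/)
```

If the assumption holds, each changed entry is simply the hex digest of the corresponding archive, so the checksum changes follow from the content changes listed below.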
data/CHANGELOG.md CHANGED
@@ -1,6 +1,10 @@
1
+ ### 0.1.02 / 2014-09-18
2
+ * [CHANGE] File `README.md`: expanded introductory text.
3
+ * [CHANGE] File `sequitur.gemspec`: expanded gem description in the specification.
4
+
1
5
  ### 0.1.01 / 2014-09-17
2
6
  * [NEW] Added new `BaseFormatter` superclass. Sample formatters are inheriting from this one.
3
- * [CHANGE] File `README.me`: added a brief intro to the Sequitur algorithm, expanded the Ruby examples
7
+ * [CHANGE] File `README.md`: added a brief intro to the Sequitur algorithm, expanded the Ruby examples
4
8
  * [CHANGE] Private method `BaseText#prod_name` production name doesn't contain an underscore.
5
9
  * [CHANGE] Formatter class `BaseText` now inherits from `BaseFormatter`
6
10
  * [CHANGE] Formatter class `Debug` now inherits from `BaseFormatter`
data/README.md CHANGED
@@ -15,7 +15,7 @@ The following are good entry points to learn about the algorithm:
15
15
 
16
16
  ### The theory in a nutshell ###
17
17
  Given a sequence of input tokens (say, characters), the Sequitur algorithm
18
- will represent that input sequence as a set of rules. As the algorithm detects
18
+ will represent that input sequence as a set of rules. As the algorithm detects
19
19
  automatically repeated token patterns, the resulting rule set can encode repetitions in the input
20
20
  in a very compact way.
21
21
  Of interest is the fact that the algorithm runs in time linear in the length of the input sequence.
@@ -46,7 +46,7 @@ P3 : P2 d.
46
46
  ```
47
47
 
48
48
  Translated in plain English:
49
- - Rule (start) tells that the input consists of the sequence of P_1 P_2 P_3 patterns followed by the letter e.
49
+ - Rule (start) tells that the input consists of the sequence of P1 P2 P3 patterns followed by the letter e.
50
50
  - Rule (P1) represents the sequence 'ab'.
51
51
  - Rule (P2) represents the pattern encoded by P1 (thus 'ab') then 'c'.
52
52
  In other words, it represents the string 'abc'.
@@ -78,6 +78,7 @@ The following Ruby snippet show how to apply Sequitur on the input string from t
78
78
  The demo illustrates how easy it is to run the algorithm on a string. However, the next question is how
79
79
  can you make good use of the algorithm's result.
80
80
 
81
+ **Printing the resulting rules**
81
82
  The very first natural step is to be able to print out the (grammar) rules.
82
83
  Here's how:
83
84
 
@@ -106,6 +107,87 @@ Here's how:
106
107
  # P3 : P2 d.
107
108
  ```
108
109
 
110
+ ## Understanding the algorithm's results
111
+ The Sequitur algorithm generates a -simplified- context-free grammar, therefore we dedicate this section
112
+ to the terminology about context-free grammars. As the Internet provides tons of information
113
+ on the subject, we limit ourselves to the minimal terminology of interest when using the sequitur gem.
114
+
115
+ First of all, what is a **grammar**? To simplify the matter, one can see a grammar as a set of
116
+ grammar rules. These rules are called production rules or more briefly **productions**.
117
+
118
+ In a context-free grammar, productions have the form:
119
+ ```
120
+ P : body.
121
+ ```
122
+
123
+ Where:
124
+ - The colon ':' character separates the head (= left-hand side) and the body (right-hand side, *rhs* in short)
125
+ of the rule.
126
+ - The left-hand side consists of just one symbol, P. P is categorized as a *nonterminal symbol* and for our purposes
127
+ a nonterminal symbol can be seen as the "name" of the production. By contrast, a terminal symbol is just one element
128
+ from the input sequence (symbols as defined in formal grammar theory shouldn't be confused with Ruby's `Symbol` class).
129
+ - the body is a sequence -possibly empty- of *symbols* (terminal or nonterminal).
130
+
131
+ Basically, a production rule tells that P is equivalent to the sequence of symbols found in the
132
+ right-hand side of the production. A nonterminal symbol that appears in the rhs of a production can be
133
+ seen as a reference to the production with the same name.
134
+
135
+
136
+ ## The Sequitur API
137
+
138
+ Recall the above example: a single call to the `Sequitur.build_from` factory method
139
+ suffices to construct a grammar object.
140
+
141
+ ```ruby
142
+ require 'sequitur'
143
+
144
+ input_sequence = 'ababcabcdabcde'
145
+ grammar = Sequitur.build_from(input_sequence)
146
+ ```
147
+
148
+ The return value `grammar` is a `Sequitur::SequiturGrammar` instance.
149
+
150
+ Unsurprisingly, the `Sequitur::SequiturGrammar` class defines an accessor method called 'productions'
151
+ that returns the productions of the grammar as an array of `Sequitur::Production` objects.
152
+
153
+ ```ruby
154
+ # Count the number of productions in the grammar
155
+ puts grammar.productions.size # => 4
156
+
157
+ # Retrieve all productions of the grammar
158
+ all_prods = grammar.productions
159
+
160
+ # Retrieve the start production
161
+ start_prod = grammar.productions[0]
162
+ ```
163
+
164
+ Once we have a grip on a production, it is easy to access its right-hand side through the `Production#rhs` method.
165
+ It returns an array of symbols.
166
+
167
+ ```ruby
168
+ # ...Continuing the same example
169
+ # Retrieve the right-hand side of the production
170
+ prod_body = start_prod.rhs # Return an Array object
171
+ ```
172
+
173
+ The RHS of a production is a sequence (i.e. Array) of symbols.
174
+ How are the grammar symbols implemented?
175
+ - Terminal symbols originate directly from the input sequence. They are inserted "as is" in the
176
+ RHS. For instance, if the input sequence consists of integer values (i.e. Fixnum instances), then they
177
+ will be inserted in the RHS of productions.
178
+ - Non-terminal symbols are implemented as `Sequitur::ProductionRef` objects.
179
+
180
+ A ProductionRef is a reference to a Production object. The latter can be accessed through the `ProductionRef#production` method.
181
+
182
+
183
+ ### Installation ###
184
+ The sequitur gem installation is fairly standard.
185
+ If your project has a `Gemfile` file, add `sequitur` to it. Otherwise, install the gem like this:
186
+
187
+ ```bash
188
+ $ [sudo] gem install sequitur
189
+ ```
190
+
109
191
 
110
192
 
111
193
  ### TODO: Add more documentation ###
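
Note on the README additions above: the text explains that a nonterminal in a right-hand side acts as a reference to the production with the same name. The sketch below illustrates that idea without loading the gem. Everything in it is a hypothetical stand-in: `Ref` plays the role the README assigns to `Sequitur::ProductionRef`, and the rule bodies are an assumption consistent with the P1, P2 and P3 rules described in the README excerpt. Expanding the start rule recursively should give back the original input string.

```ruby
# Hypothetical stand-in for a nonterminal reference; NOT the gem's ProductionRef.
Ref = Struct.new(:name)

# Assumed rule set for the input 'ababcabcdabcde'.
# Keys are production names, values are right-hand sides (arrays of symbols).
RULES = {
  'start' => [Ref.new('P1'), Ref.new('P2'), Ref.new('P3'), Ref.new('P3'), 'e'],
  'P1' => %w[a b],
  'P2' => [Ref.new('P1'), 'c'],
  'P3' => [Ref.new('P2'), 'd']
}.freeze

# Replace every nonterminal by the expansion of the production it refers to.
# Terminals (plain strings here) are copied as is.
def expand(rules, name)
  expanded = rules.fetch(name).map do |symbol|
    symbol.is_a?(Ref) ? expand(rules, symbol.name) : symbol
  end
  expanded.join
end

puts expand(RULES, 'start') # => ababcabcdabcde
```

With the real gem, the same walk would presumably go through `Production#rhs` and `ProductionRef#production` as described in the README, but that variant is not shown here because it cannot be checked against this diff alone.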
@@ -3,7 +3,7 @@
3
3
 
4
4
  module Sequitur # Module used as a namespace
5
5
  # The version number of the gem.
6
- Version = '0.1.01'
6
+ Version = '0.1.02'
7
7
 
8
8
  # Brief description of the gem.
9
9
  Description = 'Ruby implementation of the Sequitur algorithm'
@@ -17,17 +17,19 @@ module Sequitur
17
17
  # Given a grammar or a grammar visitor, perform the visit
18
18
  # and render the visit events in the output stream.
19
19
  def render(aGrmOrVisitor)
20
- aVisitor = if aGrmOrVisitor.kind_of?(GrammarVisitor)
21
- aGrmOrVisitor
20
+ if aGrmOrVisitor.kind_of?(GrammarVisitor)
21
+ a_visitor = aGrmOrVisitor
22
22
  else
23
- aGrmOrVisitor.visitor
23
+ a_visitor = aGrmOrVisitor.visitor
24
24
  end
25
25
 
26
- aVisitor.subscribe(self)
27
- aVisitor.start()
28
- aVisitor.unsubscribe(self)
26
+ a_visitor.subscribe(self)
27
+ a_visitor.start
28
+ a_visitor.unsubscribe(self)
29
29
  end
30
30
 
31
31
  end # class
32
32
  end # module
33
- end # module
33
+ end # module
34
+
35
+ # End of file
@@ -22,7 +22,7 @@ class GrammarVisitor
22
22
  end
23
23
 
24
24
  def unsubscribe(aSubscriber)
25
- subscribers.delete_if { |entry| entry == aSubscriber}
25
+ subscribers.delete_if { |entry| entry == aSubscriber }
26
26
  end
27
27
 
28
28
  # The signal to start the visit.
@@ -66,6 +66,7 @@ class GrammarVisitor
66
66
  end
67
67
 
68
68
  private
69
+
69
70
  def broadcast(msg, *args)
70
71
  subscribers.each do |a_subscriber|
71
72
  next unless a_subscriber.respond_to?(msg)
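
Note on the two hunks above: `render` subscribes the formatter to a visitor, starts the visit, then unsubscribes, while `broadcast` skips subscribers that do not respond to a given message. The following self-contained sketch shows how these pieces could fit together. `TinyVisitor` and `EventLogger` are hypothetical names, the event list is made up, and dispatching through `send` is an assumption, since the diff does not show the rest of `broadcast`.

```ruby
# Hypothetical miniature of the subscribe/broadcast pattern; not the gem's code.
class TinyVisitor
  def initialize(events)
    @events = events # array of [message, *arguments] entries
    @subscribers = []
  end

  def subscribe(subscriber)
    @subscribers << subscriber
  end

  def unsubscribe(subscriber)
    @subscribers.delete_if { |entry| entry == subscriber }
  end

  # Replay every event to the current subscribers.
  def start
    @events.each { |msg, *args| broadcast(msg, *args) }
  end

  private

  def broadcast(msg, *args)
    @subscribers.each do |a_subscriber|
      next unless a_subscriber.respond_to?(msg)

      a_subscriber.send(msg, *args) # assumed dispatch mechanism
    end
  end
end

# Listens to grammar-level events only; other events are silently skipped.
class EventLogger
  attr_reader :log

  def initialize
    @log = []
  end

  def before_grammar(name)
    @log << "before_grammar(#{name})"
  end

  def after_grammar(name)
    @log << "after_grammar(#{name})"
  end

  # Same three steps as the render method in the diff above.
  def render(a_visitor)
    a_visitor.subscribe(self)
    a_visitor.start
    a_visitor.unsubscribe(self)
  end
end

events = [[:before_grammar, 'g'], [:before_production, 'start'], [:after_grammar, 'g']]
logger = EventLogger.new
logger.render(TinyVisitor.new(events))
puts logger.log
```

Because of the `respond_to?` guard, a formatter only needs to define the callbacks it cares about, which is why `:before_production` leaves no trace in the log here.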
@@ -46,4 +46,3 @@ end # describe
46
46
  end # module
47
47
 
48
48
  # End of file
49
-
@@ -117,7 +117,7 @@ describe DynamicGrammar do
117
117
  a_visitor.subscribe(fake_formatter)
118
118
 
119
119
  expect(fake_formatter).to receive(:before_grammar).with(subject).ordered
120
- expect(fake_formatter).to receive(:before_production).with(subject.root).ordered
120
+ expect(fake_formatter).to receive(:before_production).with(subject.root)
121
121
  expect(fake_formatter).to receive(:before_rhs).with([]).ordered
122
122
  expect(fake_formatter).to receive(:after_rhs).with([]).ordered
123
123
  expect(fake_formatter).to receive(:after_production).with(subject.root)
@@ -41,7 +41,7 @@ describe BaseText do
41
41
  expect { BaseText.new(StringIO.new('', 'w')) }.not_to raise_error
42
42
  end
43
43
 
44
- it "should know its output destination" do
44
+ it 'should know its output destination' do
45
45
  instance = BaseText.new(destination)
46
46
  expect(instance.output).to eq(destination)
47
47
  end
@@ -54,7 +54,7 @@ describe BaseText do
54
54
  instance = BaseText.new(destination)
55
55
  a_visitor = empty_grammar.visitor
56
56
  instance.render(a_visitor)
57
- expectations =<<-SNIPPET
57
+ expectations = <<-SNIPPET
58
58
  start :.
59
59
  SNIPPET
60
60
  expect(destination.string).to eq(expectations)
@@ -64,7 +64,7 @@ SNIPPET
64
64
  instance = BaseText.new(destination)
65
65
  a_visitor = sample_grammar.visitor # Use visitor explicitly
66
66
  instance.render(a_visitor)
67
- expectations =<<-SNIPPET
67
+ expectations = <<-SNIPPET
68
68
  start :.
69
69
  P1 : a.
70
70
  P2 : b.
@@ -77,7 +77,7 @@ SNIPPET
77
77
  it 'should support visit events without an explicit visitor' do
78
78
  instance = BaseText.new(destination)
79
79
  instance.render(sample_grammar)
80
- expectations =<<-SNIPPET
80
+ expectations = <<-SNIPPET
81
81
  start :.
82
82
  P1 : a.
83
83
  P2 : b.
@@ -92,4 +92,4 @@ end # describe
92
92
  end # module
93
93
  end # module
94
94
 
95
- # End of file
95
+ # End of file
@@ -41,7 +41,7 @@ describe Debug do
41
41
  expect { Debug.new(StringIO.new('', 'w')) }.not_to raise_error
42
42
  end
43
43
 
44
- it "should know its output destination" do
44
+ it 'should know its output destination' do
45
45
  instance = Debug.new(destination)
46
46
  expect(instance.output).to eq(destination)
47
47
  end
@@ -54,7 +54,7 @@ describe Debug do
54
54
  instance = Debug.new(destination)
55
55
  a_visitor = empty_grammar.visitor
56
56
  instance.render(a_visitor)
57
- expectations =<<-SNIPPET
57
+ expectations = <<-SNIPPET
58
58
  before_grammar
59
59
  before_production
60
60
  before_rhs
@@ -69,7 +69,7 @@ SNIPPET
69
69
  instance = Debug.new(destination)
70
70
  a_visitor = sample_grammar.visitor
71
71
  instance.render(a_visitor)
72
- expectations =<<-SNIPPET
72
+ expectations = <<-SNIPPET
73
73
  before_grammar
74
74
  before_production
75
75
  before_rhs
@@ -111,4 +111,4 @@ end # describe
111
111
  end # module
112
112
  end # module
113
113
 
114
- # End of file
114
+ # End of file
@@ -95,4 +95,4 @@ end # describe
95
95
 
96
96
  end # module
97
97
 
98
- # End of file
98
+ # End of file
@@ -72,8 +72,8 @@ describe ProductionRef do
72
72
 
73
73
  it 'should complain when binding to something else than production' do
74
74
  subject.bind_to(target)
75
- msg = "Illegal production type String"
76
- expect {subject.bind_to('WRONG') }.to raise_error(StandardError, msg)
75
+ msg = 'Illegal production type String'
76
+ expect { subject.bind_to('WRONG') }.to raise_error(StandardError, msg)
77
77
  end
78
78
 
79
79
  it 'should compare to other production (reference)' do
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: sequitur
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.1.01
4
+ version: 0.1.02
5
5
  platform: ruby
6
6
  authors:
7
7
  - Dimitri Geshef
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2014-09-17 00:00:00.000000000 Z
11
+ date: 2014-09-18 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: rake
@@ -66,7 +66,11 @@ dependencies:
66
66
  - - ! '>='
67
67
  - !ruby/object:Gem::Version
68
68
  version: 2.0.0
69
- description: Ruby implementation of the Sequitur algorithm.
69
+ description: ! "Ruby implementation of the Sequitur algorithm. This algorithm automatically
70
+ \nfinds repetitions and hierarchical structures in a given sequence of input \ntokens.
71
+ It encodes the input into a context-free grammar. \nThe Sequitur algorithm can be
72
+ used to \na) compress a sequence of items,\nb) discover patterns in an sequence,
73
+ \nc) generate grammar rules that can represent a given input.\n"
70
74
  email: famished.tiger@yahoo.com
71
75
  executables: []
72
76
  extensions: []