sequitur 0.1.01 → 0.1.02

checksums.yaml CHANGED
@@ -1,15 +1,15 @@
1
1
  ---
2
2
  !binary "U0hBMQ==":
3
3
  metadata.gz: !binary |-
4
- NTQ5ODkzODJlYjNmZDBiODNiZTdiZTE4ZDFlNTljYWFhMTg5MzExYw==
4
+ YTdkNzZiNTc1NjBkM2M0MDlhZDI1M2MyNTFhODJhZGI1MjFlYWI2MQ==
5
5
  data.tar.gz: !binary |-
6
- YzZkNWE1MTdhZTBiMWZmOTI1ZDhhMDJkM2QxYTU3ZDAxZDExZjk5MQ==
6
+ MmEzZGRlNTI2M2U3ZmQwYmY3MTA1MmE0MDkzMGQ0ZjBmZDJlYTRmMQ==
7
7
  !binary "U0hBNTEy":
8
8
  metadata.gz: !binary |-
9
- YWVmZTE0YWQ4MmEyMTI4YTFjYmU2MTZkYTZhOGUwZTM4YTZmMDQ2OTliMDky
10
- NjVmMjdkYmVkMGY3MGEzMDBlM2RjMzNlODMzZDQ3NGY4NzRmNzNkNjNlOTI2
11
- ZGFlMTkxMTNmZjA0MTI2MjM0MzhkMjM4MjUwMzg0ZDk5NDU0YTU=
9
+ MjUzMmU0YTQ4MzQ2NmVmMWU2YWQzMTkwZDNiZjM3MjgyOTFlMmRmZDJmMmJi
10
+ NmM5YjMxMjA0YzM5OGFiOGRiYjBmYTc2M2YyN2NiNjJiMGRlYjJkMmMxMThk
11
+ ZGU1MDlhYTBkZDc3YTEwMDAwNmQ0YTZlOTQyZGM5YmFmNTRjNmM=
12
12
  data.tar.gz: !binary |-
13
- MTNhOGQyOWNlNTQ4Y2RiOTcyYjU3ZDM3MGZmZGEzYjY2NjFkM2RiZmI3ZTZl
14
- ZmRiM2JiNzU0MGVkMDUzYTVjM2M4MmM3MWQwYTgyNmQzYjA0MTdhNWI4Mjg2
15
- ODAwYTc4M2JjZGQyOTUzOGM4MjA5MzZhYThlMmM5MWNjYjJhZmQ=
13
+ NGU2NjY2Yzc2ZmQ4NDFlN2E4MGVlYTUwMDg4NjgwYzBiYjk0ZjM5NGY4MTg4
14
+ NTI5NTExOTQzMWY1YzhiNWM4ZjM1OWQ5YjM1MjViZWVlYWRlMWU5NjcyNDNk
15
+ MzAwMzZhM2NlZGE1M2MzYTYyOGZmODkyMWE4YjA0NTE3MTk4NjA=
data/CHANGELOG.md CHANGED
@@ -1,6 +1,10 @@
1
+ ### 0.1.02 / 2014-09-18
2
+ * [CHANGE] File `README.md`: expanded introductory text.
3
+ * [CHANGE] File `sequitur.gemspec`: expanded gem description in the specification.
4
+
1
5
  ### 0.1.01 / 2014-09-17
2
6
  * [NEW] Added new `BaseFormatter` superclass. Sample formatters are inheriting from this one.
3
- * [CHANGE] File `README.me`: added a brief intro to the Sequitur algorithm, expanded the Ruby examples
7
+ * [CHANGE] File `README.md`: added a brief intro to the Sequitur algorithm, expanded the Ruby examples
4
8
  * [CHANGE] Private method `BaseText#prod_name` production name doesn't contain an underscore.
5
9
  * [CHANGE] Formatter class `BaseText` now inherits from `BaseFormatter`
6
10
  * [CHANGE] Formatter class `Debug` now inherits from `BaseFormatter`
data/README.md CHANGED
@@ -15,7 +15,7 @@ The following are good entry points to learn about the algorithm:
15
15
 
16
16
  ### The theory in a nutshell ###
17
17
  Given a sequence of input tokens (say, characters), the Sequitur algorithm
18
- will represent that input sequence as a set of rules. As the algorithm detects
18
+ will represent that input sequence as a set of rules. As the algorithm detects
19
19
  automatically repeated token patterns, the resulting rule set can encode repetitions in the input
20
20
  in a very compact way.
21
21
  Of interest is the fact that the algorithm runs in time linear in the length of the input sequence.
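The phrase "detects automatically repeated token patterns" can be made concrete. Sequitur reasons about digrams (pairs of adjacent symbols): whenever a digram occurs a second time, it is replaced by a rule, creating the rule if needed. The sketch below only counts the repeated digrams of the README's sample input offline, so that it shows which patterns are candidates for rules. It is not the gem's implementation, which enforces digram uniqueness incrementally while it reads the input.

```ruby
# Offline illustration only: count the digrams (pairs of adjacent symbols)
# of the sample input used in the README.
input = 'ababcabcdabcde'

digram_counts = input.chars.each_cons(2).each_with_object(Hash.new(0)) do |pair, counts|
  counts[pair.join] += 1
end

# Digrams seen more than once are the repetitions Sequitur encodes as rules.
repeated = digram_counts.select { |_digram, count| count > 1 }
puts repeated.map { |digram, count| "#{digram}: #{count}" }.join(', ')
# prints "ab: 4, bc: 3, cd: 2"
```

The three repeated digrams correspond to the `a b`, `P1 c` and `P2 d` bodies that appear in the README's rules.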
@@ -46,7 +46,7 @@ P3 : P2 d.
46
46
  ```
47
47
 
48
48
  Translated in plain English:
49
- - Rule (start) tells that the input consists of the sequence of P_1 P_2 P_3 patterns followed by the letter e.
49
+ - Rule (start) states that the input consists of the sequence of P1 P2 P3 patterns followed by the letter e.
50
50
  - Rule (P1) represents the sequence 'ab'.
51
51
  - Rule (P2) represents the pattern encoded by P1 (thus 'ab') then 'c'.
52
52
  In other words, it represents the string 'abc'.
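The plain-English reading above can be checked by expanding each rule recursively. The sketch below encodes the rules as a Ruby Hash, a hypothetical representation chosen for brevity rather than the gem's data model. It assumes the start rule is `P1 P2 P3 P3 e`, which is the body that reproduces the 14-character sample input `'ababcabcdabcde'`.

```ruby
# Hypothetical Hash encoding of the README grammar: each key is a rule head,
# each value its body. A symbol is a nonterminal if it is also a key.
RULES = {
  'start' => %w[P1 P2 P3 P3 e], # assumed start rule (see lead-in)
  'P1' => %w[a b],
  'P2' => %w[P1 c],
  'P3' => %w[P2 d]
}.freeze

# Recursively replace every nonterminal with the expansion of its body.
def expand_rule(rules, head)
  rules.fetch(head).map { |symb| rules.key?(symb) ? expand_rule(rules, symb) : symb }.join
end

%w[P1 P2 P3].each { |head| puts "#{head} represents '#{expand_rule(RULES, head)}'" }
# P1 represents 'ab'
# P2 represents 'abc'
# P3 represents 'abcd'
puts expand_rule(RULES, 'start') # => ababcabcdabcde
```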
@@ -78,6 +78,7 @@ The following Ruby snippet show how to apply Sequitur on the input string from t
78
78
  The demo illustrates how easy it is to run the algorithm on a string. However, the next question is how
79
79
  can you make good use of the algorithm's result.
80
80
 
81
+ **Printing the resulting rules**
81
82
  The very first natural step is to be able to print out the (grammar) rules.
82
83
  Here's how:
83
84
 
@@ -106,6 +107,87 @@ Here's how:
106
107
  # P3 : P2 d.
107
108
  ```
108
109
 
110
+ ## Understanding the algorithm's results
111
+ The Sequitur algorithm generates a (simplified) context-free grammar, so this section introduces
112
+ the terminology of context-free grammars. As plenty of information on the subject can be found online,
113
+ we limit ourselves to the minimal terminology needed to use the sequitur gem.
114
+
115
+ First of all, what is a **grammar**? To simplify the matter, one can see a grammar as a set of
116
+ grammar rules. These rules are called production rules or more briefly **productions**.
117
+
118
+ In a context-free grammar, productions have the form:
119
+ ```
120
+ P : body.
121
+ ```
122
+
123
+ Where:
124
+ - The colon ':' character separates the head (= left-hand side) from the body (right-hand side, *rhs* for short)
125
+ of the rule.
126
+ - The left-hand side consists of just one symbol, P. P is categorized as a *nonterminal symbol*; for our purposes,
127
+ a nonterminal symbol can be seen as the "name" of the production. By contrast, a terminal symbol is just one element
128
+ from the input sequence (symbols as defined in formal grammar theory shouldn't be confused with Ruby's `Symbol` class).
129
+ - The body is a (possibly empty) sequence of *symbols* (terminal or nonterminal).
130
+
131
+ In short, a production rule states that P is equivalent to the sequence of symbols found in the
132
+ right-hand side of the production. A nonterminal symbol that appears in the rhs of a production can be
133
+ seen as a reference to the production with the same name.
134
+
135
+
136
+ ## The Sequitur API
137
+
138
+ Recall the example above: a single call to the `Sequitur.build_from` factory method
139
+ suffices to construct a grammar object.
140
+
141
+ ```ruby
142
+ require 'sequitur'
143
+
144
+ input_sequence = 'ababcabcdabcde'
145
+ grammar = Sequitur.build_from(input_sequence)
146
+ ```
147
+
148
+ The return value `grammar` is a `Sequitur::SequiturGrammar` instance.
149
+
150
+ Unsurprisingly, the `Sequitur::SequiturGrammar` class defines an accessor method called `productions`
151
+ that returns the productions of the grammar as an array of `Sequitur::Production` objects.
152
+
153
+ ```ruby
154
+ # Count the number of productions in the grammar
155
+ puts grammar.productions.size # => 4
156
+
157
+ # Retrieve all productions of the grammar
158
+ all_prods = grammar.productions
159
+
160
+ # Retrieve the start production
161
+ start_prod = grammar.productions[0]
162
+ ```
163
+
164
+ Once we have a grip on a production, it is easy to access its right-hand side through the `Production#rhs` method.
165
+ It returns an array of symbols.
166
+
167
+ ```ruby
168
+ # ...Continuing the same example
169
+ # Retrieve the right-hand side of the production
170
+ prod_body = start_prod.rhs # Returns an Array object
171
+ ```
172
+
173
+ The RHS of a production is a sequence (i.e. an Array) of symbols.
174
+ How are the grammar symbols implemented?
175
+ - Terminal symbols come directly from the input sequence. They are inserted "as is" in the
176
+ RHS. For instance, if the input sequence consists of integer values (i.e. `Fixnum` instances), then these
177
+ integers are inserted in the RHS of productions.
178
+ - Nonterminal symbols are implemented as `Sequitur::ProductionRef` objects.
179
+
180
+ A `ProductionRef` is a reference to a `Production` object. The referenced production can be accessed through the `ProductionRef#production` method.
181
+
182
+
183
+ ### Installation ###
184
+ The sequitur gem installation is fairly standard.
185
+ If your project has a `Gemfile`, add `sequitur` to it. Otherwise, install the gem like this:
186
+
187
+ ```bash
188
+ $ [sudo] gem install sequitur
189
+ ```
190
+
109
191
 
110
192
 
111
193
  ### TODO: Add more documentation ###
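The object model described in this README hunk (a production's `rhs` holds terminals "as is" plus `ProductionRef` objects for nonterminals, and each reference exposes `#production`) lends itself to a simple recursive walk. The sketch below uses hypothetical `Struct` stand-ins that borrow the gem's class and method names so it runs without the gem installed; it is not the gem's code, and the real classes may differ.

```ruby
# Hypothetical stand-ins mirroring the README's description; these are NOT
# the gem's Sequitur::Production and Sequitur::ProductionRef classes.
Production = Struct.new(:name, :rhs)
ProductionRef = Struct.new(:production)

# Rebuild the input by replacing every reference with the referenced body.
def expand_production(prod)
  prod.rhs.flat_map do |symb|
    symb.is_a?(ProductionRef) ? expand_production(symb.production) : [symb]
  end
end

# Integer terminals are stored "as is" in the right-hand side.
p1 = Production.new('P1', [1, 2])
start = Production.new('start', [ProductionRef.new(p1), ProductionRef.new(p1), 3])

p expand_production(start) # => [1, 2, 1, 2, 3]
p start.rhs.count { |symb| symb.is_a?(ProductionRef) } # => 2
```

Testing with `is_a?` is what distinguishes nonterminals here, since terminals can be any Ruby object taken from the input.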
@@ -3,7 +3,7 @@
3
3
 
4
4
  module Sequitur # Module used as a namespace
5
5
  # The version number of the gem.
6
- Version = '0.1.01'
6
+ Version = '0.1.02'
7
7
 
8
8
  # Brief description of the gem.
9
9
  Description = 'Ruby implementation of the Sequitur algorithm'
@@ -17,17 +17,19 @@ module Sequitur
17
17
  # Given a grammar or a grammar visitor, perform the visit
18
18
  # and render the visit events in the output stream.
19
19
  def render(aGrmOrVisitor)
20
- aVisitor = if aGrmOrVisitor.kind_of?(GrammarVisitor)
21
- aGrmOrVisitor
20
+ if aGrmOrVisitor.kind_of?(GrammarVisitor)
21
+ a_visitor = aGrmOrVisitor
22
22
  else
23
- aGrmOrVisitor.visitor
23
+ a_visitor = aGrmOrVisitor.visitor
24
24
  end
25
25
 
26
- aVisitor.subscribe(self)
27
- aVisitor.start()
28
- aVisitor.unsubscribe(self)
26
+ a_visitor.subscribe(self)
27
+ a_visitor.start
28
+ a_visitor.unsubscribe(self)
29
29
  end
30
30
 
31
31
  end # class
32
32
  end # module
33
- end # module
33
+ end # module
34
+
35
+ # End of file
@@ -22,7 +22,7 @@ class GrammarVisitor
22
22
  end
23
23
 
24
24
  def unsubscribe(aSubscriber)
25
- subscribers.delete_if { |entry| entry == aSubscriber}
25
+ subscribers.delete_if { |entry| entry == aSubscriber }
26
26
  end
27
27
 
28
28
  # The signal to start the visit.
@@ -66,6 +66,7 @@ class GrammarVisitor
66
66
  end
67
67
 
68
68
  private
69
+
69
70
  def broadcast(msg, *args)
70
71
  subscribers.each do |a_subscriber|
71
72
  next unless a_subscriber.respond_to?(msg)
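The `render` method and the `broadcast` helper above implement a publish/subscribe visit: a formatter subscribes to a visitor, the visitor broadcasts events only to subscribers that respond to them, and the formatter unsubscribes afterwards. The self-contained sketch below reproduces that flow with hypothetical classes (`MiniVisitor`, `MiniGrammar`, `EventLogger`) and only two events; the gem's visitor emits more events, such as `before_production` and `before_rhs`, as the specs show.

```ruby
# Hypothetical stand-in for the gem's GrammarVisitor (not its actual source).
class MiniVisitor
  attr_reader :grammar, :subscribers

  def initialize(grammar)
    @grammar = grammar
    @subscribers = []
  end

  def subscribe(subscriber)
    subscribers << subscriber
  end

  def unsubscribe(subscriber)
    subscribers.delete_if { |entry| entry == subscriber }
  end

  # The signal to start the visit (reduced here to two events).
  def start
    broadcast(:before_grammar, grammar)
    broadcast(:after_grammar, grammar)
  end

  private

  # Notify only the subscribers that implement a handler for the event.
  def broadcast(msg, *args)
    subscribers.each do |a_subscriber|
      next unless a_subscriber.respond_to?(msg)

      a_subscriber.public_send(msg, *args)
    end
  end
end

# Hypothetical grammar stand-in: it only knows how to create its visitor.
MiniGrammar = Struct.new(:name) do
  def visitor
    MiniVisitor.new(self)
  end
end

# Formatter-like subscriber that records the events it receives.
class EventLogger
  attr_reader :events

  def initialize
    @events = []
  end

  def before_grammar(_grammar)
    events << :before_grammar
  end

  def after_grammar(_grammar)
    events << :after_grammar
  end

  # Accept either a visitor or a grammar, like the render method above.
  def render(grm_or_visitor)
    if grm_or_visitor.kind_of?(MiniVisitor)
      a_visitor = grm_or_visitor
    else
      a_visitor = grm_or_visitor.visitor
    end

    a_visitor.subscribe(self)
    a_visitor.start
    a_visitor.unsubscribe(self)
  end
end

logger = EventLogger.new
logger.render(MiniGrammar.new('demo'))
p logger.events # => [:before_grammar, :after_grammar]
```

Because `broadcast` checks `respond_to?`, a subscriber only needs to implement the events it cares about.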
@@ -46,4 +46,3 @@ end # describe
46
46
  end # module
47
47
 
48
48
  # End of file
49
-
@@ -117,7 +117,7 @@ describe DynamicGrammar do
117
117
  a_visitor.subscribe(fake_formatter)
118
118
 
119
119
  expect(fake_formatter).to receive(:before_grammar).with(subject).ordered
120
- expect(fake_formatter).to receive(:before_production).with(subject.root).ordered
120
+ expect(fake_formatter).to receive(:before_production).with(subject.root)
121
121
  expect(fake_formatter).to receive(:before_rhs).with([]).ordered
122
122
  expect(fake_formatter).to receive(:after_rhs).with([]).ordered
123
123
  expect(fake_formatter).to receive(:after_production).with(subject.root)
@@ -41,7 +41,7 @@ describe BaseText do
41
41
  expect { BaseText.new(StringIO.new('', 'w')) }.not_to raise_error
42
42
  end
43
43
 
44
- it "should know its output destination" do
44
+ it 'should know its output destination' do
45
45
  instance = BaseText.new(destination)
46
46
  expect(instance.output).to eq(destination)
47
47
  end
@@ -54,7 +54,7 @@ describe BaseText do
54
54
  instance = BaseText.new(destination)
55
55
  a_visitor = empty_grammar.visitor
56
56
  instance.render(a_visitor)
57
- expectations =<<-SNIPPET
57
+ expectations = <<-SNIPPET
58
58
  start :.
59
59
  SNIPPET
60
60
  expect(destination.string).to eq(expectations)
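The expectations in these specs use a `<<-` heredoc: the `-` allows the closing `SNIPPET` delimiter to be indented but keeps any indentation of the content lines, which is why the expected text sits at column 0. The sketch below writes productions in the `head : body.` notation into a `StringIO` and compares the result with such a heredoc. The `render_rules` helper and the Hash input are hypothetical; only the `start :.`, `P1 : a.` and `P2 : b.` lines come from the specs.

```ruby
require 'stringio'

# Hypothetical renderer: one "head : body." line per production.
def render_rules(rules, output)
  rules.each do |head, body|
    output.print "#{head} :"
    body.each { |symb| output.print " #{symb}" }
    output.puts '.'
  end
  output
end

destination = StringIO.new(+'', 'w')
render_rules({ 'start' => [], 'P1' => ['a'], 'P2' => ['b'] }, destination)

expectations = <<-SNIPPET
start :.
P1 : a.
P2 : b.
SNIPPET
puts destination.string == expectations # => true
```

An empty body yields `start :.` with no space before the period, matching the empty-grammar expectation.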
@@ -64,7 +64,7 @@ SNIPPET
64
64
  instance = BaseText.new(destination)
65
65
  a_visitor = sample_grammar.visitor # Use visitor explicitly
66
66
  instance.render(a_visitor)
67
- expectations =<<-SNIPPET
67
+ expectations = <<-SNIPPET
68
68
  start :.
69
69
  P1 : a.
70
70
  P2 : b.
@@ -77,7 +77,7 @@ SNIPPET
77
77
  it 'should support visit events without an explicit visitor' do
78
78
  instance = BaseText.new(destination)
79
79
  instance.render(sample_grammar)
80
- expectations =<<-SNIPPET
80
+ expectations = <<-SNIPPET
81
81
  start :.
82
82
  P1 : a.
83
83
  P2 : b.
@@ -92,4 +92,4 @@ end # describe
92
92
  end # module
93
93
  end # module
94
94
 
95
- # End of file
95
+ # End of file
@@ -41,7 +41,7 @@ describe Debug do
41
41
  expect { Debug.new(StringIO.new('', 'w')) }.not_to raise_error
42
42
  end
43
43
 
44
- it "should know its output destination" do
44
+ it 'should know its output destination' do
45
45
  instance = Debug.new(destination)
46
46
  expect(instance.output).to eq(destination)
47
47
  end
@@ -54,7 +54,7 @@ describe Debug do
54
54
  instance = Debug.new(destination)
55
55
  a_visitor = empty_grammar.visitor
56
56
  instance.render(a_visitor)
57
- expectations =<<-SNIPPET
57
+ expectations = <<-SNIPPET
58
58
  before_grammar
59
59
  before_production
60
60
  before_rhs
@@ -69,7 +69,7 @@ SNIPPET
69
69
  instance = Debug.new(destination)
70
70
  a_visitor = sample_grammar.visitor
71
71
  instance.render(a_visitor)
72
- expectations =<<-SNIPPET
72
+ expectations = <<-SNIPPET
73
73
  before_grammar
74
74
  before_production
75
75
  before_rhs
@@ -111,4 +111,4 @@ end # describe
111
111
  end # module
112
112
  end # module
113
113
 
114
- # End of file
114
+ # End of file
@@ -95,4 +95,4 @@ end # describe
95
95
 
96
96
  end # module
97
97
 
98
- # End of file
98
+ # End of file
@@ -72,8 +72,8 @@ describe ProductionRef do
72
72
 
73
73
  it 'should complain when binding to something else than production' do
74
74
  subject.bind_to(target)
75
- msg = "Illegal production type String"
76
- expect {subject.bind_to('WRONG') }.to raise_error(StandardError, msg)
75
+ msg = 'Illegal production type String'
76
+ expect { subject.bind_to('WRONG') }.to raise_error(StandardError, msg)
77
77
  end
78
78
 
79
79
  it 'should compare to other production (reference)' do
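The reformatted spec above expects `bind_to` to reject anything that is not a production, with the message `Illegal production type String`. A minimal sketch of such a type guard follows; the `BindingSketch` module and its classes are hypothetical stand-ins rather than the gem's source, and the real `ProductionRef` may do additional bookkeeping when binding.

```ruby
module BindingSketch
  # Hypothetical stand-in for a grammar production.
  class Production
    attr_reader :name

    def initialize(name)
      @name = name
    end
  end

  # Hypothetical stand-in for a reference to a production.
  class ProductionRef
    attr_reader :production

    # Bind to a production; any other kind of object is rejected.
    def bind_to(a_production)
      unless a_production.is_a?(Production)
        raise StandardError, "Illegal production type #{a_production.class}"
      end

      @production = a_production
    end
  end
end

ref = BindingSketch::ProductionRef.new
ref.bind_to(BindingSketch::Production.new('P1'))

begin
  ref.bind_to('WRONG')
rescue StandardError => e
  puts e.message # => Illegal production type String
end
```

Since the guard raises before assigning, a failed `bind_to` leaves the previous binding in place.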
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: sequitur
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.1.01
4
+ version: 0.1.02
5
5
  platform: ruby
6
6
  authors:
7
7
  - Dimitri Geshef
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2014-09-17 00:00:00.000000000 Z
11
+ date: 2014-09-18 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: rake
@@ -66,7 +66,11 @@ dependencies:
66
66
  - - ! '>='
67
67
  - !ruby/object:Gem::Version
68
68
  version: 2.0.0
69
- description: Ruby implementation of the Sequitur algorithm.
69
+ description: ! "Ruby implementation of the Sequitur algorithm. This algorithm automatically
70
+ \nfinds repetitions and hierarchical structures in a given sequence of input \ntokens.
71
+ It encodes the input into a context-free grammar. \nThe Sequitur algorithm can be
72
+ used to \na) compress a sequence of items,\nb) discover patterns in a sequence,
73
+ \nc) generate grammar rules that can represent a given input.\n"
70
74
  email: famished.tiger@yahoo.com
71
75
  executables: []
72
76
  extensions: []