sequitur 0.1.26 → 0.1.27
- checksums.yaml +4 -4
- data/.yardopts +2 -2
- data/CHANGELOG.md +10 -0
- data/Gemfile +1 -6
- data/README.md +68 -67
- data/lib/sequitur/constants.rb +1 -1
- metadata +12 -12
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-metadata.gz:
-data.tar.gz:
+metadata.gz: c4f8d643f0d20e27fa1f36c2334d4fdb1ff749710aa382d8a0b9553ef2180191
+data.tar.gz: a0db266b1a501f9c7b386a23eb99b60231a37d564eefc115a86df877469a5ef3
 SHA512:
-metadata.gz:
-data.tar.gz:
+metadata.gz: c0b09c07ccbd6b60011d5bd0e38e2031e3b78e017181c8fac9918a16f921d3bd52f674635cdfe6ec54bacca0a7e5443b378b3f4f3a6aea65bcef488006983b1a
+data.tar.gz: 687acbb729d94875e77a05420f1d9822e4bfa7ae86d49dc82a7422e51c7086d6222b906b13518e2f76b1bb78b24245ee6f580e9bce212c2916432b07813aeab7
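The new digests can be checked locally. What follows is a minimal sketch, assuming `metadata.gz` and `data.tar.gz` have already been extracted from the `.gem` file (which is a plain tar archive) into one directory. The helper names `digest_matches?` and `failed_checks` are hypothetical and not part of RubyGems.

```ruby
require 'digest'

# Hypothetical helper: compares a file's SHA256 digest with the expected
# hexadecimal string taken from checksums.yaml.
def digest_matches?(path, expected_hex)
  Digest::SHA256.file(path).hexdigest == expected_hex.downcase
end

# SHA256 digests copied from the '+' lines of the checksums.yaml hunk.
EXPECTED_SHA256 = {
  'metadata.gz' => 'c4f8d643f0d20e27fa1f36c2334d4fdb1ff749710aa382d8a0b9553ef2180191',
  'data.tar.gz' => 'a0db266b1a501f9c7b386a23eb99b60231a37d564eefc115a86df877469a5ef3'
}.freeze

# Returns the names of the files that are missing or whose digest differs.
def failed_checks(dir, expected = EXPECTED_SHA256)
  expected.reject do |name, hex|
    path = File.join(dir, name)
    File.file?(path) && digest_matches?(path, hex)
  end.keys
end
```

Calling `failed_checks('.')` returns an empty array when both files are present and match their digests.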
data/.yardopts
CHANGED
data/CHANGELOG.md
CHANGED
@@ -1,3 +1,13 @@
+## [0.1.27] - 2025-02-16
+Credits to `gemmaro` for the improvements:
+- Enhancements to document generation with YARD.
+- Updates to the `gemspec` and `Gemfile`.
+- Improved `README.md` markup for better rendering with YARD and on the GitHub repository page.
+
+## [0.1.26] - 2025-02-15
+- Tested against MRI Ruby 3.4.1
+- Version of dependencies updated
+
 ## [0.1.25] - 2023-07-10
 - First version using RBS (Ruby Type System): new folder `sig` with `*.rbs` files
 - Yard documentation updated to align with RBS
data/Gemfile
CHANGED
data/README.md
CHANGED
@@ -11,14 +11,15 @@ _Ruby gem implementing the Sequitur algorithm_
 [](https://github.com/famished-tiger/Sequitur/blob/master/LICENSE.txt)
 
 ### What is the Sequitur algorithm? ###
-The following are good entry points to learn about the algorithm:
-
-[
+The following are good entry points to learn about the algorithm:
+
+* [Sequitur algorithm home](http://sequitur.info/)
+* [Wikipedia](http://en.wikipedia.org/wiki/Sequitur_algorithm)
 
 ### Highlights ###
 * Pure Ruby implementation
 * No runtime dependency with other gems,
-* Test suite with 100
+* Test suite with 100% coverage,
 * Documentation: 100% coverage (according to YARD), green badge from inch.io
 * Algorithm works with different input token types (no limited to single character)
 
@@ -35,31 +36,33 @@ Notice that it is the same as the text 'abc' repeated three times. The Sequitur
 this repetition and will generate the two following rules:
 
 ```
-start : P1 P1 P1.
+start : P1 P1 P1.
 P1 : a b c.
 ```
 
 In plain English:
--The first rule (named start) always represents the whole input. Here, it indicates that the input
-is three time the pattern encoded by the rule called P1
--The second rule (named P1) represents the sequence a b c
+-The first rule (named `start`) always represents the whole input. Here, it indicates that the input
+is three time the pattern encoded by the rule called `P1`.
+-The second rule (named `P1`) represents the sequence `a b c`.
 
 **Can you give another example?**
 Yep. Assume this time that the input is *'ababcabcdabcde'*.
 Then Sequitur algorithm will generate the rule set:
 ```
-start : P1 P2 P3 P3 e.
-P1 : a b.
-P2 : P1 c.
-P3 : P2 d.
+start : P1 P2 P3 P3 e.
+P1 : a b.
+P2 : P1 c.
+P3 : P2 d.
 ```
 
-Translated in plain English:
-
-- Rule (
-
-
-- Rule (
+Translated in plain English:
+
+- Rule (`start`) tells that the input consists of the sequence of `P1 P2
+P3` patterns followed by the letter `e`.
+- Rule (`P1`) represents the sequence `ab`.
+- Rule (`P2`) represents the pattern encoded by `P1` (thus `ab`) then `c`. In
+other words, it represents the string `abc`.
+- Rule (`P3`) represents the pattern encoded by `P2` then `d`. It is thus equivalent to `abcd`.
 
 **What is it used for?**
 Sequitur can be used:
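To check that a rule set really encodes its input, the rules can be expanded back into the original string. The sketch below does not use the gem: it models the rule set of the second README example as a plain Hash, with Symbols standing for rule names and Strings for terminals. The `RULES` constant and the `expand` method are illustrative names, not part of Sequitur's API.

```ruby
# Rule set for the input 'ababcabcdabcde', as listed in the README hunk.
# Symbols refer to other rules; Strings are terminal symbols.
RULES = {
  start: [:P1, :P2, :P3, :P3, 'e'],
  P1: ['a', 'b'],
  P2: [:P1, 'c'],
  P3: [:P2, 'd']
}.freeze

# Recursively replaces every rule reference by the expansion of its body.
def expand(rules, name = :start)
  rules.fetch(name).map do |symbol|
    symbol.is_a?(Symbol) ? expand(rules, symbol) : symbol
  end.join
end

puts expand(RULES) # prints "ababcabcdabcde"
```

Expanding a single rule works the same way: `expand(RULES, :P3)` yields `"abcd"`, which matches the plain-English description of `P3`.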
@@ -75,13 +78,12 @@ repeated elements)
 The following Ruby snippet show how to apply Sequitur on the input string from the last example above.
 
 ```ruby
+require 'sequitur' # Load the Sequitur library
 
-
-
-input_sequence = 'ababcabcdabcde' # Let's analyze this string
+input_sequence = 'ababcabcdabcde' # Let's analyze this string
 
-
-
+# Run the Sequitur algorithm which will result in a grammar (=rule set)
+grammar = Sequitur.build_from(input_sequence)
 ````
 
 The demo illustrates how easy it is to run the algorithm on a string. However, the next question is how
@@ -93,27 +95,27 @@ Here's how:
 
 
 ```ruby
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
+require 'sequitur'
+input_sequence = 'ababcabcdabcde'
+grammar = Sequitur.build_from(input_sequence)
+
+# To display the grammar rules on the console output
+# We use a grammar formatter
+formatter = Sequitur::Formatter::BaseText.new(STDOUT)
+
+# Now render the rules. Each rule is displayed with the format:
+# rule_id : a_sequence_grammar_symbols.
+# Where:
+# - rule_id is either 'start' or a name like 'Pxxxx' (xxxx is a sequential number)
+# - a grammar symbol is either a terminal symbol
+# (i.e. a character from the input) or a rule id
+formatter.render(grammar.visitor)
+
+# Rendered output is:
+# start : P1 P2 P3 P3 e.
+# P1 : a b.
+# P2 : P1 c.
+# P3 : P2 d.
 ```
 
 ## Understanding the algorithm's results
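The output format described in the README comments (`rule_id : symbols.`) is simple enough to reproduce by hand. The following stand-alone sketch renders a Hash of rules in that format. It is not the gem's `BaseText` formatter, and `render_rules` is a made-up name used only for illustration.

```ruby
# Renders each rule as "rule_id : symbol symbol ... ." on its own line,
# mimicking the format described in the README comments.
def render_rules(rules)
  rules.map { |head, body| "#{head} : #{body.join(' ')}." }.join("\n")
end

# Rule bodies are given as arrays of symbol names (terminals or rule ids).
rules = {
  'start' => %w[P1 P2 P3 P3 e],
  'P1' => %w[a b],
  'P2' => %w[P1 c],
  'P3' => %w[P2 d]
}

puts render_rules(rules)
# start : P1 P2 P3 P3 e.
# P1 : a b.
# P2 : P1 c.
# P3 : P2 d.
```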
@@ -125,12 +127,12 @@ First of all, what is a **grammar**? To simplify the matter, one can see a gramm
 grammar rules. These rules are called production rules or more briefly **productions**.
 
 In a context-free grammar, productions have the form:
-
+```
 P : body.
 ```
 
 Where:
-- The colon ':' character separates the head (= left-hand side) and the body (right-hand side, *rhs* in short)
+- The colon `':'` character separates the head (= left-hand side) and the body (right-hand side, *rhs* in short)
 of the rule.
 - The left-hand side consists just of one symbol, P. P is a categorized as a *nonterminal symbol* and for our purposes
 a nonterminal symbol can be seen as the "name" of the production. By contrast, a terminal symbol is just one element
@@ -148,10 +150,10 @@ Recall the above example: a single call to the `Sequitur#build_from` factory met
 suffices to construct a grammar object.
 
 ```ruby
-
+require 'sequitur'
 
-
-
+input_sequence = 'ababcabcdabcde'
+grammar = Sequitur.build_from(input_sequence)
 ```
 
 The return value `grammar` is a `Sequitur::SequiturGrammar` instance.
@@ -160,45 +162,44 @@ Unsurprisingly, the `Sequitur::SequiturGrammar` class defines an accessor method
 that returns the productions of the grammar as an array of `Sequitur::Production` objects.
 
 ```ruby
-
-
+# Count the number of productions in the grammar
+puts grammar.productions.size # => 4
 
-
-
+# Retrieve all productions of the grammar
+all_prods = grammar.productions
 
-
-
+# Retrieve the start production
+start_prod = grammar.production[0]
 ```
 
 Once we have a grip on a production, it is easy to access its right-hand side through the `Production#rhs` method.
 It returns an array of symbols.
 
 ```ruby
-
-
-
+# ...Continuing the same example
+# Retrieve the right-hand side of the production
+prod_body = start_prod.rhs # Return an Array object
 ```
 
 The RHS of a production is a sequence (i.e. Array) of symbols.
 How are the grammar symbols implemented?
--Terminal symbols are directly originating from the input sequence. They are
-RHS. For instance, if the input sequence consists
-will be inserted in
-
+- Terminal symbols are directly originating from the input sequence. They are
+inserted "as is" in the RHS. For instance, if the input sequence consists
+of integer values (i.e. Integer instances), then they will be inserted in
+the RHS of productions.
+- Non-terminal symbols are implemented as `Sequitur::ProductionRef` objects.
 
-A ProductionRef is reference to a Production object. The latter one can be accessed through the `ProductionRef#production` method.
+A ProductionRef is a reference to a Production object. The latter one can be accessed through the `ProductionRef#production` method.
 
 
 ### Installation ###
 The sequitur gem installation is fairly standard.
 If your project has a `Gemfile` file, add `sequitur` to it. Otherwise, install the gem like this:
 
-```
-
+```sh
+gem install sequitur
 ```
 
-
-
 ### Good to know ###
 The above examples might give the impression that the input stream must consist of single
 character tokens. This is simply not true.
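The Sequitur algorithm works by spotting pairs of adjacent symbols (digrams) that occur more than once, which in essence only requires that tokens can be compared and hashed. As a rough illustration of why the token type does not matter, the sketch below counts repeated digrams in an array of Symbols. It is a simplification written for this note, not the gem's implementation: the `repeated_digrams` name is invented, overlapping occurrences such as `a a a` are not treated specially, and `Enumerable#tally` assumes Ruby 2.7 or later.

```ruby
# Counts every pair of adjacent tokens and keeps the pairs seen more than once.
# Works for any token type usable as a Hash key (Strings, Symbols, Integers...).
def repeated_digrams(tokens)
  tokens.each_cons(2).tally.select { |_pair, count| count > 1 }
end

tokens = %i[aa bb aa bb cc aa bb cc]
repeated = repeated_digrams(tokens)

# The pair [:aa, :bb] occurs three times and [:bb, :cc] twice.
repeated.each { |pair, count| puts "#{pair.join(' ')} occurs #{count} times" }
```

The same method accepts an array of Integers or Strings unchanged, which mirrors the README's point that the input is not limited to single-character tokens.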
data/lib/sequitur/constants.rb
CHANGED
@@ -8,7 +8,7 @@ module Sequitur
 # rubocop:disable Naming/ConstantName
 
 # @return [String] The version number of the gem.
-Version = '0.1.
+Version = '0.1.27'
 
 # @return [String] Brief description of the gem.
 Description = 'Ruby implementation of the Sequitur algorithm'
metadata
CHANGED
@@ -1,13 +1,13 @@
 --- !ruby/object:Gem::Specification
 name: sequitur
 version: !ruby/object:Gem::Version
-version: 0.1.
+version: 0.1.27
 platform: ruby
 authors:
 - Dimitri Geshef
 bindir: bin
 cert_chain: []
-date: 2025-02-
+date: 2025-02-16 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
 name: rake
@@ -50,45 +50,45 @@ dependencies:
 - !ruby/object:Gem::Version
 version: 3.10.0
 - !ruby/object:Gem::Dependency
-name:
+name: engtagger
 requirement: !ruby/object:Gem::Requirement
 requirements:
 - - "~>"
 - !ruby/object:Gem::Version
-version:
+version: 0.4.0
 - - ">="
 - !ruby/object:Gem::Version
-version:
+version: 0.4.0
 type: :development
 prerelease: false
 version_requirements: !ruby/object:Gem::Requirement
 requirements:
 - - "~>"
 - !ruby/object:Gem::Version
-version:
+version: 0.4.0
 - - ">="
 - !ruby/object:Gem::Version
-version:
+version: 0.4.0
 - !ruby/object:Gem::Dependency
-name:
+name: yard
 requirement: !ruby/object:Gem::Requirement
 requirements:
 - - "~>"
 - !ruby/object:Gem::Version
-version: 0.
+version: 0.9.3
 - - ">="
 - !ruby/object:Gem::Version
-version: 0.
+version: 0.9.3
 type: :development
 prerelease: false
 version_requirements: !ruby/object:Gem::Requirement
 requirements:
 - - "~>"
 - !ruby/object:Gem::Version
-version: 0.
+version: 0.9.3
 - - ">="
 - !ruby/object:Gem::Version
-version: 0.
+version: 0.9.3
 description: |
 Ruby implementation of the Sequitur algorithm. This algorithm automatically
 finds repetitions and hierarchical structures in a given sequence of input
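Each development dependency in the metadata carries two constraints: a pessimistic `~>` bound and a `>=` floor. How such a pair is evaluated can be tried with the `Gem::Requirement` class that ships with RubyGems. The version strings below are the ones shown for `engtagger` in this diff view (which may have truncated them), and the candidate versions are arbitrary examples chosen for this sketch.

```ruby
require 'rubygems'

# Both constraints must hold for a candidate version to be accepted.
requirement = Gem::Requirement.new('~> 0.4.0', '>= 0.4.0')

%w[0.3.9 0.4.0 0.4.7 0.5.0].each do |candidate|
  ok = requirement.satisfied_by?(Gem::Version.new(candidate))
  puts "#{candidate}: #{ok ? 'accepted' : 'rejected'}"
end
# 0.3.9: rejected
# 0.4.0: accepted
# 0.4.7: accepted
# 0.5.0: rejected
```

With `~> 0.4.0` the last listed segment may grow but the minor version may not, so any `0.4.x` release is accepted while `0.5.0` is not.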