fa 0.1.0 → 0.1.1

Files changed (7)
  1. checksums.yaml +4 -4
  2. data/README.md +145 -8
  3. data/Rakefile +6 -0
  4. data/fa.gemspec +1 -0
  5. data/lib/fa.rb +133 -7
  6. data/lib/fa/version.rb +1 -1
  7. metadata +15 -1
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: 4b7eae1d9e0955a9ef22ede62cdf480030dbf27f
4
- data.tar.gz: 2cb62222893579c9b18c61a550d90722417583f6
3
+ metadata.gz: 5f916d972953f8fa49a4b31c3523e0a360424333
4
+ data.tar.gz: b751b82025a64eccf35c143f01b43ddc35105daa
5
5
  SHA512:
6
- metadata.gz: 6b4644001fd36dac35f6f0c033a510e8b14e448ae615ec0bb3d74094277c3ed025b95ae6226fca3687611f31631e644a1e95b925fe91d6e46ddde83e014ae9fb
7
- data.tar.gz: 0aa3a9e2338f14cb695d39a0873075c49a84f4adb05c6d749159c1869c679c8ffd83d5d457beb3f083061833cf887debda4cf7b8fbc8971c213ebcbca06d91da
6
+ metadata.gz: eaac55dbfc978116be6cddff230df876dd0263b9a795d47b2032d669707081f17a49c98feaabac21c6fd1a3bf8be2c509431c494e279d199d71311d98d95a54a
7
+ data.tar.gz: 11e957f7da8b07f41337c80e5dc6aeb2583d660010ae705b57d8a59eb70ea711868beb4d5f7b7ea7268275e404f3309fd03355159607c47e9122380454db8819
data/README.md CHANGED
@@ -1,8 +1,12 @@
1
1
  # Fa
2
2
 
3
- Welcome to your new gem! In this directory, you'll find the files you need to be able to package up your Ruby library into a gem. Put your Ruby code in the file `lib/fa`. To experiment with that code, run `bin/console` for an interactive prompt.
4
-
5
- TODO: Delete this and the text above, and describe your gem
3
+ This gem provides bindings to [libfa](http://augeas.net/libfa/index.html),
4
+ a library for doing algebra on [regular expressions](#syntax). If you've
5
+ ever asked yourself questions like "Are these two regular expressions
6
+ matching the same set of strings?" or wanted to determine a regular
7
+ expression that matches all strings that match one regular expression, but
8
+ not a second one, this is the library that can answer these questions for
9
+ you.
6
10
 
7
11
  ## Installation
8
12
 
@@ -20,17 +24,150 @@ Or install it yourself as:
20
24
 
21
25
  $ gem install fa
22
26
 
27
+ For things to work out, you will also have to have `libfa` installed; the
28
+ library is distributed as part of [augeas](http://augeas.net/). On Red
29
+ Hat-derived distros like Fedora, CentOS, or RHEL, you will need to `yum
30
+ install augeas-libs`, on Debian-derived distros, run `apt-get install
31
+ libaugeas0`.
32
+
23
33
  ## Usage
24
34
 
25
- TODO: Write usage instructions here
35
+ To perform computations on regular expressions, `libfa` needs to first
36
+ convert your regular expression into a finite automaton. This is done by
37
+ compiling your regular expression:
38
+
39
+ ```ruby
40
+ fa1 = Fa.compile("(a|b)") # can also be written as Fa["(a|b)"]
41
+ fa2 = fa1.plus
42
+ ```
43
+
44
+ Notice that the regular expression needs to be given as a
45
+ string. Unfortunately, Ruby regular expressions allow constructs that go
46
+ beyond the mathematical notion of a regular expression and can therefore
47
+ not be used to do the kinds of computation that `libfa` performs. The
48
+ regular expressions that `libfa` deals in must be written using (a subset
49
+ of) the notation for
50
+ [extended POSIX regular expressions](https://en.wikibooks.org/wiki/Regular_Expressions/POSIX-Extended_Regular_Expressions). The
51
+ biggest difference between POSIX ERE and the syntax that `libfa`
52
+ understands is that `libfa` does not allow backreferences, does not support
53
+ anchors like `^` and `$`, and does not support named character classes like
54
+ `[[:space:]]`.
55
+
56
+ You can always turn a finite automaton back into a regular expression using
57
+ `Fa#to_s`:
58
+
59
+ ```ruby
60
+
61
+ puts fa1
62
+ # "b|a"
63
+ puts fa1.minimize
64
+ # "[ab]"
65
+ puts fa1.union(fa2).minimize
66
+ # "[ab][ab]*"
67
+ puts fa1.concat(fa2).minimize
68
+ # "[ab][ab][ab]*"
69
+ puts fa2.intersect(fa1).minimize
70
+ # "b|a"
71
+ puts fa2.intersect(Fa["a*"]).minimize
72
+ # "aa*"
73
+ puts fa1.intersect(Fa["a*"]).minimize
74
+ # "a"
75
+ puts Fa["a+"].minus(Fa["a{2,}"])
76
+ # "a"
77
+ ```
78
+
79
+ You can also compare finite automata, and therefore learn things about how
80
+ they behave on _all_ strings, for example if they match the same exact set
81
+ of strings, or if one matches strictly more strings than another:
82
+
83
+ ```ruby
84
+ fa = Fa["[a-z]"].intersect(Fa["a*"])
85
+ puts Fa["a"].equals(fa)
86
+ # true
87
+ fa = Fa["a"].union(Fa["b"]).star.concat(Fa["c"].plus)
88
+ puts Fa["(a|b)*c+"].equals(fa)
89
+ # true
90
+ puts Fa["[ab]*"].contains(Fa["a*"])
91
+ # true
92
+ puts Fa["a+"].minus(Fa["a*"]).empty?
93
+ # true
94
+ ```
95
+
96
+ ### Syntax
97
+
98
+ The regular expressions that `libfa` understands are a subset of the POSIX
99
+ extended regular expression syntax. If you are a regular expression
100
+ aficionado, you should note that `libfa` does not support some popular
101
+ syntax extensions. Most importantly, it does not support backreferences,
102
+ anchors such as `^` and `$`, and named character classes such as
103
+ `[[:upper:]]`. The first two are not supported since they take the notation
104
+ out of the realm of finite automata and actual regular expressions. Named
105
+ character classes are not implemented because there's a lot of work to
106
+ support them, even though there is no objection from theory to them.
107
+
108
+ The character set that `libfa` operates on is simple 8 bit ASCII. In other
109
+ words, to `libfa`, a character is a byte, and it does not support larger
110
+ character sets such as UTF-8.
111
+
112
+ The following characters have special meaning for `libfa`. The symbols `R`,
113
+ `S`, `T`, etc. in the list below can be regular expressions themselves. The
114
+ list is ordered by increasing precedence of the operators.
115
+
116
+ * `R|S`: matches anything that matches either `R` or `S`
117
+ * `R*`: matches any number of `R`, including none
118
+ * `R+`: matches any number of `R`, but at least one
119
+ * `R?`: matches no or one occurrence of `R`
120
+ * `R{n,m}`: matches at least `n` but no more than `m` occurrences of
121
+ `R`. `n` must be nonnegative. If `m` is missing or equals `-1`, matches
122
+ an unlimited number of `R`.
123
+ * `(R)`: the parentheses are solely there for grouping, and this expression
124
+ matches the same strings as `R` alone
125
+ * `[C]`: matches the characters in the character set `C`; see below for the
126
+ syntax of character sets
127
+ * `.`: matches any single character except for newline
128
+ * `\c`: the literal character `c`, even if it would otherwise have special
129
+ meaning; the expression `\(` matches an opening parenthesis.
130
+
131
+ Character classes `[C]` use the following notation:
132
+
133
+ * `[^C]`: matches all characters not in `[C]`. `[^a-zA-Z]` matches
134
+ everything that is not a letter.
135
+ * `[a-z]`: matches all characters between `a` and `z`, inclusive. Multiple
136
+ ranges can be specified in the same character set. `[a-zA-Z0-9]` is a
137
+ perfectly valid character set.
138
+ * if a character set should include `]`, it must be listed as the first
139
+ character. `[][]` matches the opening and closing bracket.
140
+ * if a character set should include `-`, it must be listed as the last
141
+ character. `[-]` matches solely a dash.
142
+ * no characters in character sets are special, and there is no backslash
143
+ escaping of characters in character classes. `[.]` matches a literal dot.
144
+
145
+ The regular expression syntax has no notation for control characters: when
146
+ `libfa` sees `\n` in a string you are compiling, it will match that against
147
+ the character `n`, not a newline. That's not a problem as the strings you
148
+ write in Ruby code go through Ruby's backslash interpretation first. When
149
+ you write `Fa.compile("[\n]")`, `libfa` never sees the backslash as Ruby
150
+ replaces `\n` with a newline character before that string ever makes it to
151
+ `libfa`. That has the funny consequence that if you want to use a literal
152
+ backslash in your regular expression, your input string must have _four_
153
+ backslashes in it: when you write `Fa.compile("\\\\")`, Ruby first turns
154
+ that into a string with two backslashes, which `libfa` then interprets as a
155
+ single
156
+ backslash. \[_If you are reading this in YARD documentation and only saw two backslashes in the `Fa.compile`, it's because YARD reduced them from the markdown source. Github does not, and so this example will always be wrong in one of them._\]
26
157
 
27
158
  ## Development
28
159
 
29
- After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake test` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
160
+ After checking out the repo, run `bin/setup` to install dependencies. Then,
161
+ run `rake test` to run the tests. You can also run `bin/console` for an
162
+ interactive prompt that will allow you to experiment.
30
163
 
31
- To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org).
164
+ To install this gem onto your local machine, run `bundle exec rake
165
+ install`. To release a new version, update the version number in
166
+ `version.rb`, and then run `bundle exec rake release`, which will create a
167
+ git tag for the version, push git commits and tags, and push the `.gem`
168
+ file to [rubygems.org](https://rubygems.org).
32
169
 
33
170
  ## Contributing
34
171
 
35
- Bug reports and pull requests are welcome on GitHub at https://github.com/[USERNAME]/fa.
36
-
172
+ Bug reports and pull requests are welcome on GitHub at
173
+ https://github.com/lutter/ruby-fa.
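The backslash handling described at the end of the README's Syntax section can be checked in plain Ruby, without `libfa` installed. The following is a minimal sketch that only inspects the strings Ruby would hand to `Fa.compile`; the variable names are illustrative and nothing here calls into the gem.

```ruby
# Plain Ruby only; no call into libfa is made here.
newline_class = "[\n]"        # Ruby replaces \n with a single newline byte
literal_backslash = "\\\\"    # four backslashes in source, two in the string

# The bytes are '[' (91), newline (10), ']' (93): no backslash survives.
puts newline_class.bytes.inspect

# Two backslash characters remain for the regular expression parser to read.
puts literal_backslash.length
puts literal_backslash.bytes.inspect
```

The first string contains three bytes and the second contains two, which matches the README's account: the regular expression parser only ever sees what is left after Ruby's own escape processing.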
data/Rakefile CHANGED
@@ -1,5 +1,6 @@
1
1
  require "bundler/gem_tasks"
2
2
  require "rake/testtask"
3
+ require 'yard'
3
4
 
4
5
  Rake::TestTask.new(:test) do |t|
5
6
  t.libs << "test"
@@ -7,4 +8,9 @@ Rake::TestTask.new(:test) do |t|
7
8
  t.test_files = FileList['test/**/*_test.rb']
8
9
  end
9
10
 
11
+ YARD::Rake::YardocTask.new do |t|
12
+ #t.options = ['--any', '--extra', '--opts']
13
+ t.stats_options = ['--list-undoc']
14
+ end
15
+
10
16
  task :default => :test
data/fa.gemspec CHANGED
@@ -35,4 +35,5 @@ EOS
35
35
  spec.add_development_dependency "bundler", "~> 1.14"
36
36
  spec.add_development_dependency "rake", "~> 10.0"
37
37
  spec.add_development_dependency "minitest", "~> 5.0"
38
+ spec.add_development_dependency "yard"
38
39
  end
data/lib/fa.rb CHANGED
@@ -1,11 +1,18 @@
1
1
  require "fa/version"
2
2
  require "fa/ffi"
3
3
 
4
+ # Namespace for the libfa bindings
4
5
  module Fa
6
+ # A generic error encountered during an fa operation
5
7
  class Error < StandardError; end
6
8
 
9
+ # An operation in libfa failed because it could not allocate enough
10
+ # memory
7
11
  class OutOfMemoryError < Error; end
8
12
 
13
+ # The class representing a finite automaton. It contains a pointer to a
14
+ # +struct fa+ from +libfa+ and provides Ruby wrappers for the various
15
+ # +libfa+ operations.
9
16
  class Automaton < ::FFI::AutoPointer
10
17
  attr_reader :faptr
11
18
 
@@ -13,54 +20,130 @@ module Fa
13
20
  @faptr = faptr
14
21
  end
15
22
 
23
+ # Minimizes this automaton in place. Uses either Hopcroft's or
24
+ # Brzozowski's algorithm. Due to a stupid design mistake in +libfa+,
25
+ # the algorithm is selected through a global variable. It defaults to
26
+ # Hopcroft's algorithm though.
27
+ #
28
+ # @return [Fa::Automaton] this automaton
16
29
  def minimize
17
30
  r = FFI::minimize(faptr)
18
31
  raise Error if r < 0
19
32
  self
20
33
  end
21
34
 
35
+ # Concatenates +self+ with +other+, corresponding to +SO+. Neither
36
+ # +self+ nor +other+ will be modified.
37
+ #
38
+ # @param [Fa::Automaton] other
39
+ # @raise OutOfMemoryError if +libfa+ fails to allocate memory
40
+ # @return [Fa::Automaton] the concatenation of +self+ and +other+
22
41
  def concat(other)
23
42
  from_ptr( FFI::concat(faptr, other.faptr) )
24
43
  end
25
44
 
45
+ # Produces the union of +self+ with +other+, corresponding to
46
+ # +S|O+. Neither +self+ nor +other+ will be modified.
47
+ #
48
+ # @param [Fa::Automaton] other
49
+ # @raise OutOfMemoryError if +libfa+ fails to allocate memory
50
+ # @return [Fa::Automaton] the union of +self+ and +other+
26
51
  def union(other)
27
52
  from_ptr( FFI::union(faptr, other.faptr) )
28
53
  end
29
54
 
55
+ # Produces an iteration of +self+, corresponding to
56
+ # +S{min,max}+. +self+ will not be modified.
57
+ #
58
+ # @param [Int] min the minimum number of matches
59
+ # @param [Int] max the maximum number of matches, use -1 for an
60
+ # unlimited number of matches
61
+ # @raise OutOfMemoryError if +libfa+ fails to allocate memory
62
+ # @return [Fa::Automaton] the iterated automaton
30
63
  def iter(min, max)
31
64
  from_ptr( FFI::iter(faptr, min, max) )
32
65
  end
33
66
 
67
+ # Produces an iteration of any number of +self+, corresponding to
68
+ # +S*+. +self+ will not be modified.
69
+ #
70
+ # @raise OutOfMemoryError if +libfa+ fails to allocate memory
71
+ # @return [Fa::Automaton] the iterated automaton
34
72
  def star
35
73
  iter(0, -1)
36
74
  end
37
75
 
76
+ # Produces an iteration of at least one +self+, corresponding to
77
+ # +S\++. +self+ will not be modified.
78
+ #
79
+ # @raise OutOfMemoryError if +libfa+ fails to allocate memory
80
+ # @return [Fa::Automaton] the iterated automaton
38
81
  def plus
39
82
  iter(1, -1)
40
83
  end
41
84
 
85
+ # Produces an iteration of zero or one +self+, corresponding to
86
+ # +S?+. +self+ will not be modified.
87
+ #
88
+ # @raise OutOfMemoryError if +libfa+ fails to allocate memory
89
+ # @return [Fa::Automaton] the iterated automaton
42
90
  def maybe
43
91
  iter(0, 1)
44
92
  end
45
93
 
94
+ # Produces the intersection of +self+ and +other+. Neither +self+ nor
95
+ # +other+ will be modified. The resulting automaton will match all
96
+ # strings that simultaneously match +self+ and +other+.
97
+ #
98
+ # @param [Fa::Automaton] other
99
+ # @raise OutOfMemoryError if +libfa+ fails to allocate memory
100
+ # @return [Fa::Automaton] the intersection of +self+ and +other+
46
101
  def intersect(other)
47
102
  from_ptr( FFI::intersect(faptr, other.faptr) )
48
103
  end
49
104
 
105
+ # Produces the difference of +self+ and +other+. Neither +self+ nor
106
+ # +other+ will be modified. The resulting automaton will match all
107
+ # strings that match +self+ but not +other+.
108
+ #
109
+ # @param [Fa::Automaton] other
110
+ # @raise OutOfMemoryError if +libfa+ fails to allocate memory
111
+ # @return [Fa::Automaton] the difference of +self+ and +other+
50
112
  def minus(other)
51
113
  from_ptr( FFI::minus(faptr, other.faptr) )
52
114
  end
53
115
 
116
+ # Produces the complement of +self+. +self+ will not be modified. The
117
+ # resulting automaton will match all strings that do _not_ match
118
+ # +self+.
119
+ #
120
+ # @raise OutOfMemoryError if +libfa+ fails to allocate memory
121
+ # @return [Fa::Automaton] the complement of +self+
54
122
  def complement
55
123
  from_ptr( FFI::complement(faptr) )
56
124
  end
57
125
 
126
+ # Returns whether +self+ and +other+ match the same set of strings
127
+ #
128
+ # @param [Fa::Automaton] other
129
+ # @return [Boolean]
58
130
  def equals(other)
59
131
  r = FFI::equals(faptr, other.faptr)
60
132
  raise Error if r < 0
61
133
  return r == 1
62
134
  end
63
135
 
136
+ # Returns whether +self+ and +other+ match the same set of strings
137
+ #
138
+ # @param [Fa::Automaton] other
139
+ # @return [Boolean]
140
+ def ==(other); equals(other); end
141
+
142
+ # Returns whether +self+ matches all the strings that +other+
143
+ # matches. +self+ may match more strings than that.
144
+ #
145
+ # @param [Fa::Automaton] other
146
+ # @return [Boolean]
64
147
  def contains(other)
65
148
  # The C function works like fa1 <= fa2, and not how the
66
149
  # Ruby nomenclature would suggest it, so swap the arguments
@@ -69,20 +152,48 @@ module Fa
69
152
  return r == 1
70
153
  end
71
154
 
72
- def is_basic(basic)
155
+ # Returns whether +other+ matches all the strings that +self+
156
+ # matches. +other+ may match more strings than that.
157
+ #
158
+ # @param [Fa::Automaton] other
159
+ # @return [Boolean]
160
+ def <=(other); other.contains(self); end
161
+
162
+ # Returns whether +self+ is the empty, epsilon or total automaton
163
+ #
164
+ # @param [:empty, :epsilon, :total] kind
165
+ # @return [Boolean]
166
+ def is_basic(kind)
73
167
  # FFI::is_basic checks if the automaton is structurally the same as
74
168
  # +basic+; we just want to check here if they accept the same
75
169
  # language. If is_basic fails, we therefore check for equality
76
- r = FFI::is_basic(faptr, basic)
170
+ r = FFI::is_basic(faptr, kind)
77
171
  return true if r == 1
78
- return equals(Fa::make_basic(basic))
172
+ return equals(Fa::make_basic(kind))
79
173
  end
80
174
 
175
+ # Returns whether +self+ is the empty automaton, i.e., matches no words
176
+ # at all
177
+ # @return [Boolean]
81
178
  def empty?; is_basic(:empty); end
179
+
180
+ # Returns whether +self+ is the epsilon automaton, i.e., matches only
181
+ # the empty string
182
+ # @return [Boolean]
82
183
  def epsilon?; is_basic(:epsilon); end
184
+
185
+ # Returns whether +self+ is the total automaton, i.e., matches all
186
+ # possible words.
187
+ # @return [Boolean]
83
188
  def total?; is_basic(:total); end
84
189
 
85
- def as_regexp
190
+ # Return the representation of +self+ as a regular expression. Note
191
+ # that that regular expression can look pretty complicated, even for
192
+ # seemingly simple automata. Sometimes, minimizing the automaton before
193
+ # turning it into a string helps; sometimes it doesn't.
194
+ #
195
+ # @return [String] the regular expression for +self+
196
+ def to_s
86
197
  rx = ::FFI::MemoryPointer.new :string
87
198
  rx_len = ::FFI::MemoryPointer.new :size_t
88
199
  r = FFI::as_regexp(faptr, rx, rx_len)
@@ -97,21 +208,36 @@ module Fa
97
208
 
98
209
  :private
99
210
  def from_ptr(ptr)
100
- raise OutOfMemoryError if ptr.nil?
211
+ raise OutOfMemoryError if ptr.null?
101
212
  Automaton.new(ptr)
102
213
  end
103
214
  end
104
215
 
216
+ # Compiles +rx+ into a finite automaton
217
+ # @param [String] rx a regular expression
218
+ # @return [Fa::Automaton] the finite automaton
105
219
  def self.compile(rx)
106
220
  faptr = ::FFI::MemoryPointer.new :pointer
107
221
  r = FFI::compile(rx, rx.size, faptr)
108
- raise Error if r < 0
222
+ raise Error if r != 0 # REG_NOERROR is 0, at least for glibc
109
223
  Automaton.new(faptr.get_pointer(0))
110
224
  end
111
225
 
226
+ # Compiles +rx+ into a finite automaton
227
+ # @param [String] rx a regular expression
228
+ # @return [Fa::Automaton] the finite automaton
229
+ def self.[](rx)
230
+ compile(rx)
231
+ end
232
+
233
+ # Makes a basic finite automaton, either an empty, epsilon, or total
234
+ # finite automaton. Those match no words, only the empty word, or all
235
+ # words.
236
+ # @param [:empty, :epsilon, :total] kind
237
+ # @return [Fa::Automaton] the finite automaton
112
238
  def self.make_basic(kind)
113
239
  faptr = FFI::make_basic(kind)
114
- raise OutOfMemoryError if faptr.nil?
240
+ raise OutOfMemoryError if faptr.null?
115
241
  Automaton.new(faptr)
116
242
  end
117
243
  end
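The change from `nil?` to `null?` in `from_ptr` and `Fa.make_basic` above is presumably needed because a pointer wrapper object is never `nil` itself, even when the address it wraps is NULL. A minimal sketch of that distinction in plain Ruby follows; `StandInPointer` is a hypothetical class used here in place of the real FFI pointer type, so that the snippet runs without the `ffi` gem or `libfa`.

```ruby
# Hypothetical stand-in for a pointer wrapper; NOT the real FFI pointer class.
class StandInPointer
  attr_reader :address

  def initialize(address)
    @address = address
  end

  # True when the wrapped address is 0, i.e. a C NULL pointer.
  def null?
    @address.zero?
  end
end

null_ptr  = StandInPointer.new(0)
valid_ptr = StandInPointer.new(0x1000)

puts null_ptr.nil?    # false: the wrapper object exists, so a nil? guard never fires
puts null_ptr.null?   # true: the wrapped address is NULL
puts valid_ptr.null?  # false
```

Under that assumption, a guard written as `raise OutOfMemoryError if ptr.nil?` would let a NULL result through, while the `null?` form catches it.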
data/lib/fa/version.rb CHANGED
@@ -1,3 +1,3 @@
1
1
  module Fa
2
- VERSION = "0.1.0"
2
+ VERSION = "0.1.1"
3
3
  end
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: fa
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.1.0
4
+ version: 0.1.1
5
5
  platform: ruby
6
6
  authors:
7
7
  - 'David Lutterkort
@@ -68,6 +68,20 @@ dependencies:
68
68
  - - "~>"
69
69
  - !ruby/object:Gem::Version
70
70
  version: '5.0'
71
+ - !ruby/object:Gem::Dependency
72
+ name: yard
73
+ requirement: !ruby/object:Gem::Requirement
74
+ requirements:
75
+ - - ">="
76
+ - !ruby/object:Gem::Version
77
+ version: '0'
78
+ type: :development
79
+ prerelease: false
80
+ version_requirements: !ruby/object:Gem::Requirement
81
+ requirements:
82
+ - - ">="
83
+ - !ruby/object:Gem::Version
84
+ version: '0'
71
85
  description: |
72
86
  Bindings for libfa, a library to manipulate finite automata. Automata are
73
87
  constructed from regular expressions, using extended POSIX syntax, and make