fa 0.1.0 → 0.1.1

Files changed (7)
  1. checksums.yaml +4 -4
  2. data/README.md +145 -8
  3. data/Rakefile +6 -0
  4. data/fa.gemspec +1 -0
  5. data/lib/fa.rb +133 -7
  6. data/lib/fa/version.rb +1 -1
  7. metadata +15 -1
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: 4b7eae1d9e0955a9ef22ede62cdf480030dbf27f
4
- data.tar.gz: 2cb62222893579c9b18c61a550d90722417583f6
3
+ metadata.gz: 5f916d972953f8fa49a4b31c3523e0a360424333
4
+ data.tar.gz: b751b82025a64eccf35c143f01b43ddc35105daa
5
5
  SHA512:
6
- metadata.gz: 6b4644001fd36dac35f6f0c033a510e8b14e448ae615ec0bb3d74094277c3ed025b95ae6226fca3687611f31631e644a1e95b925fe91d6e46ddde83e014ae9fb
7
- data.tar.gz: 0aa3a9e2338f14cb695d39a0873075c49a84f4adb05c6d749159c1869c679c8ffd83d5d457beb3f083061833cf887debda4cf7b8fbc8971c213ebcbca06d91da
6
+ metadata.gz: eaac55dbfc978116be6cddff230df876dd0263b9a795d47b2032d669707081f17a49c98feaabac21c6fd1a3bf8be2c509431c494e279d199d71311d98d95a54a
7
+ data.tar.gz: 11e957f7da8b07f41337c80e5dc6aeb2583d660010ae705b57d8a59eb70ea711868beb4d5f7b7ea7268275e404f3309fd03355159607c47e9122380454db8819
data/README.md CHANGED
@@ -1,8 +1,12 @@
1
1
  # Fa
2
2
 
3
- Welcome to your new gem! In this directory, you'll find the files you need to be able to package up your Ruby library into a gem. Put your Ruby code in the file `lib/fa`. To experiment with that code, run `bin/console` for an interactive prompt.
4
-
5
- TODO: Delete this and the text above, and describe your gem
3
+ This gem provides bindings to [libfa](http://augeas.net/libfa/index.html),
4
+ a library for doing algebra on [regular expressions](#syntax). If you've
5
+ ever asked yourself questions like "Are these two regular expressions
6
+ matching the same set of strings?" or wanted to determine a regular
7
+ expression that matches all strings that match one regular expression, but
8
+ not a second one, this is the library that can answer these questions for
9
+ you.
6
10
 
7
11
  ## Installation
8
12
 
@@ -20,17 +24,150 @@ Or install it yourself as:
20
24
 
21
25
  $ gem install fa
22
26
 
27
+ For things to work out, you will also have to have `libfa` installed; the
28
+ library is distributed as part of [augeas](http://augeas.net/). On Red
29
+ Hat-derived distros like Fedora, CentOS, or RHEL, you will need to `yum
30
+ install augeas-libs`; on Debian-derived distros, run `apt-get install
31
+ libaugeas0`.
32
+
23
33
  ## Usage
24
34
 
25
- TODO: Write usage instructions here
35
+ To perform computations on regular expressions, `libfa` needs to first
36
+ convert your regular expression into a finite automaton. This is done by
37
+ compiling your regular expression:
38
+
39
+ ```ruby
40
+ fa1 = Fa.compile("(a|b)") # can also be written as Fa["(a|b)"]
41
+ fa2 = fa1.plus
42
+ ```
43
+
44
+ Notice that the regular expression needs to be given as a
45
+ string. Unfortunately, Ruby regular expressions allow constructs that go
46
+ beyond the mathematical notion of a regular expression and can therefore
47
+ not be used to do the kinds of computation that `libfa` performs. The
48
+ regular expressions that `libfa` deals in must be written using (a subset
49
+ of) the notation for
50
+ [extended POSIX regular expressions](https://en.wikibooks.org/wiki/Regular_Expressions/POSIX-Extended_Regular_Expressions). The
51
+ biggest differences between POSIX ERE and the syntax that `libfa`
52
+ understands are that `libfa` does not allow backreferences, does not support
53
+ anchors like `^` and `$`, and does not support named character classes like
54
+ `[[:space:]]`.
55
+
56
+ You can always turn a finite automaton back into a regular expression using
57
+ `Fa#to_s`:
58
+
59
+ ```ruby
60
+
61
+ puts fa1
62
+ # "b|a"
63
+ puts fa1.minimize
64
+ # "[ab]"
65
+ puts fa1.union(fa2).minimize
66
+ # "[ab][ab]*"
67
+ puts fa1.concat(fa2).minimize
68
+ # "[ab][ab][ab]*"
69
+ puts fa2.intersect(fa1).minimize
70
+ # "b|a"
71
+ puts fa2.intersect(Fa["a*"]).minimize
72
+ # "aa*"
73
+ puts fa1.intersect(Fa["a*"]).minimize
74
+ # "a"
75
+ puts Fa["a+"].minus(Fa["a{2,}"])
76
+ # "a"
77
+ ```
78
+
79
+ You can also compare finite automata, and therefore learn things about how
80
+ they behave on _all_ strings, for example if they match the same exact set
81
+ of strings, or if one matches strictly more strings than another:
82
+
83
+ ```ruby
84
+ fa = Fa["[a-z]"].intersect(Fa["a*"])
85
+ puts Fa["a"].equals(fa)
86
+ # true
87
+ fa = Fa["a"].union(Fa["b"]).star.concat(Fa["c"].plus)
88
+ puts Fa["(a|b)*c+"].equals(fa)
89
+ # true
90
+ puts Fa["[ab]*"].contains(Fa["a*"])
91
+ # true
92
+ puts Fa["a+"].minus(Fa["a*"]).empty?
93
+ # true
94
+ ```
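To make concrete what these comparisons mean, here is a minimal sketch that does not use the gem at all. It approximates a language by brute force: it lists every string up to a small length over a tiny alphabet and treats the matching ones as a set. The helper names `all_strings` and `language` are made up for this illustration, Ruby's own `Regexp` stands in for `libfa`, and the result only covers the enumerated strings, whereas `libfa` answers the question for _all_ strings.

```ruby
require "set"

# Every string over +alphabet+ with length 0..max_len (hypothetical helper).
def all_strings(alphabet, max_len)
  (0..max_len).flat_map { |n| alphabet.repeated_permutation(n).map(&:join) }
end

# Finite slice of the language of +rx+: the enumerated strings that match
# in full. \A and \z are Ruby anchors used only by this sketch.
def language(rx, alphabet: %w[a b c], max_len: 4)
  full = /\A(?:#{rx})\z/
  all_strings(alphabet, max_len).select { |s| full.match?(s) }.to_set
end

same     = language("(a|b)*c+") == language("[ab]*c+")  # in the spirit of equals
superset = language("[ab]*").superset?(language("a*"))  # in the spirit of contains
nothing  = (language("a+") - language("a*")).empty?     # in the spirit of minus + empty?

puts same, superset, nothing
```

The three set operations line up with `equals`, `contains`, and `minus` followed by `empty?` from the example above; an automaton-based check reaches the same answers without enumerating strings.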
95
+
96
+ ### Syntax
97
+
98
+ The regular expressions that `libfa` understands are a subset of the POSIX
99
+ extended regular expression syntax. If you are a regular expression
100
+ aficionado, you should note that `libfa` does not support some popular
101
+ syntax extensions. Most importantly, it does not support backreferences,
102
+ anchors such as `^` and `$`, and named character classes such as
103
+ `[[:upper:]]`. The first two are not supported since they take the notation
104
+ out of the realm of finite automata and actual regular expressions. Named
105
+ character classes are not implemented because there's a lot of work to
106
+ support them, even though there is no objection from theory to them.
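Since named character classes are not supported, one workaround is to spell them out as explicit ranges before compiling. The sketch below is an assumption-laden illustration: `expand_posix_classes` and `POSIX_CLASSES` are hypothetical names that are not part of this gem, only a handful of classes are covered, and the substitution is purely textual, so it does not check that a name actually occurs inside a bracket expression.

```ruby
# Hypothetical helper, not part of the fa gem: rewrites a few POSIX named
# classes into explicit ranges before the string is handed to a compiler.
POSIX_CLASSES = {
  "[:digit:]" => "0-9",
  "[:upper:]" => "A-Z",
  "[:lower:]" => "a-z",
  "[:alpha:]" => "a-zA-Z",
  "[:alnum:]" => "a-zA-Z0-9"
}.freeze

def expand_posix_classes(rx)
  rx.gsub(/\[:[a-z]+:\]/) do |name|
    # Fail loudly for classes this sketch does not know about.
    POSIX_CLASSES.fetch(name) { raise ArgumentError, "unsupported class #{name}" }
  end
end

puts expand_posix_classes("[[:upper:]][[:alnum:]_]*")
# [A-Z][a-zA-Z0-9_]*
```

The expanded string uses only plain ranges, which the syntax described in this section does accept.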
107
+
108
+ The character set that `libfa` operates on is simple 8 bit ASCII. In other
109
+ words, to `libfa`, a character is a byte, and it does not support larger
110
+ character sets such as UTF-8.
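A quick way to see what "a character is a byte" implies for Ruby strings is to compare character and byte counts. This is plain Ruby with no dependency on the gem; the sample word is arbitrary.

```ruby
s = "caf\u00e9"     # "café", encoded as UTF-8 by Ruby

puts s.length       # 4 characters as Ruby counts them
puts s.bytesize     # 5 bytes, which is what a byte-oriented library sees
p s.bytes.last(2)   # [195, 169], the two bytes that encode the last character
```

Judging only from the code in this diff, `Fa.compile` passes `rx.size` (a character count) as the length, so patterns containing non-ASCII characters may not be handed over in full. That is an inference from the source shown here, not verified behavior.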
111
+
112
+ The following characters have special meaning for `libfa`. The symbols `R`,
113
+ `S`, `T`, etc. in the list below can be regular expressions themselves. The
114
+ list is ordered by increasing precedence of the operators.
115
+
116
+ * `R|S`: matches anything that matches either `R` or `S`
117
+ * `R*`: matches any number of `R`, including none
118
+ * `R+`: matches any number of `R`, but at least one
119
+ * `R?`: matches zero or one occurrence of `R`
120
+ * `R{n,m}`: matches at least `n` but no more than `m` occurrences of
121
+ `R`. `n` must be nonnegative. If `m` is missing or equals `-1`, matches
122
+ an unlimited number of `R`.
123
+ * `(R)`: the parentheses are solely there for grouping, and this expression
124
+ matches the same strings as `R` alone
125
+ * `[C]`: matches the characters in the character set `C`; see below for the
126
+ syntax of character sets
127
+ * `.`: matches any single character except for newline
128
+ * `\c`: the literal character `c`, even if it would otherwise have special
129
+ meaning; the expression `\(` matches an opening parenthesis.
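The effect of the precedence order can be checked with a small experiment. The sketch below uses Ruby's own `Regexp` as a stand-in, on the assumption that it groups these particular operators the same way; `full_match?` is a hypothetical helper, and its `\A`/`\z` anchors are Ruby syntax that `libfa` does not accept.

```ruby
# Whole-string match using Ruby's engine (stand-in for libfa in this sketch).
def full_match?(rx, str)
  /\A(?:#{rx})\z/.match?(str)
end

# `|` binds loosest, so "ab|c*" groups as "(ab)|(c*)", not as "a(b|c)*".
puts full_match?("ab|c*", "ab")       # true
puts full_match?("ab|c*", "ccc")      # true
puts full_match?("ab|c*", "abcc")     # false
puts full_match?("a(b|c)*", "abcc")   # true

# The counted form can express the other iteration operators.
puts ["", "a", "aa", "aaa"].all? { |s|
  full_match?("a+", s) == full_match?("a{1,}", s)
}                                     # true
```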
130
+
131
+ Character classes `[C]` use the following notation:
132
+
133
+ * `[^C]`: matches all characters not in `[C]`. `[^a-zA-Z]` matches
134
+ everything that is not a letter.
135
+ * `[a-z]`: matches all characters between `a` and `z`, inclusive. Multiple
136
+ ranges can be specified in the same character set. `[a-zA-Z0-9]` is a
137
+ perfectly valid character set.
138
+ * if a character set should include `]`, it must be listed as the first
139
+ character. `[][]` matches the opening and closing bracket.
140
+ * if a character set should include `-`, it must be listed as the last
141
+ character. `[-]` matches solely a dash.
142
+ * no characters in character sets are special, and there is no backslash
143
+ escaping of characters in character classes. `[.]` matches a literal dot.
144
+
145
+ The regular expression syntax has no notation for control characters: when
146
+ `libfa` sees `\n` in a string you are compiling, it will match that against
147
+ the character `n`, not a newline. That's not a problem as the strings you
148
+ write in Ruby code go through Ruby's backslash interpretation first. When
149
+ you write `Fa.compile("[\n]")`, `libfa` never sees the backslash as Ruby
150
+ replaces `\n` with a newline character before that string ever makes it to
151
+ `libfa`. That has the funny consequence that if you want to use a literal
152
+ backslash in your regular expression, your input string must have _four_
153
+ backslashes in it: when you write `Fa.compile("\\\\")`, Ruby first turns
154
+ that into a string with two backslashes, which `libfa` then interprets as a
155
+ single
156
+ backslash. \[_If you are reading this in YARD documentation and only saw two backslashes in the `Fa.compile`, it's because YARD reduced them from the markdown source. Github does not, and so this example will always be wrong in one of them._\]
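The two layers of interpretation can be inspected in plain Ruby, without the gem, by looking at what a string literal actually contains before any library sees it:

```ruby
newline_class = "[\n]"          # Ruby replaces \n with one newline character
p newline_class.bytes           # [91, 10, 93]: "[", newline, "]"

literal = "\\\\"                # four backslashes typed in the source
puts literal.size               # 2
puts literal                    # prints two backslashes

single_quoted = '\\\\'          # single quotes also collapse \\ into \
puts single_quoted == literal   # true
```

The two-character string is what reaches the regular expression parser, which then reads it as one escaped backslash, as described above.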
26
157
 
27
158
  ## Development
28
159
 
29
- After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake test` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
160
+ After checking out the repo, run `bin/setup` to install dependencies. Then,
161
+ run `rake test` to run the tests. You can also run `bin/console` for an
162
+ interactive prompt that will allow you to experiment.
30
163
 
31
- To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org).
164
+ To install this gem onto your local machine, run `bundle exec rake
165
+ install`. To release a new version, update the version number in
166
+ `version.rb`, and then run `bundle exec rake release`, which will create a
167
+ git tag for the version, push git commits and tags, and push the `.gem`
168
+ file to [rubygems.org](https://rubygems.org).
32
169
 
33
170
  ## Contributing
34
171
 
35
- Bug reports and pull requests are welcome on GitHub at https://github.com/[USERNAME]/fa.
36
-
172
+ Bug reports and pull requests are welcome on GitHub at
173
+ https://github.com/lutter/ruby-fa.
data/Rakefile CHANGED
@@ -1,5 +1,6 @@
1
1
  require "bundler/gem_tasks"
2
2
  require "rake/testtask"
3
+ require 'yard'
3
4
 
4
5
  Rake::TestTask.new(:test) do |t|
5
6
  t.libs << "test"
@@ -7,4 +8,9 @@ Rake::TestTask.new(:test) do |t|
7
8
  t.test_files = FileList['test/**/*_test.rb']
8
9
  end
9
10
 
11
+ YARD::Rake::YardocTask.new do |t|
12
+ #t.options = ['--any', '--extra', '--opts']
13
+ t.stats_options = ['--list-undoc']
14
+ end
15
+
10
16
  task :default => :test
data/fa.gemspec CHANGED
@@ -35,4 +35,5 @@ EOS
35
35
  spec.add_development_dependency "bundler", "~> 1.14"
36
36
  spec.add_development_dependency "rake", "~> 10.0"
37
37
  spec.add_development_dependency "minitest", "~> 5.0"
38
+ spec.add_development_dependency "yard"
38
39
  end
data/lib/fa.rb CHANGED
@@ -1,11 +1,18 @@
1
1
  require "fa/version"
2
2
  require "fa/ffi"
3
3
 
4
+ # Namespace for the libfa bindings
4
5
  module Fa
6
+ # A generic error encountered during an fa operation
5
7
  class Error < StandardError; end
6
8
 
9
+ # An operation in libfa failed because it could not allocate enough
10
+ # memory
7
11
  class OutOfMemoryError < Error; end
8
12
 
13
+ # The class representing a finite automaton. It contains a pointer to a
14
+ # +struct fa+ from +libfa+ and provides Ruby wrappers for the various
15
+ # +libfa+ operations.
9
16
  class Automaton < ::FFI::AutoPointer
10
17
  attr_reader :faptr
11
18
 
@@ -13,54 +20,130 @@ module Fa
13
20
  @faptr = faptr
14
21
  end
15
22
 
23
+ # Minimizes this automaton in place. Uses either Hopcroft's or
24
+ # Brzozowski's algorithm. Due to a stupid design mistake in +libfa+,
25
+ # the algorithm is selected through a global variable. It defaults to
26
+ # Hopcroft's algorithm though.
27
+ #
28
+ # @return [Fa::Automaton] this automaton
16
29
  def minimize
17
30
  r = FFI::minimize(faptr)
18
31
  raise Error if r < 0
19
32
  self
20
33
  end
21
34
 
35
+ # Concatenates +self+ with +other+, corresponding to +SO+. Neither
36
+ # +self+ nor +other+ will be modified.
37
+ #
38
+ # @param [Fa::Automaton] other
39
+ # @raise OutOfMemoryError if +libfa+ fails to allocate memory
40
+ # @return [Fa::Automaton] the concatenation of +self+ and +other+
22
41
  def concat(other)
23
42
  from_ptr( FFI::concat(faptr, other.faptr) )
24
43
  end
25
44
 
45
+ # Produces the union of +self+ with +other+, corresponding to
46
+ # +S|O+. Neither +self+ nor +other+ will be modified.
47
+ #
48
+ # @param [Fa::Automaton] other
49
+ # @raise OutOfMemoryError if +libfa+ fails to allocate memory
50
+ # @return [Fa::Automaton] the union of +self+ and +other+
26
51
  def union(other)
27
52
  from_ptr( FFI::union(faptr, other.faptr) )
28
53
  end
29
54
 
55
+ # Produces an iteration of +self+, corresponding to
56
+ # +S{min,max}+. +self+ will not be modified.
57
+ #
58
+ # @param [Integer] min the minimum number of matches
59
+ # @param [Integer] max the maximum number of matches, use -1 for an
60
+ # unlimited number of matches
61
+ # @raise OutOfMemoryError if +libfa+ fails to allocate memory
62
+ # @return [Fa::Automaton] the iterated automaton
30
63
  def iter(min, max)
31
64
  from_ptr( FFI::iter(faptr, min, max) )
32
65
  end
33
66
 
67
+ # Produces an iteration of any number of +self+, corresponding to
68
+ # +S*+. +self+ will not be modified.
69
+ #
70
+ # @raise OutOfMemoryError if +libfa+ fails to allocate memory
71
+ # @return [Fa::Automaton] the iterated automaton
34
72
  def star
35
73
  iter(0, -1)
36
74
  end
37
75
 
76
+ # Produces an iteration of at least one +self+, corresponding to
77
+ # +S\++. +self+ will not be modified.
78
+ #
79
+ # @raise OutOfMemoryError if +libfa+ fails to allocate memory
80
+ # @return [Fa::Automaton] the iterated automaton
38
81
  def plus
39
82
  iter(1, -1)
40
83
  end
41
84
 
85
+ # Produces an iteration of zero or one +self+, corresponding to
86
+ # +S?+. +self+ will not be modified.
87
+ #
88
+ # @raise OutOfMemoryError if +libfa+ fails to allocate memory
89
+ # @return [Fa::Automaton] the iterated automaton
42
90
  def maybe
43
91
  iter(0, 1)
44
92
  end
45
93
 
94
+ # Produces the intersection of +self+ and +other+. Neither +self+ nor
95
+ # +other+ will be modified. The resulting automaton will match all
96
+ # strings that simultaneously match +self+ and +other+.
97
+ #
98
+ # @param [Fa::Automaton] other
99
+ # @raise OutOfMemoryError if +libfa+ fails to allocate memory
100
+ # @return [Fa::Automaton] the intersection of +self+ and +other+
46
101
  def intersect(other)
47
102
  from_ptr( FFI::intersect(faptr, other.faptr) )
48
103
  end
49
104
 
105
+ # Produces the difference of +self+ and +other+. Neither +self+ nor
106
+ # +other+ will be modified. The resulting automaton will match all
107
+ # strings that match +self+ but not +other+.
108
+ #
109
+ # @param [Fa::Automaton] other
110
+ # @raise OutOfMemoryError if +libfa+ fails to allocate memory
111
+ # @return [Fa::Automaton] the difference of +self+ and +other+
50
112
  def minus(other)
51
113
  from_ptr( FFI::minus(faptr, other.faptr) )
52
114
  end
53
115
 
116
+ # Produces the complement of +self+. +self+ will not be modified. The
117
+ # resulting automaton will match all strings that do _not_ match
118
+ # +self+.
119
+ #
120
+ # @raise OutOfMemoryError if +libfa+ fails to allocate memory
121
+ # @return [Fa::Automaton] the complement of +self+
54
122
  def complement
55
123
  from_ptr( FFI::complement(faptr) )
56
124
  end
57
125
 
126
+ # Returns whether +self+ and +other+ match the same set of strings
127
+ #
128
+ # @param [Fa::Automaton] other
129
+ # @return [Boolean]
58
130
  def equals(other)
59
131
  r = FFI::equals(faptr, other.faptr)
60
132
  raise Error if r < 0
61
133
  return r == 1
62
134
  end
63
135
 
136
+ # Returns whether +self+ and +other+ match the same set of strings
137
+ #
138
+ # @param [Fa::Automaton] other
139
+ # @return [Boolean]
140
+ def ==(other); equals(other); end
141
+
142
+ # Returns whether +self+ matches all the strings that +other+
143
+ # matches. +self+ may match more strings than that.
144
+ #
145
+ # @param [Fa::Automaton] other
146
+ # @return [Boolean]
64
147
  def contains(other)
65
148
  # The C function works like fa1 <= fa2, and not how the
66
149
  # Ruby nomenclature would suggest it, so swap the arguments
@@ -69,20 +152,48 @@ module Fa
69
152
  return r == 1
70
153
  end
71
154
 
72
- def is_basic(basic)
155
+ # Returns whether +other+ matches all the strings that +self+
156
+ # matches. +other+ may match more strings than that.
157
+ #
158
+ # @param [Fa::Automaton] other
159
+ # @return [Boolean]
160
+ def <=(other); other.contains(self); end
161
+
162
+ # Returns whether +self+ is the empty, epsilon or total automaton
163
+ #
164
+ # @param [:empty, :epsilon, :total] kind
165
+ # @return [Boolean]
166
+ def is_basic(kind)
73
167
  # FFI::is_basic checks if the automaton is structurally the same as
74
168
  # +kind+; we just want to check here if they accept the same
75
169
  # language. If is_basic fails, we therefore check for equality
76
- r = FFI::is_basic(faptr, basic)
170
+ r = FFI::is_basic(faptr, kind)
77
171
  return true if r == 1
78
- return equals(Fa::make_basic(basic))
172
+ return equals(Fa::make_basic(kind))
79
173
  end
80
174
 
175
+ # Returns whether +self+ is the empty automaton, i.e., matches no words
176
+ # at all
177
+ # @return [Boolean]
81
178
  def empty?; is_basic(:empty); end
179
+
180
+ # Returns whether +self+ is the epsilon automaton, i.e., matches only
181
+ # the empty string
182
+ # @return [Boolean]
82
183
  def epsilon?; is_basic(:epsilon); end
184
+
185
+ # Returns whether +self+ is the total automaton, i.e., matches all
186
+ # possible words.
187
+ # @return [Boolean]
83
188
  def total?; is_basic(:total); end
84
189
 
85
- def as_regexp
190
+ # Return the representation of +self+ as a regular expression. Note
191
+ # that that regular expression can look pretty complicated, even for
192
+ # seemingly simple automata. Sometimes, minimizing the automaton before
193
+ # turning it into a string helps; sometimes it doesn't.
194
+ #
195
+ # @return [String] the regular expression for +self+
196
+ def to_s
86
197
  rx = ::FFI::MemoryPointer.new :string
87
198
  rx_len = ::FFI::MemoryPointer.new :size_t
88
199
  r = FFI::as_regexp(faptr, rx, rx_len)
@@ -97,21 +208,36 @@ module Fa
97
208
 
98
209
  :private
99
210
  def from_ptr(ptr)
100
- raise OutOfMemoryError if ptr.nil?
211
+ raise OutOfMemoryError if ptr.null?
101
212
  Automaton.new(ptr)
102
213
  end
103
214
  end
104
215
 
216
+ # Compiles +rx+ into a finite automaton
217
+ # @param [String] rx a regular expression
218
+ # @return [Fa::Automaton] the finite automaton
105
219
  def self.compile(rx)
106
220
  faptr = ::FFI::MemoryPointer.new :pointer
107
221
  r = FFI::compile(rx, rx.size, faptr)
108
- raise Error if r < 0
222
+ raise Error if r != 0 # REG_NOERROR is 0, at least for glibc
109
223
  Automaton.new(faptr.get_pointer(0))
110
224
  end
111
225
 
226
+ # Compiles +rx+ into a finite automaton
227
+ # @param [String] rx a regular expression
228
+ # @return [Fa::Automaton] the finite automaton
229
+ def self.[](rx)
230
+ compile(rx)
231
+ end
232
+
233
+ # Makes a basic finite automaton, either an empty, epsilon, or total
234
+ # finite automaton. Those match no words, only the empty word, or all
235
+ # words.
236
+ # @param [:empty, :epsilon, :total] kind
237
+ # @return [Fa::Automaton] the finite automaton
112
238
  def self.make_basic(kind)
113
239
  faptr = FFI::make_basic(kind)
114
- raise OutOfMemoryError if faptr.nil?
240
+ raise OutOfMemoryError if faptr.null?
115
241
  Automaton.new(faptr)
116
242
  end
117
243
  end
@@ -1,3 +1,3 @@
1
1
  module Fa
2
- VERSION = "0.1.0"
2
+ VERSION = "0.1.1"
3
3
  end
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: fa
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.1.0
4
+ version: 0.1.1
5
5
  platform: ruby
6
6
  authors:
7
7
  - 'David Lutterkort
@@ -68,6 +68,20 @@ dependencies:
68
68
  - - "~>"
69
69
  - !ruby/object:Gem::Version
70
70
  version: '5.0'
71
+ - !ruby/object:Gem::Dependency
72
+ name: yard
73
+ requirement: !ruby/object:Gem::Requirement
74
+ requirements:
75
+ - - ">="
76
+ - !ruby/object:Gem::Version
77
+ version: '0'
78
+ type: :development
79
+ prerelease: false
80
+ version_requirements: !ruby/object:Gem::Requirement
81
+ requirements:
82
+ - - ">="
83
+ - !ruby/object:Gem::Version
84
+ version: '0'
71
85
  description: |
72
86
  Bindings for libfa, a library to manipulate finite automata. Automata are
73
87
  constructed from regular expressions, using extended POSIX syntax, and make