regex 1.0.0

data/HISTORY ADDED
@@ -0,0 +1,11 @@
1
+ = RELEASE HISTORY
2
+
3
+ 1.0.0 / 2010-02-10
4
+
5
+ Initial release of Regex. Regex is a simple
6
+ commandline Regular Expression tool.
7
+
8
+ Changes:
9
+
10
+ * Happy Birthday
11
+
data/LICENSE ADDED
@@ -0,0 +1,23 @@
1
+ The MIT License
2
+
3
+ Copyright (c) 2009 Thomas Sawyer
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in
13
+ all copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
21
+ THE SOFTWARE.
22
+
23
+
@@ -0,0 +1,25 @@
1
+ HISTORY
2
+ LICENSE
3
+ MANIFEST
4
+ README
5
+ bin/regex
6
+ lib/regex
7
+ lib/regex.rb
8
+ lib/regex/command.rb
9
+ lib/regex/extractor.rb
10
+ lib/regex/string.rb
11
+ lib/regex/templates
12
+ lib/regex/templates/common.rb
13
+ meta/authors
14
+ meta/created
15
+ meta/description
16
+ meta/download
17
+ meta/homepage
18
+ meta/mailinglist
19
+ meta/name
20
+ meta/repository
21
+ meta/summary
22
+ meta/title
23
+ meta/version
24
+ test/demos
25
+ test/demos/regex.rdoc
data/README ADDED
@@ -0,0 +1,45 @@
1
+ = Regex ("Like a Knife")
2
+
3
+ * home: http://proutils.github.com/regex
4
+ * work: http://github.com/proutils/regex
5
+
6
+ == DESCRIPTION
7
+
8
+ Yeah, I know what you are going to say: "I can do that with ___." Fill in the blank
9
+ with +grep+, +awk+, +sed+, +perl+, etc. But honestly, none of these tools are
10
+ Language 2.0 (read "post-Ruby"). What I want is a simple commandline tool that
11
+ gives me quick access to a Regular Expression engine. No more, no less.
12
+
13
+ Now I could have written this tool in Perl. I'm sure it would be just as good, if not
14
+ better, since Perl's Regular Expression engine rocks, or so I hear. But Ruby's is
15
+ pretty good too, and getting better (with 1.9+). And since I know Ruby very
16
+ well, that's what you get.
17
+
18
+ == USAGE
19
+
20
+ Okay, check it out. It's real simple. Supply a regular expression and a file to
21
+ match upon to the +regex+ command.
22
+
23
+ $ regex '=begin.*?\n(.*)=end' sample.txt
24
+
25
+ It does exactly what you think it would.
26
+
27
+ Check out the <tt>--help</tt> and I'm sure the rest will come to you real quick.
28
+ But if you want more information, then do us the good favor of jumping over
29
+ to the <a href="http://proutils.github.com/regex">documentation wiki</a>.
30
+
31
+ == STATUS
32
+
33
+ This is a very early release. So don't expect every feature under the sun just yet, or
34
+ that every detail is going to work peachy keen. But hey, if something needs fixing
35
+ or a feature needs adding, well then get in there and send me a patch --open
36
+ source software is built on *TEAM WORK*.
37
+
38
+ And expect a potential for rapid change here at the beginning.
39
+
40
+ == COPYRIGHT
41
+
42
+ Copyright (c) 2010 Thomas Sawyer
43
+
44
+ Regex is licensed under the terms of the MIT license.
45
+
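Note on the README usage example: the following is a minimal sketch of roughly what the `regex '=begin.*?\n(.*)=end' sample.txt` invocation does, written in plain Ruby without the gem. The contents of `sample` are hypothetical and stand in for `sample.txt`; the multiline flag mirrors how the library compiles string patterns, and reporting only the capture group is an assumption based on the library source in this diff.

```ruby
# Hypothetical text standing in for the contents of sample.txt.
sample = <<~TEXT
  puts "before"
  =begin rdoc
  This is embedded documentation.
  =end
  puts "after"
TEXT

# String patterns are compiled with Regexp::MULTILINE, so "." also matches newlines.
pattern = Regexp.new('=begin.*?\n(.*)=end', Regexp::MULTILINE)

md = pattern.match(sample)

# With a single capture group, only that group is reported.
result = md ? md[1] : nil
puts result
```

Because `.` matches newlines here, the greedy `(.*)` runs up to the last `=end` in the input, which is worth keeping in mind for files with several comment blocks.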
@@ -0,0 +1,3 @@
1
+ #!/usr/bin/env ruby
2
+ require 'regex'
3
+ Regex::Command.main(*ARGV)
@@ -0,0 +1,236 @@
1
+ # = Text Extraction Class
2
+ #
3
+ # Extractor was designed particularly for extracting source code from embedded
4
+ # comment blocks.
5
+ #
6
+ # Todo:
7
+ # - How can we handle embedded code in standard comments? Eg. #
8
+ #
9
+ class Regex
10
+ VERSION = "1.1"
11
+
12
+ # When the regular expression returns multiple groups,
13
+ # each is divided by the group deliminator.
14
+ # This is the default value.
15
+ DELIMINATOR_GROUP = 29.chr + "\n"
16
+
17
+ # When using repeat mode, each match is divided by
18
+ # the record deliminator. This is the default value.
19
+ DELIMINATOR_RECORD = 30.chr + "\n"
20
+
21
+ require 'fileutils'
22
+ require 'open-uri'
23
+
24
+ require 'regex/string'
25
+ require 'regex/command'
26
+
27
+ # TODO: generalize to plugin
28
+ require 'regex/templates/common'
29
+
30
+ #
31
+ #attr_accessor :text
32
+
33
+ # Remove XML tags from search.
34
+ attr_accessor :unxml
35
+
36
+ # Regular expression.
37
+ attr_accessor :pattern
38
+
39
+ # Select built-in regular expression by name.
40
+ attr_accessor :template
41
+
42
+ # Index of the match group to return.
43
+ attr_accessor :index
44
+
45
+ # Ignore case.
46
+ attr_accessor :insensitive
47
+
48
+ # Repeat Match.
49
+ attr_accessor :repeat
50
+
51
+ # Output format.
52
+ attr_accessor :format
53
+
54
+ # DEPRECATE: Not needed anymore.
55
+ #def self.load(io, options={}, &block)
56
+ # new(io, options, &block)
57
+ #end
58
+
59
+ # New extractor.
60
+ def initialize(io, options={})
61
+ @raw = (String === io ? io : io.read)
62
+ options.each do |k,v|
63
+ __send__("#{k}=", v)
64
+ end
65
+ yield(self) if block_given?
66
+ end
67
+
68
+ # Read file.
69
+ #def raw
70
+ # @raw ||= open(@file) # File.read(@file)
71
+ #end
72
+
73
+ #--
74
+ # TODO: unxml is too primitive, use a real XML parser like nokogiri
75
+ #++
76
+ def text
77
+ @text ||= (
78
+ if unxml
79
+ @raw.gsub(/\<(.*?)\>/, '')
80
+ else
81
+ @raw
82
+ end
83
+ )
84
+ end
85
+
86
+ #
87
+ def regex
88
+ @regex ||= (
89
+ if template
90
+ TEMPLATES.const_get(template.upcase)
91
+ else
92
+ case pattern
93
+ when Regexp
94
+ pattern
95
+ when String
96
+ flags = Regexp::MULTILINE
97
+ flags |= Regexp::IGNORECASE if insensitive
98
+ # Regexp.new takes one options integer, so the flags are OR'd together.
99
+ Regexp.new(pattern, flags)
100
+ end
101
+ end
102
+ )
103
+ end
104
+
105
+ #
106
+ def to_s(format=nil)
107
+ case format
108
+ when :yaml
109
+ to_s_yaml
110
+ when :json
111
+ to_s_json
112
+ else
113
+ out = structure
114
+ if repeat
115
+ out = out.map{ |m| m.join(deliminator_group) }
116
+ out = out.join(deliminator_record) #.chomp("\n") + "\n"
117
+ else
118
+ out = out.join(deliminator_group) #.chomp("\n") + "\n"
119
+ end
120
+ out
121
+ end
122
+ end
123
+
124
+ #
125
+ def to_s_yaml
126
+ require 'yaml'
127
+ structure.to_yaml
128
+ end
129
+
130
+ #
131
+ def to_s_json
132
+ begin
133
+ require 'json'
134
+ rescue LoadError
135
+ require 'json_pure'
136
+ end
137
+ structure.to_json
138
+ end
139
+
140
+ # Structure the matchdata according to specified options.
141
+ def structure
142
+ repeat ? structure_repeat : structure_single
143
+ end
144
+
145
+ # Structure the matchdata for single match.
146
+ def structure_single
147
+ md = extract
148
+ if index
149
+ [md[index]]
150
+ elsif md.size > 1
151
+ md[1..-1]
152
+ else
153
+ [md[0]]
154
+ end
155
+ end
156
+
157
+ # Structure the matchdata for repeat matches.
158
+ def structure_repeat
159
+ out = extract
160
+ if index
161
+ out.map{ |md| [md[index]] }
162
+ else
163
+ out.map{ |md| md.size > 1 ? md[1..-1] : [md[0]] }
164
+ end
165
+ end
166
+
167
+ # Extract match from source text.
168
+ def extract
169
+ if repeat
170
+ extract_repeat
171
+ else
172
+ extract_single
173
+ end
174
+ end
175
+
176
+ #
177
+ #def extract_single
178
+ # out = []
179
+ # if md = matchdata
180
+ # if index
181
+ # out << md[index]
182
+ # elsif md.size > 1
183
+ # out = md[1..-1] #.join(deliminator_group)
184
+ # else
185
+ # out = md
186
+ # end
187
+ # end
188
+ # return out
189
+ #end
190
+
191
+ # Extract single match from source text.
192
+ def extract_single
193
+ md = regex.match(text)
194
+ md ? md : []
195
+ end
196
+
197
+ #
198
+ #def matchdata
199
+ # regex.match(text)
200
+ #end
201
+
202
+ #
203
+ #def extract_repeat
204
+ # out = []
205
+ # text.scan(regex) do
206
+ # md = $~
207
+ # if index
208
+ # out << [md[index]]
209
+ # elsif md.size > 1
210
+ # out << md[1..-1] #.join(deliminator_group)
211
+ # else
212
+ # out << md
213
+ # end
214
+ # end
215
+ # out #.join(deliminator_record)
216
+ #end
217
+
218
+ # Extract repeat matches from source text.
219
+ def extract_repeat
220
+ out = []
221
+ text.scan(regex) do
222
+ out << $~
223
+ end
224
+ out
225
+ end
226
+
227
+ def deliminator_group
228
+ DELIMINATOR_GROUP
229
+ end
230
+
231
+ def deliminator_record
232
+ DELIMINATOR_RECORD
233
+ end
234
+
235
+ end
236
+
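Note on the output format: the sketch below illustrates how repeat-mode results appear to be joined, using the same separator values as `DELIMINATOR_GROUP` and `DELIMINATOR_RECORD` (ASCII 29 and 30, each followed by a newline). It is a standalone approximation that does not load the gem; the input text, pattern, and local variable names are hypothetical.

```ruby
# Separator values copied from the constants in the library source.
group_sep  = 29.chr + "\n"   # between capture groups of one match
record_sep = 30.chr + "\n"   # between successive matches

text = "alice=1 bob=2"       # hypothetical input
pattern = /(\w+)=(\d+)/      # hypothetical pattern with two groups

# Collect one MatchData per occurrence.
matches = []
text.scan(pattern) { matches << $~ }

# Keep only the capture groups when there are any, else the whole match.
rows = matches.map { |md| md.size > 1 ? md[1..-1] : [md[0]] }

# Join groups within each match, then join the matches themselves.
output = rows.map { |groups| groups.join(group_sep) }.join(record_sep)

p rows
p output
```

Using control characters as separators keeps the output unambiguous even when the matched text itself contains commas, tabs, or blank lines.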
@@ -0,0 +1,108 @@
1
+ require 'regex'
2
+
3
+ class Regex
4
+
5
+ # Commandline interface.
6
+ #
7
+ class Command
8
+
9
+ #
10
+ attr :file
11
+
12
+ #
13
+ attr :format
14
+
15
+ #
16
+ attr :options
17
+
18
+ #
19
+ def self.main(*argv)
20
+ new(*argv).main
21
+ end
22
+
23
+ # New Command.
24
+ def initialize(*argv)
25
+ @file = nil
26
+ @format = nil
27
+ @options = {}
28
+ parse(*argv)
29
+ end
30
+
31
+ #
32
+ def parse(*argv)
33
+ parser.parse!(argv)
34
+ unless @options[:template]
35
+ @options[:pattern] = argv.shift
36
+ end
37
+ @file = argv.shift
38
+ if @file
39
+ unless File.file?(@file)
40
+ puts "No such file -- '#{file}'."
41
+ exit 1
42
+ end
43
+ end
44
+ end
45
+
46
+ # OptionParser instance.
47
+ def parser
48
+ require 'optparse'
49
+ @options = {}
50
+ OptionParser.new do |opt|
51
+ opt.on('--template', '-t NAME', "select a built-in regular expression") do |name|
52
+ @options[:template] = name
53
+ end
54
+
55
+ opt.on('--index', '-n INT', "return a specific match index") do |int|
56
+ @options[:index] = int.to_i
57
+ end
58
+
59
+ opt.on('--insensitive', '-i', "case insensitive matching") do
60
+ @options[:insensitive] = true
61
+ end
62
+
63
+ opt.on('--unxml', '-x', "ignore XML/HTML tags") do
64
+ @options[:unxml] = true
65
+ end
66
+
67
+ opt.on('--repeat', '-r', "find all matching occurrences") do
68
+ @options[:repeat] = true
69
+ end
70
+
71
+ opt.on('--yaml', '-y', "output in YAML format") do
72
+ @format = :yaml
73
+ end
74
+
75
+ opt.on('--json', '-j', "output in JSON format") do
76
+ @format = :json
77
+ end
78
+
79
+ opt.on_tail('--help', '-h', "display this lovely help message") do
80
+ puts opt
81
+ exit 0
82
+ end
83
+ end
84
+ end
85
+
86
+ #
87
+ def extraction
88
+ target = file ? File.new(file) : ARGF
89
+ Regex.new(target, options)
90
+ end
91
+
92
+ # Extract and display.
93
+ def main
94
+ begin
95
+ puts extraction.to_s(@format)
96
+ rescue => error
97
+ if $DEBUG
98
+ raise error
99
+ else
100
+ abort error.to_s
101
+ end
102
+ end
103
+ end
104
+
105
+ end
106
+
107
+ end
108
+
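Note on argument handling: the command class parses switches first and then treats the remaining arguments as the pattern followed by an optional file. The sketch below shows that flow with Ruby's standard `OptionParser`. The `parse_args` helper is hypothetical, it covers only three of the switches, and it uses `Integer` coercion where the original calls `to_i`.

```ruby
require 'optparse'

# Hypothetical helper: returns [options_hash, file_or_nil].
def parse_args(argv)
  options = {}
  parser = OptionParser.new do |opt|
    opt.on('-n', '--index INT', Integer, 'return a specific match index') do |int|
      options[:index] = int
    end
    opt.on('-i', '--insensitive', 'case insensitive matching') do
      options[:insensitive] = true
    end
    opt.on('-r', '--repeat', 'find all matching occurrences') do
      options[:repeat] = true
    end
  end

  # parse (without "!") leaves argv untouched and returns the leftover arguments.
  rest = parser.parse(argv)
  options[:pattern] = rest.shift   # first positional argument is the pattern
  [options, rest.shift]            # second positional argument, if any, is the file
end

options, file = parse_args(['-r', '-n', '1', '\w+', 'sample.txt'])
p options
p file
```

When no file is given, the second element is `nil`, which is the case where the original falls back to reading from `ARGF`.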
@@ -0,0 +1 @@
1
+
@@ -0,0 +1,68 @@
1
+ class Regex
2
+
3
+ # Extensions for String class.
4
+ # These methods are taken directly from Ruby Facets.
5
+ #
6
+ module String
7
+
8
+ # Provides a margin controlled string.
9
+ #
10
+ # x = %Q{
11
+ # | This
12
+ # | is
13
+ # | margin controlled!
14
+ # }.margin
15
+ #
16
+ #
17
+ # NOTE: This may still need a bit of tweaking.
18
+ #
19
+ # CREDIT: Trans
20
+
21
+ def margin(n=0)
22
+ #d = /\A.*\n\s*(.)/.match( self )[1]
23
+ #d = /\A\s*(.)/.match( self)[1] unless d
24
+ d = ((/\A.*\n\s*(.)/.match(self)) ||
25
+ (/\A\s*(.)/.match(self)))[1]
26
+ return '' unless d
27
+ if n == 0
28
+ gsub(/\n\s*\Z/,'').gsub(/^\s*[#{d}]/, '')
29
+ else
30
+ gsub(/\n\s*\Z/,'').gsub(/^\s*[#{d}]/, ' ' * n)
31
+ end
32
+ end
33
+
34
+ # Preserves relative tabbing.
35
+ # The first non-empty line ends up with n spaces before nonspace.
36
+ #
37
+ # CREDIT: Gavin Sinclair
38
+
39
+ def tabto(n)
40
+ if self =~ /^( *)\S/
41
+ indent(n - $1.length)
42
+ else
43
+ self
44
+ end
45
+ end
46
+
47
+ # Indent left or right by n spaces.
48
+ # (This used to be called #tab and aliased as #indent.)
49
+ #
50
+ # CREDIT: Gavin Sinclair
51
+ # CREDIT: Trans
52
+
53
+ def indent(n)
54
+ if n >= 0
55
+ gsub(/^/, ' ' * n)
56
+ else
57
+ gsub(/^ {0,#{-n}}/, "")
58
+ end
59
+ end
60
+
61
+ end
62
+
63
+ class ::String #:nodoc:
64
+ include Regex::String
65
+ end
66
+
67
+ end
68
+
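Note on the string helpers: a short sketch of how `indent` and `tabto` behave. To keep it self-contained it defines standalone functions that take the string as an argument rather than extending `String`; that change, and the sample `code` string, are assumptions made for illustration.

```ruby
# Shift every line right by n spaces, or left by up to -n spaces when n is negative.
def indent(str, n)
  if n >= 0
    str.gsub(/^/, ' ' * n)
  else
    str.gsub(/^ {0,#{-n}}/, '')
  end
end

# Re-indent so the first non-blank line starts at column n,
# preserving the relative indentation of the other lines.
def tabto(str, n)
  str =~ /^( *)\S/ ? indent(str, n - $1.length) : str
end

code = "    def hello\n      puts 'hi'\n    end\n"   # hypothetical input
shifted = tabto(code, 2)
puts shifted
```

Here the first line starts with four spaces, so `tabto(code, 2)` removes two leading spaces from every line and the nested `puts` stays two columns deeper than `def`.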
@@ -0,0 +1,13 @@
1
+ class Regex
2
+
3
+ #
4
+ module TEMPLATES
5
+ MLTAG = /<([A-Z][A-Z0-9]*)\b[^>]*>(.*?)<\/\1>/i
6
+ IP = /\b(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b/
7
+ EMAIL = /([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)/i
8
+ USPHONE = /(\d\d\d[-]|\(\d\d\d\))?(\d\d\d)[-](\d\d\d\d)/
9
+ RUBYBLOCK = /^=begin\s+(.*?)\n(.*?)\n=end/mi
10
+ RUBYMETHOD = /\A\s*(\#.*?)^\s*def\s+(.*?)$/mi
11
+ end
12
+
13
+ end
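Note on template lookup: the library resolves a template name with `TEMPLATES.const_get(template.upcase)`. The sketch below reproduces that lookup against a stand-in module so it runs without the gem. The module built with `Module.new` and the sample sentence are hypothetical; the `USPHONE` pattern is copied from the source above.

```ruby
# Stand-in for Regex::TEMPLATES holding one pattern copied from the source.
templates = Module.new
templates.const_set(:USPHONE, /(\d\d\d[-]|\(\d\d\d\))?(\d\d\d)[-](\d\d\d\d)/)

# A name as it might arrive from the command line.
name = 'usphone'

# Upcase the name and look it up as a constant, as the library does.
pattern = templates.const_get(name.upcase)

md = pattern.match("Call (555)123-4567 today")   # hypothetical input
p md.captures
```

An unknown name makes `const_get` raise `NameError`, so a caller may want to rescue it and report the unknown template.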
@@ -0,0 +1,2 @@
1
+ Thomas Sawyer
2
+ Tyler Rick
@@ -0,0 +1 @@
1
+ 2006-05-09
@@ -0,0 +1 @@
1
+ Regex is a simple commandline Regular Expression tool.
@@ -0,0 +1 @@
1
+ http://github.com/proutils/regex/downloads
@@ -0,0 +1 @@
1
+ http://proutils.github.com/regex
@@ -0,0 +1 @@
1
+ http://groups.google.com/group/proutils/topics?hl=en
@@ -0,0 +1 @@
1
+ regex
@@ -0,0 +1 @@
1
+ git://github.com/proutils/regex.git
@@ -0,0 +1 @@
1
+ Regex is a simple commandline Regular Expression tool.
@@ -0,0 +1 @@
1
+ Regex
@@ -0,0 +1 @@
1
+ 1.0.0
@@ -0,0 +1,44 @@
1
+ = Regex class
2
+
3
+ Regex is really meant to be used on the commandline
4
+ since it is really nothing more than a front end
5
+ to Ruby's regular expression engine. But we will
6
+ demonstrate its use in code just the same, and to
7
+ help ensure code quality.
8
+
9
+ First we need to require the Regex library.
10
+
11
+ require 'regex'
12
+
13
+ Now let's create some material to work with.
14
+
15
+ text = "We will match against this string."
16
+
17
+ Now we can create a Regex object using the text.
18
+ We will also supply a matching pattern, as none of
19
+ the matching functions will work without providing
20
+ a pattern or the name of a built-in pattern template.
21
+
22
+ regex = Regex.new(text, :pattern=>'\w+')
23
+
24
+ We can see that the Regex object has converted the pattern
25
+ into the expected regular expression via the #regex method.
26
+
27
+ regex.regex.assert == /\w+/m
28
+
29
+ Under the hood, Regex has split the process of matching,
30
+ organizing and formatting the results into separate methods.
31
+ We can use the #structure method to see the match results
32
+ organized into uniform arrays.
33
+
34
+ regex.structure.assert == %w{We}
35
+
36
+ Whereas the last use only returns a single match, if we turn
37
+ on repeat mode we can see every word.
38
+
39
+ regex.repeat = true
40
+
41
+ regex.structure.assert == %w{We will match against this string}.map{ |e| [e] }
42
+
43
+ Notice that repeat mode creates an array in an array.
44
+
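Note on the demo: the assertions above can be approximated in plain Ruby without the gem or the `assert` helper, which may help when checking the expected shapes by hand. This is a sketch under that assumption; the variable names are hypothetical, and the flag handling follows the usual convention of combining `Regexp` options with bitwise OR.

```ruby
text = "We will match against this string."

# Build the expression from a string, always multiline, optionally case-insensitive.
insensitive = false
flags = Regexp::MULTILINE
flags |= Regexp::IGNORECASE if insensitive
regex = Regexp.new('\w+', flags)

# Single mode: only the first match, wrapped in an array.
md = regex.match(text)
single = md.size > 1 ? md[1..-1] : [md[0]]

# Repeat mode: one inner array per match.
repeat = []
text.scan(regex) { repeat << [$~[0]] }

p regex
p single
p repeat
```

The nested arrays in repeat mode leave room for patterns with several capture groups, where each inner array would hold one match's groups.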
metadata ADDED
@@ -0,0 +1,87 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: regex
3
+ version: !ruby/object:Gem::Version
4
+ prerelease: false
5
+ segments:
6
+ - 1
7
+ - 0
8
+ - 0
9
+ version: 1.0.0
10
+ platform: ruby
11
+ authors:
12
+ - Thomas Sawyer
13
+ - Tyler Rick
14
+ autorequire:
15
+ bindir: bin
16
+ cert_chain: []
17
+
18
+ date: 2010-02-12 00:00:00 -05:00
19
+ default_executable:
20
+ dependencies: []
21
+
22
+ description: Regex is a simple commandline Regular Expression tool.
23
+ email:
24
+ executables:
25
+ - regex
26
+ extensions: []
27
+
28
+ extra_rdoc_files:
29
+ - README
30
+ files:
31
+ - HISTORY
32
+ - LICENSE
33
+ - MANIFEST
34
+ - README
35
+ - bin/regex
36
+ - lib/regex.rb
37
+ - lib/regex/command.rb
38
+ - lib/regex/extractor.rb
39
+ - lib/regex/string.rb
40
+ - lib/regex/templates/common.rb
41
+ - meta/authors
42
+ - meta/created
43
+ - meta/description
44
+ - meta/download
45
+ - meta/homepage
46
+ - meta/mailinglist
47
+ - meta/name
48
+ - meta/repository
49
+ - meta/summary
50
+ - meta/title
51
+ - meta/version
52
+ - test/demos/regex.rdoc
53
+ has_rdoc: true
54
+ homepage: http://proutils.github.com/regex
55
+ licenses: []
56
+
57
+ post_install_message:
58
+ rdoc_options:
59
+ - --title
60
+ - Regex API
61
+ - --main
62
+ - README
63
+ require_paths:
64
+ - lib
65
+ required_ruby_version: !ruby/object:Gem::Requirement
66
+ requirements:
67
+ - - ">="
68
+ - !ruby/object:Gem::Version
69
+ segments:
70
+ - 0
71
+ version: "0"
72
+ required_rubygems_version: !ruby/object:Gem::Requirement
73
+ requirements:
74
+ - - ">="
75
+ - !ruby/object:Gem::Version
76
+ segments:
77
+ - 0
78
+ version: "0"
79
+ requirements: []
80
+
81
+ rubyforge_project: regex
82
+ rubygems_version: 1.3.6.pre.3
83
+ signing_key:
84
+ specification_version: 3
85
+ summary: Regex is a simple commandline Regular Expression tool.
86
+ test_files: []
87
+