regex 1.0.0

data/HISTORY ADDED
@@ -0,0 +1,11 @@
1
+ = RELEASE HISTORY
2
+
3
+ 1.0.0 / 2010-02-10
4
+
5
+ Initial release of Regex. Regex is a simple
6
+ commandline Regular Expression tool.
7
+
8
+ Changes:
9
+
10
+ * Happy Birthday
11
+
data/LICENSE ADDED
@@ -0,0 +1,23 @@
1
+ The MIT License
2
+
3
+ Copyright (c) 2009 Thomas Sawyer
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in
13
+ all copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
21
+ THE SOFTWARE.
22
+
23
+
@@ -0,0 +1,25 @@
1
+ HISTORY
2
+ LICENSE
3
+ MANIFEST
4
+ README
5
+ bin/regex
6
+ lib/regex
7
+ lib/regex.rb
8
+ lib/regex/command.rb
9
+ lib/regex/extractor.rb
10
+ lib/regex/string.rb
11
+ lib/regex/templates
12
+ lib/regex/templates/common.rb
13
+ meta/authors
14
+ meta/created
15
+ meta/description
16
+ meta/download
17
+ meta/homepage
18
+ meta/mailinglist
19
+ meta/name
20
+ meta/repository
21
+ meta/summary
22
+ meta/title
23
+ meta/version
24
+ test/demos
25
+ test/demos/regex.rdoc
data/README ADDED
@@ -0,0 +1,45 @@
1
+ = Regex ("Like a Knife")
2
+
3
+ * home: http://proutils.github.com/regex
4
+ * work: http://github.com/proutils/regex
5
+
6
+ == DESCRIPTION
7
+
8
+ Yea, I know what you are going to say. "I can do that with ___" Fill in the blank
9
+ with +grep+, +awk+, +sed+, +perl+, etc. But honestly, none of these tools are
10
+ Language 2.0 (read "post-Ruby"). What I want is a simple commandline tool that
11
+ gives me quick access to a Regular Expression engine. No more, no less.
12
+
13
+ Now, I could have written this tool in Perl. I'm sure it would be just as good, if not
14
+ better since Perl's Regular Expression engine rocks, or so I hear. But Ruby's is
15
+ pretty good too, and getting better (with 1.9+). And since I know Ruby very
16
+ well, that's what you get.
17
+
18
+ == USAGE
19
+
20
+ Okay, check it out. It's real simple. Supply a regular expression and a file to
21
+ match upon to the +regex+ command.
22
+
23
+ $ regex '=begin.*?\n(.*)=end' sample.txt
24
+
25
+ It does exactly what you think it would.
26
+
27
+ Check out the <tt>--help</tt> and I'm sure the rest will come to you real quick.
28
+ But if you want more information, then do us the good favor of jumping over
29
+ to the <a href="http://proutils.github.com/regex">documentation wiki</a>.
30
+
31
+ == STATUS
32
+
33
+ This is a very early release. So don't expect every feature under the sun just yet, or
34
+ that every detail is going to work peachy keen. But hey, if something needs fixing
35
+ or a feature needs adding, well then get in there and send me a patch --open
36
+ source software is built on *TEAM WORK*.
37
+
38
+ And expect a potential for rapid change here at the beginning.
39
+
40
+ == COPYRIGHT
41
+
42
+ Copyright (c) 2010 Thomas Sawyer
43
+
44
+ Regex is licensed under the terms of the MIT license.
45
+
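The USAGE example above can be approximated in plain Ruby. This is only a sketch: it assumes the pattern is compiled with Regexp::MULTILINE (as lib/regex.rb does for string patterns), and the `sample` string is a made-up stand-in for the contents of sample.txt.

```ruby
# Hypothetical stand-in for the contents of sample.txt.
sample = "=begin rdoc\nSome embedded text.\n=end\n"

# String patterns are compiled with Regexp::MULTILINE, so "." also
# matches newlines and (.*) can span several lines.
pattern = Regexp.new('=begin.*?\n(.*)=end', Regexp::MULTILINE)

match = pattern.match(sample)

# When the pattern has capture groups, only the groups are reported,
# not the full match.
captured = match ? match.captures : []
puts captured
```

Here the single capture group holds the text between the `=begin` line and `=end`.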
@@ -0,0 +1,3 @@
1
+ #!/usr/bin/env ruby
2
+ require 'regex'
3
+ Regex::Command.main(*ARGV)
@@ -0,0 +1,236 @@
1
+ # = Text Extraction Class
2
+ #
3
+ # Extractor was designed particularly for extracting source code from embedded
4
+ # comment blocks.
5
+ #
6
+ # Todo:
7
+ # - How can we handle embedded code in standard comments? Eg. #
8
+ #
9
+ class Regex
10
+ VERSION = "1.1"
11
+
12
+ # When the regular expression returns multiple groups,
13
+ # each is divided by the group deliminator.
14
+ # This is the default value.
15
+ DELIMINATOR_GROUP = 29.chr + "\n"
16
+
17
+ # When using repeat mode, each match is divided by
18
+ # the record deliminator. This is the default value.
19
+ DELIMINATOR_RECORD = 30.chr + "\n"
20
+
21
+ require 'fileutils'
22
+ require 'open-uri'
23
+
24
+ require 'regex/string'
25
+ require 'regex/command'
26
+
27
+ # TODO: generalize to plugin
28
+ require 'regex/templates/common'
29
+
30
+ #
31
+ #attr_accessor :text
32
+
33
+ # Remove XML tags from search.
34
+ attr_accessor :unxml
35
+
36
+ # Regular expression.
37
+ attr_accessor :pattern
38
+
39
+ # Select built-in regular expression by name.
40
+ attr_accessor :template
41
+
42
+ # Index of expression return.
43
+ attr_accessor :index
44
+
45
+ # Ignore case.
46
+ attr_accessor :insensitive
47
+
48
+ # Repeat Match.
49
+ attr_accessor :repeat
50
+
51
+ # Output format.
52
+ attr_accessor :format
53
+
54
+ # DEPRECATE: Not needed anymore.
55
+ #def self.load(io, options={}, &block)
56
+ # new(io, options, &block)
57
+ #end
58
+
59
+ # New extractor.
60
+ def initialize(io, options={})
61
+ @raw = (String === io ? io : io.read)
62
+ options.each do |k,v|
63
+ __send__("#{k}=", v)
64
+ end
65
+ yield(self) if block_given?
66
+ end
67
+
68
+ # Read file.
69
+ #def raw
70
+ # @raw ||= open(@file) # File.read(@file)
71
+ #end
72
+
73
+ #--
74
+ # TODO: unxml is too primitive, use a real XML parser like Nokogiri
75
+ #++
76
+ def text
77
+ @text ||= (
78
+ if unxml
79
+ @raw.gsub(/\<(.*?)\>/, '')
80
+ else
81
+ @raw
82
+ end
83
+ )
84
+ end
85
+
86
+ #
87
+ def regex
88
+ @regex ||= (
89
+ if template
90
+ TEMPLATES.const_get(template.upcase)
91
+ else
92
+ case pattern
93
+ when Regexp
94
+ pattern
95
+ when String
96
+ flags = Regexp::MULTILINE
97
+ # Regexp options are bit flags, so combine them with bitwise OR.
98
+ flags |= Regexp::IGNORECASE if insensitive
99
+ Regexp.new(pattern, flags)
100
+ end
101
+ end
102
+ )
103
+ end
104
+
105
+ #
106
+ def to_s(format=nil)
107
+ case format
108
+ when :yaml
109
+ to_s_yaml
110
+ when :json
111
+ to_s_json
112
+ else
113
+ out = structure
114
+ if repeat
115
+ out = out.map{ |m| m.join(deliminator_group) }
116
+ out = out.join(deliminator_record) #.chomp("\n") + "\n"
117
+ else
118
+ out = out.join(deliminator_group) #.chomp("\n") + "\n"
119
+ end
120
+ out
121
+ end
122
+ end
123
+
124
+ #
125
+ def to_s_yaml
126
+ require 'yaml'
127
+ structure.to_yaml
128
+ end
129
+
130
+ #
131
+ def to_s_json
132
+ begin
133
+ require 'json'
134
+ rescue LoadError
135
+ require 'json_pure'
136
+ end
137
+ structure.to_json
138
+ end
139
+
140
+ # Structure the matchdata according to specified options.
141
+ def structure
142
+ repeat ? structure_repeat : structure_single
143
+ end
144
+
145
+ # Structure the matchdata for single match.
146
+ def structure_single
147
+ md = extract
148
+ if index
149
+ [md[index]]
150
+ elsif md.size > 1
151
+ md[1..-1]
152
+ else
153
+ [md[0]]
154
+ end
155
+ end
156
+
157
+ # Structure the matchdata for repeat matches.
158
+ def structure_repeat
159
+ out = extract
160
+ if index
161
+ out.map{ |md| [md[index]] }
162
+ else
163
+ out.map{ |md| md.size > 1 ? md[1..-1] : [md[0]] }
164
+ end
165
+ end
166
+
167
+ # Extract match from source text.
168
+ def extract
169
+ if repeat
170
+ extract_repeat
171
+ else
172
+ extract_single
173
+ end
174
+ end
175
+
176
+ #
177
+ #def extract_single
178
+ # out = []
179
+ # if md = matchdata
180
+ # if index
181
+ # out << md[index]
182
+ # elsif md.size > 1
183
+ # out = md[1..-1] #.join(deliminator_group)
184
+ # else
185
+ # out = md
186
+ # end
187
+ # end
188
+ # return out
189
+ #end
190
+
191
+ # Extract single match from source text.
192
+ def extract_single
193
+ md = regex.match(text)
194
+ md ? md : []
195
+ end
196
+
197
+ #
198
+ #def matchdata
199
+ # regex.match(text)
200
+ #end
201
+
202
+ #
203
+ #def extract_repeat
204
+ # out = []
205
+ # text.scan(regex) do
206
+ # md = $~
207
+ # if index
208
+ # out << [md[index]]
209
+ # elsif md.size > 1
210
+ # out << md[1..-1] #.join(deliminator_group)
211
+ # else
212
+ # out << md
213
+ # end
214
+ # end
215
+ # out #.join(deliminator_record)
216
+ #end
217
+
218
+ # Extract repeat matches from source text.
219
+ def extract_repeat
220
+ out = []
221
+ text.scan(regex) do
222
+ out << $~
223
+ end
224
+ out
225
+ end
226
+
227
+ def deliminator_group
228
+ DELIMINATOR_GROUP
229
+ end
230
+
231
+ def deliminator_record
232
+ DELIMINATOR_RECORD
233
+ end
234
+
235
+ end
236
+
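How repeat mode shapes and joins its output can be sketched outside the class. This is a minimal illustration, not the library's code: `GROUP_SEP`, `RECORD_SEP`, and the standalone `structure_repeat` helper are names assumed here, mirroring DELIMINATOR_GROUP, DELIMINATOR_RECORD, and the `structure_repeat`/`extract_repeat` methods.

```ruby
# Default separators, mirroring DELIMINATOR_GROUP and DELIMINATOR_RECORD:
# ASCII group separator (29) and record separator (30), each plus a newline.
GROUP_SEP  = 29.chr + "\n"
RECORD_SEP = 30.chr + "\n"

# Collect every match and reduce each one to an array of strings.
# (Hypothetical standalone helper, not part of the library.)
def structure_repeat(text, regexp, index = nil)
  matches = []
  # Inside the scan block, Regexp.last_match is the current MatchData.
  text.scan(regexp) { matches << Regexp.last_match }
  matches.map do |md|
    if index
      [md[index]]    # one specific group (0 is the whole match)
    elsif md.size > 1
      md.captures    # all capture groups, same as md[1..-1]
    else
      [md[0]]        # no groups: fall back to the whole match
    end
  end
end

rows   = structure_repeat("a=1, b=2", /(\w)=(\d)/)
output = rows.map { |groups| groups.join(GROUP_SEP) }.join(RECORD_SEP)
```

Each match becomes one record, and the groups within a record are joined by the group separator, so the plain-text output stays machine-splittable even when matches contain newlines.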
@@ -0,0 +1,108 @@
1
+ require 'regex'
2
+
3
+ class Regex
4
+
5
+ # Commandline interface.
6
+ #
7
+ class Command
8
+
9
+ #
10
+ attr :file
11
+
12
+ #
13
+ attr :format
14
+
15
+ #
16
+ attr :options
17
+
18
+ #
19
+ def self.main(*argv)
20
+ new(*argv).main
21
+ end
22
+
23
+ # New Command.
24
+ def initialize(*argv)
25
+ @file = nil
26
+ @format = nil
27
+ @options = {}
28
+ parse(*argv)
29
+ end
30
+
31
+ #
32
+ def parse(*argv)
33
+ parser.parse!(argv)
34
+ unless @options[:template]
35
+ @options[:pattern] = argv.shift
36
+ end
37
+ @file = argv.shift
38
+ if @file
39
+ unless File.file?(@file)
40
+ puts "No such file -- '#{file}'."
41
+ exit 1
42
+ end
43
+ end
44
+ end
45
+
46
+ # OptionParser instance.
47
+ def parser
48
+ require 'optparse'
49
+ @options = {}
50
+ OptionParser.new do |opt|
51
+ opt.on('--template', '-t NAME', "select a built-in regular expression") do |name|
52
+ @options[:template] = name
53
+ end
54
+
55
+ opt.on('--index', '-n INT', "return a specific match index") do |int|
56
+ @options[:index] = int.to_i
57
+ end
58
+
59
+ opt.on('--insensitive', '-i', "case insensitive matching") do
60
+ @options[:insensitive] = true
61
+ end
62
+
63
+ opt.on('--unxml', '-x', "ignore XML/HTML tags") do
64
+ @options[:unxml] = true
65
+ end
66
+
67
+ opt.on('--repeat', '-r', "find all matching occurrences") do
68
+ @options[:repeat] = true
69
+ end
70
+
71
+ opt.on('--yaml', '-y', "output in YAML format") do
72
+ @format = :yaml
73
+ end
74
+
75
+ opt.on('--json', '-j', "output in JSON format") do
76
+ @format = :json
77
+ end
78
+
79
+ opt.on_tail('--help', '-h', "display this lovely help message") do
80
+ puts opt
81
+ exit 0
82
+ end
83
+ end
84
+ end
85
+
86
+ #
87
+ def extraction
88
+ target = file ? File.new(file) : ARGF
89
+ Regex.new(target, options)
90
+ end
91
+
92
+ # Extract and display.
93
+ def main
94
+ begin
95
+ puts extraction.to_s(@format)
96
+ rescue => error
97
+ if $DEBUG
98
+ raise error
99
+ else
100
+ abort error.to_s
101
+ end
102
+ end
103
+ end
104
+
105
+ end
106
+
107
+ end
108
+
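The argument handling in Regex::Command can be sketched as a small standalone function. This is an illustration under assumptions: `parse_args` is a hypothetical helper, it covers only a few of the switches, and it uses OptionParser's built-in Integer coercion instead of calling `to_i` by hand.

```ruby
require 'optparse'

# Hypothetical helper: returns [options, file] for a given argument list.
def parse_args(argv)
  options = {}
  parser = OptionParser.new do |opt|
    opt.on('-t', '--template NAME', 'select a built-in regular expression') do |name|
      options[:template] = name
    end
    opt.on('-n', '--index INT', Integer, 'return a specific match index') do |int|
      options[:index] = int
    end
    opt.on('-i', '--insensitive', 'case insensitive matching') do
      options[:insensitive] = true
    end
    opt.on('-r', '--repeat', 'find all matching occurrences') do
      options[:repeat] = true
    end
  end

  # parse (without the bang) leaves argv untouched and returns the
  # remaining positional arguments.
  rest = parser.parse(argv)

  # The first positional argument is the pattern, unless a template was named.
  options[:pattern] = rest.shift unless options[:template]
  [options, rest.shift]
end

options, file = parse_args(%w[-i -n 2 foo file.txt])
```

As in the original, the pattern slot is skipped when a template is selected, so `parse_args(%w[-t ip hosts.txt])` treats `hosts.txt` as the file.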
@@ -0,0 +1 @@
1
+
@@ -0,0 +1,68 @@
1
+ class Regex
2
+
3
+ # Extensions for String class.
4
+ # These methods are taken directly from Ruby Facets.
5
+ #
6
+ module String
7
+
8
+ # Provides a margin controlled string.
9
+ #
10
+ # x = %Q{
11
+ # | This
12
+ # | is
13
+ # | margin controlled!
14
+ # }.margin
15
+ #
16
+ #
17
+ # NOTE: This may still need a bit of tweaking.
18
+ #
19
+ # CREDIT: Trans
20
+
21
+ def margin(n=0)
22
+ #d = /\A.*\n\s*(.)/.match( self )[1]
23
+ #d = /\A\s*(.)/.match( self)[1] unless d
24
+ d = ((/\A.*\n\s*(.)/.match(self)) ||
25
+ (/\A\s*(.)/.match(self)))[1]
26
+ return '' unless d
27
+ if n == 0
28
+ gsub(/\n\s*\Z/,'').gsub(/^\s*[#{d}]/, '')
29
+ else
30
+ gsub(/\n\s*\Z/,'').gsub(/^\s*[#{d}]/, ' ' * n)
31
+ end
32
+ end
33
+
34
+ # Preserves relative tabbing.
35
+ # The first non-empty line ends up with n spaces before nonspace.
36
+ #
37
+ # CREDIT: Gavin Sinclair
38
+
39
+ def tabto(n)
40
+ if self =~ /^( *)\S/
41
+ indent(n - $1.length)
42
+ else
43
+ self
44
+ end
45
+ end
46
+
47
+ # Indent left or right by n spaces.
48
+ # (This used to be called #tab and aliased as #indent.)
49
+ #
50
+ # CREDIT: Gavin Sinclair
51
+ # CREDIT: Trans
52
+
53
+ def indent(n)
54
+ if n >= 0
55
+ gsub(/^/, ' ' * n)
56
+ else
57
+ gsub(/^ {0,#{-n}}/, "")
58
+ end
59
+ end
60
+
61
+ end
62
+
63
+ class ::String #:nodoc:
64
+ include Regex::String
65
+ end
66
+
67
+ end
68
+
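The behavior of `indent` and `tabto` is easiest to see on a small input. The sketch below restates them as standalone functions that take the string as an argument (an assumption made so the example does not reopen ::String); the logic follows the methods above.

```ruby
# Indent right by n spaces, or remove up to -n leading spaces when n < 0.
def indent(str, n)
  if n >= 0
    str.gsub(/^/, ' ' * n)
  else
    str.gsub(/^ {0,#{-n}}/, '')
  end
end

# Shift the whole block so its first non-blank line starts at column n,
# keeping the relative indentation of the other lines.
def tabto(str, n)
  if str =~ /^( *)\S/
    indent(str, n - $1.length)
  else
    str
  end
end

block    = "    a\n      b"
shifted  = tabto(block, 2)     # first line moves from 4 spaces to 2
indented = indent("a\nb", 2)   # every line gains 2 spaces
```

Because `tabto` delegates to `indent` with a signed offset, the same helper handles both shifting left and shifting right.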
@@ -0,0 +1,13 @@
1
+ class Regex
2
+
3
+ #
4
+ module TEMPLATES
5
+ MLTAG = /<([A-Z][A-Z0-9]*)\b[^>]*>(.*?)<\/\1>/i
6
+ IP = /\b(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b/
7
+ EMAIL = /([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)/i
8
+ USPHONE = /(\d\d\d[-]|\(\d\d\d\))?(\d\d\d)[-](\d\d\d\d)/
9
+ RUBYBLOCK = /^=begin\s+(.*?)\n(.*?)\n=end/mi
10
+ RUBYMETHOD = /\A\s*(\#.*?)^\s*def\s+(.*?)$/mi
11
+ end
12
+
13
+ end
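Template selection by name boils down to a constant lookup. A minimal sketch, assuming a standalone `Templates` module and a hypothetical `template` helper; the two patterns are copied from the module above.

```ruby
# Standalone copy of two of the built-in patterns.
module Templates
  IP      = /\b(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b/
  USPHONE = /(\d\d\d[-]|\(\d\d\d\))?(\d\d\d)[-](\d\d\d\d)/
end

# Hypothetical helper: resolve a template name such as "ip" to its Regexp.
# An unknown name raises NameError.
def template(name)
  Templates.const_get(name.upcase)
end

octets = template('ip').match('host 192.168.0.1 is up').captures
phone  = template('usphone').match('call 555-123-4567').captures
```

Upcasing the name before the lookup is what lets the commandline accept lowercase template names.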
@@ -0,0 +1,2 @@
1
+ Thomas Sawyer
2
+ Tyler Rick
@@ -0,0 +1 @@
1
+ 2006-05-09
@@ -0,0 +1 @@
1
+ Regex is a simple commandline Regular Expression tool.
@@ -0,0 +1 @@
1
+ http://github.com/proutils/regex/downloads
@@ -0,0 +1 @@
1
+ http://proutils.github.com/regex
@@ -0,0 +1 @@
1
+ http://groups.google.com/group/proutils/topics?hl=en
@@ -0,0 +1 @@
1
+ regex
@@ -0,0 +1 @@
1
+ git://github.com/proutils/regex.git
@@ -0,0 +1 @@
1
+ Regex is a simple commandline Regular Expression tool.
@@ -0,0 +1 @@
1
+ Regex
@@ -0,0 +1 @@
1
+ 1.0.0
@@ -0,0 +1,44 @@
1
+ = Regex class
2
+
3
+ Regex is really meant to be used on the commandline
4
+ since it is really nothing more than a front end
5
+ to Ruby's regular expression engine. But we will
6
+ demonstrate its use in code just the same, and to
7
+ help ensure code quality.
8
+
9
+ First we need to require the Regex library.
10
+
11
+ require 'regex'
12
+
13
+ Now let's create some material to work with.
14
+
15
+ text = "We will match against this string."
16
+
17
+ Now we can create a Regex object using the text.
18
+ We will also supply a matching pattern, as none of
19
+ the matching functions will work without providing
20
+ a pattern or the name of a built-in pattern template.
21
+
22
+ regex = Regex.new(text, :pattern=>'\w+')
23
+
24
+ We can see that the Regex object has converted the pattern
25
+ into the expected regular expression via the #regex method.
26
+
27
+ regex.regex.assert == /\w+/m
28
+
29
+ Under the hood, Regex has split the process of matching,
30
+ organizing and formatting the results into separate methods.
31
+ We can use the #structure method to see the match results
32
+ organized into uniform arrays.
33
+
34
+ regex.structure.assert == %w{We}
35
+
36
+ Whereas the last use only returns a single match, if we turn
37
+ on repeat mode we can see every word.
38
+
39
+ regex.repeat = true
40
+
41
+ regex.structure.assert == %w{We will match against this string}.map{ |e| [e] }
42
+
43
+ Notice that repeat mode creates an array in an array.
44
+
metadata ADDED
@@ -0,0 +1,87 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: regex
3
+ version: !ruby/object:Gem::Version
4
+ prerelease: false
5
+ segments:
6
+ - 1
7
+ - 0
8
+ - 0
9
+ version: 1.0.0
10
+ platform: ruby
11
+ authors:
12
+ - Thomas Sawyer
13
+ - Tyler Rick
14
+ autorequire:
15
+ bindir: bin
16
+ cert_chain: []
17
+
18
+ date: 2010-02-12 00:00:00 -05:00
19
+ default_executable:
20
+ dependencies: []
21
+
22
+ description: Regex is a simple commandline Regular Expression tool.
23
+ email:
24
+ executables:
25
+ - regex
26
+ extensions: []
27
+
28
+ extra_rdoc_files:
29
+ - README
30
+ files:
31
+ - HISTORY
32
+ - LICENSE
33
+ - MANIFEST
34
+ - README
35
+ - bin/regex
36
+ - lib/regex.rb
37
+ - lib/regex/command.rb
38
+ - lib/regex/extractor.rb
39
+ - lib/regex/string.rb
40
+ - lib/regex/templates/common.rb
41
+ - meta/authors
42
+ - meta/created
43
+ - meta/description
44
+ - meta/download
45
+ - meta/homepage
46
+ - meta/mailinglist
47
+ - meta/name
48
+ - meta/repository
49
+ - meta/summary
50
+ - meta/title
51
+ - meta/version
52
+ - test/demos/regex.rdoc
53
+ has_rdoc: true
54
+ homepage: http://proutils.github.com/regex
55
+ licenses: []
56
+
57
+ post_install_message:
58
+ rdoc_options:
59
+ - --title
60
+ - Regex API
61
+ - --main
62
+ - README
63
+ require_paths:
64
+ - lib
65
+ required_ruby_version: !ruby/object:Gem::Requirement
66
+ requirements:
67
+ - - ">="
68
+ - !ruby/object:Gem::Version
69
+ segments:
70
+ - 0
71
+ version: "0"
72
+ required_rubygems_version: !ruby/object:Gem::Requirement
73
+ requirements:
74
+ - - ">="
75
+ - !ruby/object:Gem::Version
76
+ segments:
77
+ - 0
78
+ version: "0"
79
+ requirements: []
80
+
81
+ rubyforge_project: regex
82
+ rubygems_version: 1.3.6.pre.3
83
+ signing_key:
84
+ specification_version: 3
85
+ summary: Regex is a simple commandline Regular Expression tool.
86
+ test_files: []
87
+