imw 0.2.4 → 0.2.5
- data/README.rdoc +174 -86
- data/VERSION +1 -1
- data/lib/imw/formats/delimited.rb +5 -5
- data/lib/imw/formats/json.rb +10 -18
- data/lib/imw/formats/yaml.rb +11 -19
- data/lib/imw/resource.rb +26 -0
- data/lib/imw/schemes/local.rb +59 -10
- data/lib/imw/tools/extension_analyzer.rb +108 -0
- data/lib/imw/tools/summarizer.rb +31 -133
- data/lib/imw/utils/log.rb +2 -2
- data/spec/data/sample.json +782 -1
- data/spec/data/sample.yaml +650 -651
- data/spec/imw/formats/delimited_spec.rb +0 -12
- data/spec/imw/formats/json_spec.rb +1 -15
- data/spec/imw/formats/yaml_spec.rb +1 -23
- data/spec/imw/resource_spec.rb +26 -0
- data/spec/imw/schemes/local_spec.rb +1 -1
- metadata +3 -2
data/README.rdoc
CHANGED
@@ -1,5 +1,5 @@
 
-=
+= What is the Infinite Monkeywrench?
 
 The Infinite Monkeywrench (IMW) is a Ruby frameworks to simplify the
 tasks of acquiring, extracting, transforming, loading, and packaging
@@ -23,7 +23,7 @@ data. It has the following goals:
 * Let you incorporate your own tools wherever you choose to.
 
 The Infinite Monkeywrench is a powerful tool but it is not always the
-right
+right tool. IMW is **not** designed for
 
 * Scraping vast amounts of data (use Wuclan[http://github.com/infochimps/wuclan] and Monkeyshines[http://github.com/infochimps/monkeyshines])
 
@@ -33,14 +33,14 @@ right one to use. IMW is **not** designed for
 
 * Visualization
 
-=
+= Installation
 
 IMW is hosted on Gemcutter[http://gemcutter.org] so it's easy to install.
 
-You'll have to
+You'll have to add <tt>http://gemcutter.org</tt> to your gem sources
+if it isn't there already:
 
-$
-$ gem tumble
+$ gem sources -a http://gemcutter.org
 
 and then install IMW
 
@@ -59,49 +59,82 @@ _anything_ with a URI and you create one using IMW.open.
 
 csv = IMW.open('/path/to/my_data.csv')
 html = IMW.open('http://www.infochimps.com')
-tar_bz2 = IMW.open(
 
 IMW dynamically extends a resource with modules appropriate to it when
 you open it. In the above case, +csv+ would be automatically extended
 by the IMW::Resources::Formats::Csv module, among others:
 
 csv.resource_modules
-=> [IMW::
+=> [IMW::Schemes::Local::Base, IMW::Schemes::Local::LocalFile, IMW::CompressedFiles::Compressible, IMW::Formats::Csv]
 
 while +html+ will use a different set
 
 html.resource_modules
-=> [IMW::
-
+=> [IMW::Schemes::Remote::Base, IMW::Schemes::Remote::RemoteFile, IMW::Schemes::HTTP, IMW::Formats::Html]
 
 Consult the documentation for the modules a resource uses to learn
 what it can do.
 
-
+== Including/Excluding Resource Modules
+
+You can exercise finer control of the resource modules IMW will extend
+a given resource with by passing the <tt>:as</tt> and <tt>:without</tt>.
+
+IMW.open('http://www.infochimps.com/some_raw_data', :without => [IMW::Formats::Html]).resource_modules
+=> [IMW::Schemes::Remote::Base, IMW::Schemes::Remote::RemoteFile, IMW::Schemes::HTTP]
+
+IMW.open('http://www.infochimps.com', :as => [IMW::Formats::Json]).resource_modules
+=> [IMW::Schemes::Remote::Base, IMW::Schemes::Remote::RemoteFile, IMW::Schemes::HTTP, IMW::Formats::Json]
+
+You can also pass <tt>:no_modules</tt> to not use any resource
+modules.
+
+== Handlers and Custom Resource Modules
+
+IMW chooses which resource modules to extend an IMW::Resource by
+iterating through an array of handlers, passing the resource to the
+handler, and letting the handler's response (true/false) determine
+whether or not to extend the resource with the module accompanying the
+handler.
+
+You can hook into this process by defining your own handlers. To
+define a handler which should extend with +MyModule+ any resource with
+a URI ending with <tt>.xxx</tt>
 
-
+IMW::Resource.register_handler MyModule, /\.xxx$/
 
-You can
+You can also use a Proc instead of a Regexp for more control. If the
+result output of the Proc called with a resource is evaluates true
+then the resource will be extended by +MyModule+.
+
+IMW::Resource.register_handler MyModule, Proc.new { |resource| resource.is_local? && resource.path =~ /\.xxx$/ }
+
+= Manipulating Paths
 
 IMW holds a registry of paths that you can define on the fly or store
-in a configuration file.
+in a configuration file. Defining paths once in the registry and then
+referring to them forever after by name helps keep your code flexible
+as well as portable.
 
-IMW.add_path(:dropbox, "/var/www/public
-IMW.path_to(:dropbox)
+IMW.add_path(:dropbox, "/var/www/public")
+IMW.path_to(:dropbox)
+=> "/var/www/public"
 
-You can combine
+You can combine named references together dynamically.
 
-IMW.add_path(:raw, "
-IMW.path_to(:raw
-
-IMW.path_to(:
+IMW.add_path(:raw, :dropbox, "raw")
+IMW.path_to(:raw)
+=> "/var/www/public/raw"
+IMW.path_to(:raw, "my/dataset")
+=> "/var/www/public/raw/my/dataset
 
 Altering one path will update others
 
-IMW.add_path(:
-IMW.path_to(:
+IMW.add_path(:dropbox, "/data") # redefines :raw
+IMW.path_to(:raw, "my/dataset)
+=> "/data/raw/my/dataset" # not /var/www/public/raw/my/dataset
 
-
+= Files & Directories
 
 Use IMW.open to open files. The object returned by IMW.open obeys the
 usual semantics of a File object but it has new methods to manipulate
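The path-registry examples above resolve named references at lookup time, which is why redefining `:dropbox` also changes where `:raw` points. A minimal sketch of that behaviour in plain Ruby, assuming references are stored unresolved and expanded recursively when looked up. `PathRegistry` is a hypothetical class whose method names mirror the README; it is not IMW's implementation and it does no cycle detection.

```ruby
# Hypothetical registry mirroring IMW.add_path / IMW.path_to from the README.
# Not IMW's implementation; there is no cycle detection.
class PathRegistry
  def initialize
    @paths = {}
  end

  # Store the parts unresolved; Symbol parts are looked up later.
  def add_path(name, *parts)
    @paths[name] = parts
  end

  # Expand Symbol parts recursively, then append any extra segments.
  def path_to(name, *extra)
    parts = @paths.fetch(name) { raise ArgumentError, "no path named #{name.inspect}" }
    resolved = parts.map { |part| part.is_a?(Symbol) ? path_to(part) : part }
    File.join(*resolved, *extra)
  end
end

registry = PathRegistry.new
registry.add_path(:dropbox, "/var/www/public")
registry.add_path(:raw, :dropbox, "raw")
registry.path_to(:raw, "my/dataset")  # => "/var/www/public/raw/my/dataset"
registry.add_path(:dropbox, "/data")  # :raw now resolves under /data
registry.path_to(:raw, "my/dataset")  # => "/data/raw/my/dataset"
```

Storing unresolved parts is what makes the update propagate: nothing is cached, so every lookup sees the current definition of each referenced name.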
@@ -146,20 +179,21 @@ Files can readily be opened, read, and downloaded from the Internet
 
 == Archives & Compressed Files
 
-IMW works with a variety of archiving and compression programs
-
+IMW works with a variety of archiving and compression programs to make
+packaging/unpackaging data easy.
 
 bz2 = IMW.open('/path/to/big_file.bz2')
 zip = IMW.open('/path/to/archive.zip')
 targz = IMW.open('/path/to/archive.tar.gz')
 
-
-
-bz2.
-
-zip.
-
-targz.
+IMW recognizes file properties by extension
+
+bz2.is_archive? # false
+bz2.is_compressed? # true
+zip.is_archive? # true
+zip.is_compressed? # false
+targz.is_archive? # true
+targz.is_compressed? # true
 
 # decompress or compress files
 big_file = bz2.decompress! # skip the ! to preserve the original
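The extension rules in the README example (`.bz2` compressed only, `.zip` archive only, `.tar.gz` both) can be sketched with `File.extname`. The extension lists and the helper names `compressed?` and `archive?` are assumptions for illustration; IMW's own predicates are the `is_archive?`/`is_compressed?` methods shown above and may recognize more formats.

```ruby
# Hypothetical extension tables; IMW's real lists may differ.
COMPRESSION_EXTENSIONS = %w[.gz .bz2].freeze
ARCHIVE_EXTENSIONS     = %w[.tar .zip].freeze

def compressed?(path)
  COMPRESSION_EXTENSIONS.include?(File.extname(path))
end

def archive?(path)
  ext = File.extname(path)
  # For "archive.tar.gz", look past the compression suffix to ".tar".
  if COMPRESSION_EXTENSIONS.include?(ext)
    ext = File.extname(File.basename(path, ext))
  end
  ARCHIVE_EXTENSIONS.include?(ext)
end

archive?('/path/to/archive.tar.gz')     # => true
compressed?('/path/to/archive.tar.gz')  # => true
archive?('/path/to/big_file.bz2')       # => false
```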
@@ -170,53 +204,113 @@ IMW::EXTERNAL_PROGRAMS) to make packaging/unpackaging data easy.
 tarbz2.extract # no need to decompress first
 new_tarbz2 = IMW.open!('/new/archive.tar').create(['/path1', '/path/2']).compress!
 
-== Data
+== Parsing and Emitting Data
 
-IMW encourages you to work with
+IMW encourages you to work with native Ruby data structures as much as
 possible by providing methods to parse common data formats directly
-into
[old lines 178-219: 42 removed lines, shown blank in this copy]
+into Arrays, Hashes and Strings.
+
+Some data formats (CSV, JSON, YAML) have a structure which trivially
+maps to Arrays, Hashes, and Strings and so these formats can
+immediately be parsed.
+
+Other formats (XML, HTML, flat files, &c.) use data structures which
+do not map as readily to Arrays, Hashes, and Strings and so these will
+have to be parsed first.
+
+=== Ruby-like Data Formats
+
+These include delimited formats such as CSV and TSV as well as
+"restricted tree-like" formats like JSON and YAML.
+
+For the case of delimited data, consider the following CSV file:
+
+ID,Name,Genus,Species
+001,Gray-bellied Night Monkey,Aotus,lemurinus
+002,Panamanian Night Monkey,Aotus,zonalis
+003,Hernández-Camacho's Night Monkey,Aotus,jorgehernandezi
+004,Gray-handed Night Monkey,Aotus,griseimembra
+005,Hershkovitz's Night Monkey,Aotus,hershkovitzi
+006,Brumback's Night Monkey,Aotus,brumbacki
+007,Three-striped Night Monkey,Aotus,trivirgatus
+008,Spix's Night Monkey,Aotus,vociferans
+009,Malaysian Lar Gibbon,Hylobates,lar lar
+010,Carpenter's Lar Gibbon,Hylobates,lar carpenteri
+
+It trivially maps to an Array of Arrays:
+
+data = IMW.open('/path/to/monkeys.csv').load
+puts data.class
+=> Array
+puts data.first.class
+=> Array
+data.each { |row| puts row.inspect }
+=> ["ID", "Name", "Genus", "Species"]
+["001", "Gray-bellied Night Monkey", "Aotus", "lemurinus"]
+["002", "Panamanian Night Monkey", "Aotus", "zonalis"]
+...
+["010", "Carpenter's Lar Gibbon", "Hylobates", "lar carpenteri"]
+
+Conversely, any array of arrays trivially maps to a delimited file.
+Here we write out all rows where the genus is _Hylobates_ to a TSV
+file:
+
+hylobates = data.find_all { |row| row[2] == 'Hylobates' }
+hylobates.dump('/path/to/monkeys.tsv')
+
+IMW automatically formats the output as TSV and writes it to the
+specified path.
+
+Similarly, restricted tree-like formats like JSON and YAML, which map
+cleanly onto Hashes, Arrays, and Strings, can also be automatically
+parsed and emitted by IMW.
+
+Consider a YAML version of the above CSV data:
+
+- id: 001
+  name: Gray-bellied Night Monkey
+  genus: Aotus
+  species: lemurinus
+- id: 002
+  name: Panamanian Night Monkey
+  genus: Aotus
+  species: zonalis
+- id: 003
+  name: Hernández-Camacho's Night Monkey
+  genus: Aotus
+  species: jorgehernandezi
+...
+- id: 010
+  name: Carpenter's Lar Gibbon
+  genus: Hylobates
+  species: lar carpenteri
+
+This trivially maps to an Array of Hashes and so we can perform the
+exact same filtration for YAML and JSON as we did for CSV and TSV (in
+a one-liner!):
+
+data = IMW.open('/path/to/monkeys.yaml').load
+hylobates = data.map{ |monkey| monkey['genus'] == 'Hylobates' }
+hylobates.dump('/path/to/monkeys.json')
+
+Resources in these Ruby-like data formats also extend themselves with
+Enumerable so goodies like +map+, +find_all+, &c. are available. This
+enables converting YAML to JSON with a one-liner:
+
+IMW.open('/path/to/monkeys.yaml').find_all { |monkey| monkey['genus'] == 'Hylobates' }.dump('/path/to/monkeys.json')
+
+=== Parsing More General Data Formats
+
+Some data formats are structured but do not map readily to Hashes,
+Arrays, and Strings (XML, HTML, &c.) while other data formats lack
+structure or have a peculiar structure (flat files in arbitrary
+syntax).
+
+In both these cases the data needs to be parsed before it's usable.
+For the XML and HTML type data formats, IMW uses Hpricot and the
+IMW::Parsers::HtmlParser for parsing. For flat files, IMW provides
+the IMW::Parsers::LineParser and the IMW::Parsers::RegexpParser.
+
 HTML files, on the other hand, are more complex and typically have to
 be parsed before being converted to plain Ruby objects:
 
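The Hylobates examples can be reproduced with Ruby's standard `csv`, `yaml`, and `json` libraries alone. This sketch uses inline strings instead of files and stdlib calls in place of IMW's `load`/`dump`. Filtering needs `find_all`, as in the CSV example and the final one-liner: calling `map` with the same block returns an Array of booleans rather than the matching records. The ids are quoted in the YAML because an unquoted `001` may be read as a number.

```ruby
require 'csv'
require 'json'
require 'yaml'

monkeys_csv = <<~CSV_TEXT
  ID,Name,Genus,Species
  001,Gray-bellied Night Monkey,Aotus,lemurinus
  009,Malaysian Lar Gibbon,Hylobates,lar lar
  010,Carpenter's Lar Gibbon,Hylobates,lar carpenteri
CSV_TEXT

# CSV -> Array of Arrays -> filter on the Genus column -> TSV text.
rows      = CSV.parse(monkeys_csv)
hylobates = rows.find_all { |row| row[2] == 'Hylobates' }
tsv       = hylobates.map { |row| CSV.generate_line(row, col_sep: "\t") }.join

monkeys_yaml = <<~YAML_TEXT
  - id: '001'
    name: Gray-bellied Night Monkey
    genus: Aotus
  - id: '010'
    name: Carpenter's Lar Gibbon
    genus: Hylobates
YAML_TEXT

# YAML -> Array of Hashes -> filter -> JSON text.
records  = YAML.load(monkeys_yaml)
selected = records.find_all { |m| m['genus'] == 'Hylobates' }
flags    = records.map { |m| m['genus'] == 'Hylobates' }  # booleans, not records
json     = JSON.generate(selected)
```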
@@ -245,17 +339,11 @@ rip::
 from the web, obtain it by querying databases, or use other services
 like rsync, ftp, &c. to pull it in from another computer.
 
-extract::
-
-Ripped data is often compressed or otherwise archived and needs to
-be extracted. It may also be sliced in many ways (excluding certain
-years, say) to reduce the volume to only what is required.
-
 parse::
 
 Data is parsed into Ruby objects and stored.
 
-
+fix::
 
 All the parsed data is combined, reconciled, and further processed
 into a final form.
@@ -268,7 +356,7 @@ package::
 Not all datasets
 
 
-
+= Datasets
 
 == Tasks & Dependencies
 
data/VERSION
CHANGED
@@ -1 +1 @@
-0.2.
+0.2.5
data/lib/imw/formats/delimited.rb
CHANGED
@@ -11,6 +11,8 @@ module IMW
 # @abstract
 module Delimited
 
+include Enumerable
+
 attr_accessor :delimited_settings
 
 # Return the data in this delimited resource as an array of
@@ -25,11 +27,9 @@ module IMW
 FasterCSV.parse(read, delimited_options, &block)
 end
 
-#
-
-
-def map &block
-load.map(&block)
+# Call +block+ with each row in this delimited resource.
+def each &block
+load(&block)
 end
 
 # Dump an array of arrays into this resource.
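The delimited change replaces a hand-written `map` with `include Enumerable` plus an `each` that passes its block to `load`, so `map`, `find_all`, and the rest of Enumerable come for free. A minimal sketch of the pattern, assuming Ruby's bundled `CSV` library (the successor of FasterCSV) and a hypothetical `DelimitedResource` class that wraps a String rather than an IMW resource:

```ruby
require 'csv'

# Hypothetical stand-in for a resource extended with IMW::Formats::Delimited;
# it wraps a String instead of a file or URL.
class DelimitedResource
  include Enumerable

  def initialize(text, options = {})
    @text    = text
    @options = options
  end

  # Without a block, return every row; with one, CSV.parse yields each row.
  def load(&block)
    CSV.parse(@text, **@options, &block)
  end

  # Enumerable needs only #each, which simply delegates to #load.
  def each(&block)
    load(&block)
  end
end

tsv = DelimitedResource.new("a\tb\nc\td\n", col_sep: "\t")
tsv.map(&:first)                        # => ["a", "c"]
tsv.find_all { |row| row.last == 'd' }  # => [["c", "d"]]
```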
data/lib/imw/formats/json.rb
CHANGED
@@ -4,37 +4,29 @@ module IMW
 # Defines methods for reading and writing JSON data.
 module Json
 
+include Enumerable
+
 # Return the content of this resource.
 #
-# Will
-#
-#
-# - if the outermost JSON data structure is an array, then
-# yield each element
-#
-# - if the outermost JSON data structure is a mapping, then
-# yield each key, value pair
-#
-# - otherwise just yield the structure
+# Will pass a block to the outermost JSON data structure's each
+# method.
 #
 # @return [Hash, Array, String, Fixnum] whatever the JSON contained
 def load &block
 require 'json'
 json = JSON.parse(read)
 if block_given?
-
-when Array
-json.each { |obj| yield obj }
-when Hash
-json.each_pair { |key, value| yield key, value }
-else
-yield json
-end
+json.each(&block)
 else
 json
 end
 end
 
+# Iterate over the elements in the JSON.
+def each &block
+load(&block)
+end
+
 # Dump the +data+ into this resource. It must be opened for
 # writing.
 #
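The JSON `load` now hands its block straight to the parsed structure's `each`. For Arrays and Hashes this behaves like the removed `case` branches, since `Hash#each` yields key/value pairs that a two-argument block destructures. For a scalar value the old code yielded the value itself, whereas calling `each` on, say, an Integer raises `NoMethodError` on modern Ruby. The sketch contrasts the two; `each_old` is modeled on the removed `when` branches, and both helper names are hypothetical.

```ruby
require 'json'

# Modeled on the removed 0.2.4 branches: dispatch on the parsed type.
def each_old(json)
  case json
  when Array then json.each { |obj| yield obj }
  when Hash  then json.each_pair { |key, value| yield key, value }
  else            yield json
  end
end

# 0.2.5 behaviour: hand the block to the structure's own #each.
def each_new(json, &block)
  json.each(&block)
end

data = JSON.parse('{"genus": "Aotus", "count": 8}')
each_new(data) { |key, value| puts "#{key}=#{value}" }  # prints genus=Aotus then count=8
each_old(42) { |value| p value }                         # prints 42
# each_new(42) { |value| p value }                       # raises NoMethodError
```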
data/lib/imw/formats/yaml.rb
CHANGED
@@ -4,37 +4,29 @@ module IMW
 # Provides methods for reading and writing YAML data.
 module Yaml
 
+include Enumerable
+
 # Return the content of this resource.
 #
-# Will
-#
-#
-# - if the outermost YAML data structure is an array, then
-# yield each element
-#
-# - if the outermost YAML data structure is a mapping, then
-# yield each key, value pair
-#
-# - otherwise just yield the structure
+# Will pass a block to the outermost YAML data structure's each
+# method.
 #
 # @return [Hash, Array, String, Fixnum] whatever the YAML contained
 def load &block
 require 'yaml'
-yaml = YAML.load(
+yaml = YAML.load(io)
 if block_given?
-
-when Array
-yaml.each { |obj| yield obj }
-when Hash
-yaml.each_pair { |key, value| yield key, value }
-else
-yield yaml
-end
+yaml.each(&block)
 else
 yaml
 end
 end
 
+# Iterate over the elements in the YAML.
+def each &block
+load(&block)
+end
+
 # Dump the +data+ into this resource. It must be opened for
 # writing.
 #
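The YAML `load` now parses `io` directly, which works because `YAML.load` accepts an IO as well as a String. A sketch of a YAML-backed Enumerable resource over any IO, assuming a hypothetical `YamlResource` class. It rewinds the stream before each parse so repeated iteration sees the whole document; the diff does not show whether IMW's `io` behaves that way.

```ruby
require 'yaml'
require 'stringio'

# Hypothetical stand-in for a resource extended with IMW::Formats::Yaml,
# backed by any IO (File, StringIO, ...).
class YamlResource
  include Enumerable

  def initialize(io)
    @io = io
  end

  # YAML.load accepts an IO directly, so no intermediate String is built.
  # Assumption: rewind first so every call parses the whole stream.
  def load(&block)
    @io.rewind
    yaml = YAML.load(@io)
    block ? yaml.each(&block) : yaml
  end

  def each(&block)
    load(&block)
  end
end

monkeys = YamlResource.new(StringIO.new("- genus: Aotus\n- genus: Hylobates\n"))
monkeys.map { |m| m['genus'] }  # => ["Aotus", "Hylobates"]
monkeys.count                   # => 2
```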
data/lib/imw/resource.rb
CHANGED
@@ -6,6 +6,31 @@ module IMW
 # URI handlers to IMW.
 USER_DEFINED_HANDLERS = [] unless defined?(USER_DEFINED_HANDLERS)
 
+# Register a new resource handler which dynamically extends a new
+# IMW::Resource with the given module +mod+.
+#
+# +handler+ must be one of
+#
+# 1. Regexp
+# 2. Proc
+# 3. +true+
+#
+# In case (1), if the regular expression matches the resource's URI
+# then the module (+mod+) will be used to extend the resource.
+#
+# In case (2), if the Proc returns a value other than +false+ or
+# +nil+ then the module will be used.
+#
+# In case (3), the module will be used.
+#
+# @param [String, Module] mod
+# @param [Regexp, Proc, true] handler
+def self.register_handler mod, handler
+raise IMW::ArgumentError.new("Module must be either a Module or String") unless mod.is_a?(Module) || mod.is_a?(String)
+raise IMW::ArgumentError.new("Handler must be either a Regexp, Proc, or true") unless handler.is_a?(Regexp) || handler.is_a?(Proc) || handler == true
+self::USER_DEFINED_HANDLERS << [mod, handler]
+end
+
 # A resource can be anything addressable via a URI. Examples
 # include local files, remote files, webpages, &c.
 #
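`register_handler` only records `[mod, handler]` pairs; the three cases in its comment describe how those pairs are later matched against a resource. A minimal sketch of that matching, assuming a resource that exposes its URI as a String via `uri`. `FakeResource`, `HANDLERS`, `modules_for`, `MyModule`, and `LocalThing` are hypothetical, and resolving String module names with `Object.const_get` is an assumption about how IMW treats them.

```ruby
# Hypothetical stand-ins for IMW::Resource's handler registry; not IMW code.
HANDLERS = []

def register_handler(mod, handler)
  unless mod.is_a?(Module) || mod.is_a?(String)
    raise ArgumentError, "Module must be either a Module or String"
  end
  unless handler.is_a?(Regexp) || handler.is_a?(Proc) || handler == true
    raise ArgumentError, "Handler must be either a Regexp, Proc, or true"
  end
  HANDLERS << [mod, handler]
end

# Apply the three documented cases: a Regexp matched against the URI,
# a Proc whose result is neither false nor nil, or the literal true.
def modules_for(resource)
  HANDLERS.select do |_mod, handler|
    case handler
    when Regexp then handler =~ resource.uri
    when Proc   then handler.call(resource)
    else             true
    end
  end.map(&:first)
end

FakeResource = Struct.new(:uri)

module MyModule
  def my_format?
    true
  end
end

module LocalThing; end

register_handler MyModule, /\.xxx$/
register_handler 'LocalThing', proc { |r| r.uri.start_with?('/') }

resource = FakeResource.new('/data/file.xxx')
modules_for(resource).each do |mod|
  # Assumption: String names are resolved to constants before extending.
  resource.extend(mod.is_a?(String) ? Object.const_get(mod) : mod)
end
resource.my_format?  # => true
```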
@@ -178,6 +203,7 @@ module IMW
 raise IMW::Error.new([message, "No path defined for #{self.inspect} extended by #{resource_modules.join(' ')}"].compact.join(', ')) unless respond_to?(:path)
 raise IMW::Error.new([message, "No exist? method defined for #{self.inspect} extended by #{resource_modules.join(' ')}"].compact.join(', ')) unless respond_to?(:exist?)
 raise IMW::PathError.new([message, "#{path} does not exist"].compact.join(', ')) unless exist?
+self
 end
 
 # Open a copy of this resource.
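The one-line `self` addition makes the existence check return its receiver, so it can sit at the head of a call chain. A sketch under stated assumptions: the method's real name is not visible in this hunk, so `should_exist!`, `LocalResource`, and `PathError` here are hypothetical stand-ins.

```ruby
require 'tempfile'

# Hypothetical error class standing in for IMW::PathError.
class PathError < StandardError; end

# Hypothetical local resource; only the chaining behaviour is the point.
class LocalResource
  attr_reader :path

  def initialize(path)
    @path = path
  end

  def exist?
    File.exist?(path)
  end

  # Raise when the path is missing; otherwise return self so calls can chain.
  def should_exist!(message = nil)
    raise PathError, [message, "#{path} does not exist"].compact.join(', ') unless exist?
    self
  end

  def read
    File.read(path)
  end
end

Tempfile.create('imw-sketch') do |file|
  file.write('hello')
  file.flush
  LocalResource.new(file.path).should_exist!.read  # => "hello"
end
```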
data/lib/imw/schemes/local.rb
CHANGED
@@ -65,7 +65,7 @@ module IMW
 def dir
 IMW.open(dirname)
 end
-
+
 end
 
 # Defines methods for appropriate for a local file.
@@ -142,6 +142,29 @@ module IMW
 end
 io.close unless options[:persist]
 end
+
+# Return a summary of properties of this local file.
+#
+# Returned properties include
+# - basename
+# - size
+# - extension
+# - snippet
+def summary
+{
+:basename => basename,
+:size => size,
+:extension => extension,
+:snippet => snippet
+}
+end
+
+# Return a 1024-char snippet from this local file.
+#
+# @return [Array<String>]
+def snippet
+io.read(1024)
+end
 end
 
 # Defines methods for manipulating the contents of a local
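The new `summary` and `snippet` methods can be approximated for a plain path. Two details are worth knowing: `IO#read(1024)` reads up to 1024 bytes (not characters) and returns a String, or `nil` for an empty file, rather than the `Array<String>` the comment documents; and this sketch takes `extension` to be `File.extname`'s value, dot included, which may differ from IMW's own `extension`. `file_summary` is a hypothetical helper.

```ruby
require 'tempfile'

# Hypothetical helper approximating the local-file summary/snippet pair.
# Assumption: :extension is File.extname's value (dot included).
def file_summary(path, snippet_bytes = 1024)
  {
    :basename  => File.basename(path),
    :size      => File.size(path),
    :extension => File.extname(path),
    # IO#read(n) returns at most n bytes as a String, or nil at end of file.
    :snippet   => File.open(path) { |io| io.read(snippet_bytes) }
  }
end

Tempfile.create(['monkeys', '.csv']) do |file|
  file.write("ID,Name,Genus,Species\n")
  file.flush
  p file_summary(file.path)
end
```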
@@ -182,13 +205,6 @@ module IMW
 Dir[File.join(path, selector)]
 end
 
-# Return a list of all paths directly within this directory.
-#
-# @return [Array]
-def contents
-self['*']
-end
-
 # Does this directory contain +obj+?
 #
 # @param [String, IMW::Resource] obj
@@ -202,6 +218,13 @@ module IMW
 false
 end
 
+# Return a list of all paths directly within this directory.
+#
+# @return [Array<String>]
+def contents
+self['*']
+end
+
 # Return all paths within this directory, recursively.
 #
 # @return [Array<String>]
@@ -209,11 +232,17 @@ module IMW
 self['**/*']
 end
 
-# Return all resources within this directory
-# converted to IMW::Resource objects.
+# Return all resources directly within this directory.
 #
 # @return [Array<IMW::Resource>]
 def resources
+contents.map { |path| IMW.open(path) }
+end
+
+# Return all resources within this directory, recursively.
+#
+# @return [Array<IMW::Resource>]
+def all_resources
 all_contents.map do |path|
 IMW.open(path) unless File.directory?(path)
 end.compact
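After this change `contents`/`resources` cover only the entries directly inside a directory, while `all_contents`/`all_resources` recurse; `all_resources` also drops subdirectories. The glob patterns from the diff (`'*'` and `'**/*'`) can be exercised directly with `Dir[]`. The helper names below are hypothetical and work on plain paths rather than IMW resources.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helpers using the glob patterns from the diff.
def contents(dir)
  Dir[File.join(dir, '*')]            # entries directly inside dir
end

def all_contents(dir)
  Dir[File.join(dir, '**/*')]         # every entry, at any depth
end

# Like all_resources: recurse, but skip directories.
def all_files(dir)
  all_contents(dir).reject { |path| File.directory?(path) }
end

Dir.mktmpdir do |dir|
  FileUtils.mkdir_p(File.join(dir, 'sub'))
  File.write(File.join(dir, 'a.csv'), '')
  File.write(File.join(dir, 'sub', 'b.yaml'), '')
  p contents(dir).map { |path| File.basename(path) }.sort   # ["a.csv", "sub"]
  p all_files(dir).map { |path| File.basename(path) }.sort  # ["a.csv", "b.yaml"]
end
```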
@@ -251,6 +280,26 @@ module IMW
 self
 end
 
+# Return a hash summarizing this directory with a key
+# <tt>:contents</tt> containing an array of hashes summarizing
+# this directories contents.
+#
+# The directory summary includes the following information
+# - basename
+# - size
+# - num_files
+# - contents
+#
+# @return [Hash]
+def summary
+{
+:basename => basename,
+:size => size,
+:num_files => contents.length,
+:contents => resources.map { |resource| resource.summary }
+}
+end
+
 end
 end
 end
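Because a directory's `summary` maps `summary` over `resources`, and `resources` includes subdirectories, the result nests one hash per subdirectory. A recursive sketch for plain paths, assuming a directory's `:size` is the total size of the files beneath it (the diff does not show how IMW computes a directory's `size`). Note that `:num_files`, like `contents.length`, counts subdirectories as well as files. `summarize` is a hypothetical helper.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical recursive summary modelled on the directory summary in the diff.
# Assumption: a directory's :size is the total size of the files beneath it.
def summarize(path)
  if File.directory?(path)
    children = Dir[File.join(path, '*')].sort
    {
      :basename  => File.basename(path),
      :size      => Dir[File.join(path, '**/*')].select { |p| File.file?(p) }.sum { |p| File.size(p) },
      :num_files => children.length,  # counts subdirectories too
      :contents  => children.map { |child| summarize(child) }
    }
  else
    {
      :basename  => File.basename(path),
      :size      => File.size(path),
      :extension => File.extname(path)
    }
  end
end

Dir.mktmpdir do |dir|
  FileUtils.mkdir_p(File.join(dir, 'sub'))
  File.write(File.join(dir, 'a.csv'), 'abc')
  File.write(File.join(dir, 'sub', 'b.yaml'), 'hello')
  pp summarize(dir)
end
```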