imw 0.2.4 → 0.2.5
- data/README.rdoc +174 -86
- data/VERSION +1 -1
- data/lib/imw/formats/delimited.rb +5 -5
- data/lib/imw/formats/json.rb +10 -18
- data/lib/imw/formats/yaml.rb +11 -19
- data/lib/imw/resource.rb +26 -0
- data/lib/imw/schemes/local.rb +59 -10
- data/lib/imw/tools/extension_analyzer.rb +108 -0
- data/lib/imw/tools/summarizer.rb +31 -133
- data/lib/imw/utils/log.rb +2 -2
- data/spec/data/sample.json +782 -1
- data/spec/data/sample.yaml +650 -651
- data/spec/imw/formats/delimited_spec.rb +0 -12
- data/spec/imw/formats/json_spec.rb +1 -15
- data/spec/imw/formats/yaml_spec.rb +1 -23
- data/spec/imw/resource_spec.rb +26 -0
- data/spec/imw/schemes/local_spec.rb +1 -1
- metadata +3 -2
data/README.rdoc
CHANGED
@@ -1,5 +1,5 @@
|
|
1
1
|
|
2
|
-
=
|
2
|
+
= What is the Infinite Monkeywrench?
|
3
3
|
|
4
4
|
The Infinite Monkeywrench (IMW) is a Ruby framework to simplify the
|
5
5
|
tasks of acquiring, extracting, transforming, loading, and packaging
|
@@ -23,7 +23,7 @@ data. It has the following goals:
|
|
23
23
|
* Let you incorporate your own tools wherever you choose to.
|
24
24
|
|
25
25
|
The Infinite Monkeywrench is a powerful tool but it is not always the
|
26
|
-
right
|
26
|
+
right tool. IMW is **not** designed for
|
27
27
|
|
28
28
|
* Scraping vast amounts of data (use Wuclan[http://github.com/infochimps/wuclan] and Monkeyshines[http://github.com/infochimps/monkeyshines])
|
29
29
|
|
@@ -33,14 +33,14 @@ right one to use. IMW is **not** designed for
|
|
33
33
|
|
34
34
|
* Visualization
|
35
35
|
|
36
|
-
=
|
36
|
+
= Installation
|
37
37
|
|
38
38
|
IMW is hosted on Gemcutter[http://gemcutter.org] so it's easy to install.
|
39
39
|
|
40
|
-
You'll have to
|
40
|
+
You'll have to add <tt>http://gemcutter.org</tt> to your gem sources
|
41
|
+
if it isn't there already:
|
41
42
|
|
42
|
-
$
|
43
|
-
$ gem tumble
|
43
|
+
$ gem sources -a http://gemcutter.org
|
44
44
|
|
45
45
|
and then install IMW
|
46
46
|
|
@@ -59,49 +59,82 @@ _anything_ with a URI and you create one using IMW.open.
|
|
59
59
|
|
60
60
|
csv = IMW.open('/path/to/my_data.csv')
|
61
61
|
html = IMW.open('http://www.infochimps.com')
|
62
|
-
tar_bz2 = IMW.open(
|
63
62
|
|
64
63
|
IMW dynamically extends a resource with modules appropriate to it when
|
65
64
|
you open it. In the above case, +csv+ would be automatically extended
|
66
65
|
by the IMW::Resources::Formats::Csv module, among others:
|
67
66
|
|
68
67
|
csv.resource_modules
|
69
|
-
=> [IMW::
|
68
|
+
=> [IMW::Schemes::Local::Base, IMW::Schemes::Local::LocalFile, IMW::CompressedFiles::Compressible, IMW::Formats::Csv]
|
70
69
|
|
71
70
|
while +html+ will use a different set
|
72
71
|
|
73
72
|
html.resource_modules
|
74
|
-
=> [IMW::
|
75
|
-
|
73
|
+
=> [IMW::Schemes::Remote::Base, IMW::Schemes::Remote::RemoteFile, IMW::Schemes::HTTP, IMW::Formats::Html]
|
76
74
|
|
77
75
|
Consult the documentation for the modules a resource uses to learn
|
78
76
|
what it can do.
|
79
77
|
|
80
|
-
|
78
|
+
== Including/Excluding Resource Modules
|
79
|
+
|
80
|
+
You can exercise finer control of the resource modules IMW will extend
|
81
|
+
a given resource with by passing the <tt>:as</tt> and <tt>:without</tt> options.
|
82
|
+
|
83
|
+
IMW.open('http://www.infochimps.com/some_raw_data', :without => [IMW::Formats::Html]).resource_modules
|
84
|
+
=> [IMW::Schemes::Remote::Base, IMW::Schemes::Remote::RemoteFile, IMW::Schemes::HTTP]
|
85
|
+
|
86
|
+
IMW.open('http://www.infochimps.com', :as => [IMW::Formats::Json]).resource_modules
|
87
|
+
=> [IMW::Schemes::Remote::Base, IMW::Schemes::Remote::RemoteFile, IMW::Schemes::HTTP, IMW::Formats::Json]
|
88
|
+
|
89
|
+
You can also pass <tt>:no_modules</tt> to not use any resource
|
90
|
+
modules.
|
91
|
+
|
92
|
+
== Handlers and Custom Resource Modules
|
93
|
+
|
94
|
+
IMW chooses which resource modules to extend an IMW::Resource by
|
95
|
+
iterating through an array of handlers, passing the resource to the
|
96
|
+
handler, and letting the handler's response (true/false) determine
|
97
|
+
whether or not to extend the resource with the module accompanying the
|
98
|
+
handler.
|
99
|
+
|
100
|
+
You can hook into this process by defining your own handlers. To
|
101
|
+
define a handler which should extend with +MyModule+ any resource with
|
102
|
+
a URI ending with <tt>.xxx</tt>
|
81
103
|
|
82
|
-
|
104
|
+
IMW::Resource.register_handler MyModule, /\.xxx$/
|
83
105
|
|
84
|
-
You can
|
106
|
+
You can also use a Proc instead of a Regexp for more control. If the
|
107
|
+
result of the Proc called with a resource evaluates to true
|
108
|
+
then the resource will be extended by +MyModule+.
|
109
|
+
|
110
|
+
IMW::Resource.register_handler MyModule, Proc.new { |resource| resource.is_local? && resource.path =~ /\.xxx$/ }
|
111
|
+
|
112
|
+
= Manipulating Paths
|
85
113
|
|
86
114
|
IMW holds a registry of paths that you can define on the fly or store
|
87
|
-
in a configuration file.
|
115
|
+
in a configuration file. Defining paths once in the registry and then
|
116
|
+
referring to them forever after by name helps keep your code flexible
|
117
|
+
as well as portable.
|
88
118
|
|
89
|
-
IMW.add_path(:dropbox, "/var/www/public
|
90
|
-
IMW.path_to(:dropbox)
|
119
|
+
IMW.add_path(:dropbox, "/var/www/public")
|
120
|
+
IMW.path_to(:dropbox)
|
121
|
+
=> "/var/www/public"
|
91
122
|
|
92
|
-
You can combine
|
123
|
+
You can combine named references together dynamically.
|
93
124
|
|
94
|
-
IMW.add_path(:raw, "
|
95
|
-
IMW.path_to(:raw
|
96
|
-
|
97
|
-
IMW.path_to(:
|
125
|
+
IMW.add_path(:raw, :dropbox, "raw")
|
126
|
+
IMW.path_to(:raw)
|
127
|
+
=> "/var/www/public/raw"
|
128
|
+
IMW.path_to(:raw, "my/dataset")
|
129
|
+
=> "/var/www/public/raw/my/dataset"
|
98
130
|
|
99
131
|
Altering one path will update others
|
100
132
|
|
101
|
-
IMW.add_path(:
|
102
|
-
IMW.path_to(:
|
133
|
+
IMW.add_path(:dropbox, "/data") # redefines :raw
|
134
|
+
IMW.path_to(:raw, "my/dataset")
|
135
|
+
=> "/data/raw/my/dataset" # not /var/www/public/raw/my/dataset
|
103
136
|
|
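The dynamic path behaviour described above can be sketched in plain Ruby. This is only an illustration under stated assumptions: +PathRegistry+ and its internals are hypothetical names, not IMW's implementation. The point is that storing the unresolved pieces, symbols included, lets a later redefinition of <tt>:dropbox</tt> change <tt>:raw</tt> as well.

```ruby
# Hypothetical stand-in for IMW's path registry; not the gem's actual code.
class PathRegistry
  def initialize
    @paths = {}
  end

  # Store the pieces unresolved so that later redefinitions propagate.
  def add_path(name, *pieces)
    @paths[name] = pieces
  end

  # Resolve any symbols recursively, then join everything into one path.
  def path_to(*pieces)
    File.join(*pieces.map { |piece| resolve(piece) })
  end

  private

  def resolve(piece)
    return piece unless piece.is_a?(Symbol)
    raise ArgumentError, "No path registered for #{piece.inspect}" unless @paths.key?(piece)
    path_to(*@paths[piece])
  end
end

registry = PathRegistry.new
registry.add_path(:dropbox, "/var/www/public")
registry.add_path(:raw, :dropbox, "raw")
puts registry.path_to(:raw, "my/dataset")  # /var/www/public/raw/my/dataset

registry.add_path(:dropbox, "/data")        # :raw follows the new definition
puts registry.path_to(:raw, "my/dataset")  # /data/raw/my/dataset
```

Resolving at lookup time rather than at registration time is the design choice that makes the "altering one path will update others" behaviour fall out naturally.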
104
|
-
|
137
|
+
= Files & Directories
|
105
138
|
|
106
139
|
Use IMW.open to open files. The object returned by IMW.open obeys the
|
107
140
|
usual semantics of a File object but it has new methods to manipulate
|
@@ -146,20 +179,21 @@ Files can readily be opened, read, and downloaded from the Internet
|
|
146
179
|
|
147
180
|
== Archives & Compressed Files
|
148
181
|
|
149
|
-
IMW works with a variety of archiving and compression programs
|
150
|
-
|
182
|
+
IMW works with a variety of archiving and compression programs to make
|
183
|
+
packaging/unpackaging data easy.
|
151
184
|
|
152
185
|
bz2 = IMW.open('/path/to/big_file.bz2')
|
153
186
|
zip = IMW.open('/path/to/archive.zip')
|
154
187
|
targz = IMW.open('/path/to/archive.tar.gz')
|
155
188
|
|
156
|
-
|
157
|
-
|
158
|
-
bz2.
|
159
|
-
|
160
|
-
zip.
|
161
|
-
|
162
|
-
targz.
|
189
|
+
IMW recognizes file properties by extension
|
190
|
+
|
191
|
+
bz2.is_archive? # false
|
192
|
+
bz2.is_compressed? # true
|
193
|
+
zip.is_archive? # true
|
194
|
+
zip.is_compressed? # false
|
195
|
+
targz.is_archive? # true
|
196
|
+
targz.is_compressed? # true
|
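Recognition by extension, as in the predicates above, might look roughly like the following. The extension tables and the method names +archive?+ and +compressed?+ are assumptions made for this sketch; IMW's real lists and method names (<tt>is_archive?</tt>, <tt>is_compressed?</tt>) live in the gem.

```ruby
# Hypothetical extension tables; IMW's real lists may differ.
ARCHIVE_EXTENSIONS    = /\.(tar|zip|tar\.gz|tgz|tar\.bz2|tbz2)\z/
COMPRESSED_EXTENSIONS = /\.(gz|bz2|tgz|tbz2)\z/

# True when the path's extension marks it as an archive.
def archive?(path)
  path.match?(ARCHIVE_EXTENSIONS)
end

# True when the path's extension marks it as compressed.
def compressed?(path)
  path.match?(COMPRESSED_EXTENSIONS)
end

%w[/path/to/big_file.bz2 /path/to/archive.zip /path/to/archive.tar.gz].each do |path|
  puts "#{path}: archive=#{archive?(path)} compressed=#{compressed?(path)}"
end
```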
163
197
|
|
164
198
|
# decompress or compress files
|
165
199
|
big_file = bz2.decompress! # skip the ! to preserve the original
|
@@ -170,53 +204,113 @@ IMW::EXTERNAL_PROGRAMS) to make packaging/unpackaging data easy.
|
|
170
204
|
tarbz2.extract # no need to decompress first
|
171
205
|
new_tarbz2 = IMW.open!('/new/archive.tar').create(['/path1', '/path/2']).compress!
|
172
206
|
|
173
|
-
== Data
|
207
|
+
== Parsing and Emitting Data
|
174
208
|
|
175
|
-
IMW encourages you to work with
|
209
|
+
IMW encourages you to work with native Ruby data structures as much as
|
176
210
|
possible by providing methods to parse common data formats directly
|
177
|
-
into
|
178
|
-
|
179
|
-
|
180
|
-
|
181
|
-
|
182
|
-
|
183
|
-
|
184
|
-
|
185
|
-
|
186
|
-
|
187
|
-
|
188
|
-
|
189
|
-
|
190
|
-
|
191
|
-
|
192
|
-
|
193
|
-
|
194
|
-
|
195
|
-
|
196
|
-
|
197
|
-
|
198
|
-
|
199
|
-
|
200
|
-
|
201
|
-
|
202
|
-
|
203
|
-
|
204
|
-
|
205
|
-
|
206
|
-
|
207
|
-
|
208
|
-
|
209
|
-
|
210
|
-
|
211
|
-
|
212
|
-
|
213
|
-
|
214
|
-
|
215
|
-
|
216
|
-
|
217
|
-
|
218
|
-
|
219
|
-
|
211
|
+
into Arrays, Hashes and Strings.
|
212
|
+
|
213
|
+
Some data formats (CSV, JSON, YAML) have a structure which trivially
|
214
|
+
maps to Arrays, Hashes, and Strings and so these formats can
|
215
|
+
immediately be parsed.
|
216
|
+
|
217
|
+
Other formats (XML, HTML, flat files, &c.) use data structures which
|
218
|
+
do not map as readily to Arrays, Hashes, and Strings and so these will
|
219
|
+
have to be parsed first.
|
220
|
+
|
221
|
+
=== Ruby-like Data Formats
|
222
|
+
|
223
|
+
These include delimited formats such as CSV and TSV as well as
|
224
|
+
"restricted tree-like" formats like JSON and YAML.
|
225
|
+
|
226
|
+
For the case of delimited data, consider the following CSV file:
|
227
|
+
|
228
|
+
ID,Name,Genus,Species
|
229
|
+
001,Gray-bellied Night Monkey,Aotus,lemurinus
|
230
|
+
002,Panamanian Night Monkey,Aotus,zonalis
|
231
|
+
003,Hernández-Camacho's Night Monkey,Aotus,jorgehernandezi
|
232
|
+
004,Gray-handed Night Monkey,Aotus,griseimembra
|
233
|
+
005,Hershkovitz's Night Monkey,Aotus,hershkovitzi
|
234
|
+
006,Brumback's Night Monkey,Aotus,brumbacki
|
235
|
+
007,Three-striped Night Monkey,Aotus,trivirgatus
|
236
|
+
008,Spix's Night Monkey,Aotus,vociferans
|
237
|
+
009,Malaysian Lar Gibbon,Hylobates,lar lar
|
238
|
+
010,Carpenter's Lar Gibbon,Hylobates,lar carpenteri
|
239
|
+
|
240
|
+
It trivially maps to an Array of Arrays:
|
241
|
+
|
242
|
+
data = IMW.open('/path/to/monkeys.csv').load
|
243
|
+
puts data.class
|
244
|
+
=> Array
|
245
|
+
puts data.first.class
|
246
|
+
=> Array
|
247
|
+
data.each { |row| puts row.inspect }
|
248
|
+
=> ["ID", "Name", "Genus", "Species"]
|
249
|
+
["001", "Gray-bellied Night Monkey", "Aotus", "lemurinus"]
|
250
|
+
["002", "Panamanian Night Monkey", "Aotus", "zonalis"]
|
251
|
+
...
|
252
|
+
["010", "Carpenter's Lar Gibbon", "Hylobates", "lar carpenteri"]
|
253
|
+
|
254
|
+
Conversely, any array of arrays trivially maps to a delimited file.
|
255
|
+
Here we write out all rows where the genus is _Hylobates_ to a TSV
|
256
|
+
file:
|
257
|
+
|
258
|
+
hylobates = data.find_all { |row| row[2] == 'Hylobates' }
|
259
|
+
hylobates.dump('/path/to/monkeys.tsv')
|
260
|
+
|
261
|
+
IMW automatically formats the output as TSV and writes it to the
|
262
|
+
specified path.
|
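The filter-then-dump step above can be approximated without IMW. In this sketch +dump_tsv+ is a hypothetical helper standing in for IMW's +dump+, and it naively assumes that no field contains a tab or a newline.

```ruby
require 'tempfile'

data = [
  ["ID", "Name", "Genus", "Species"],
  ["001", "Gray-bellied Night Monkey", "Aotus", "lemurinus"],
  ["009", "Malaysian Lar Gibbon", "Hylobates", "lar lar"],
  ["010", "Carpenter's Lar Gibbon", "Hylobates", "lar carpenteri"]
]

# Keep only the rows whose third column (the genus) is Hylobates.
hylobates = data.find_all { |row| row[2] == 'Hylobates' }

# Naive TSV writer (hypothetical helper): one row per line, tab-separated.
def dump_tsv(rows, path)
  File.open(path, 'w') { |f| rows.each { |row| f.puts row.join("\t") } }
end

Tempfile.create(['monkeys', '.tsv']) do |file|
  dump_tsv(hylobates, file.path)
  puts File.read(file.path)
end
```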
263
|
+
|
264
|
+
Similarly, restricted tree-like formats like JSON and YAML, which map
|
265
|
+
cleanly onto Hashes, Arrays, and Strings, can also be automatically
|
266
|
+
parsed and emitted by IMW.
|
267
|
+
|
268
|
+
Consider a YAML version of the above CSV data:
|
269
|
+
|
270
|
+
- id: 001
|
271
|
+
name: Gray-bellied Night Monkey
|
272
|
+
genus: Aotus
|
273
|
+
species: lemurinus
|
274
|
+
- id: 002
|
275
|
+
name: Panamanian Night Monkey
|
276
|
+
genus: Aotus
|
277
|
+
species: zonalis
|
278
|
+
- id: 003
|
279
|
+
name: Hernández-Camacho's Night Monkey
|
280
|
+
genus: Aotus
|
281
|
+
species: jorgehernandezi
|
282
|
+
...
|
283
|
+
- id: 010
|
284
|
+
name: Carpenter's Lar Gibbon
|
285
|
+
genus: Hylobates
|
286
|
+
species: lar carpenteri
|
287
|
+
|
288
|
+
This trivially maps to an Array of Hashes and so we can perform the
|
289
|
+
exact same filtration for YAML and JSON as we did for CSV and TSV (in
|
290
|
+
a one-liner!):
|
291
|
+
|
292
|
+
data = IMW.open('/path/to/monkeys.yaml').load
|
293
|
+
hylobates = data.find_all { |monkey| monkey['genus'] == 'Hylobates' }
|
294
|
+
hylobates.dump('/path/to/monkeys.json')
|
295
|
+
|
296
|
+
Resources in these Ruby-like data formats also extend themselves with
|
297
|
+
Enumerable so goodies like +map+, +find_all+, &c. are available. This
|
298
|
+
enables converting YAML to JSON with a one-liner:
|
299
|
+
|
300
|
+
IMW.open('/path/to/monkeys.yaml').find_all { |monkey| monkey['genus'] == 'Hylobates' }.dump('/path/to/monkeys.json')
|
301
|
+
|
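The same YAML-to-JSON conversion can be tried with only Ruby's standard library, as a rough stand-in for what IMW does behind <tt>IMW.open</tt>. The sample data is abbreviated and the use of +StringIO+ in place of a file is an assumption of this sketch.

```ruby
require 'yaml'
require 'json'
require 'stringio'

yaml_text = <<~DOC
  - name: Panamanian Night Monkey
    genus: Aotus
  - name: Malaysian Lar Gibbon
    genus: Hylobates
DOC

# YAML.load accepts an IO as well as a String.
monkeys   = YAML.load(StringIO.new(yaml_text))
hylobates = monkeys.find_all { |monkey| monkey['genus'] == 'Hylobates' }

puts JSON.generate(hylobates)
```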
302
|
+
=== Parsing More General Data Formats
|
303
|
+
|
304
|
+
Some data formats are structured but do not map readily to Hashes,
|
305
|
+
Arrays, and Strings (XML, HTML, &c.) while other data formats lack
|
306
|
+
structure or have a peculiar structure (flat files in arbitrary
|
307
|
+
syntax).
|
308
|
+
|
309
|
+
In both these cases the data needs to be parsed before it's usable.
|
310
|
+
For the XML and HTML type data formats, IMW uses Hpricot and the
|
311
|
+
IMW::Parsers::HtmlParser for parsing. For flat files, IMW provides
|
312
|
+
the IMW::Parsers::LineParser and the IMW::Parsers::RegexpParser.
|
313
|
+
|
220
314
|
HTML files, on the other hand, are more complex and typically have to
|
221
315
|
be parsed before being converted to plain Ruby objects:
|
222
316
|
|
@@ -245,17 +339,11 @@ rip::
|
|
245
339
|
from the web, obtain it by querying databases, or use other services
|
246
340
|
like rsync, ftp, &c. to pull it in from another computer.
|
247
341
|
|
248
|
-
extract::
|
249
|
-
|
250
|
-
Ripped data is often compressed or otherwise archived and needs to
|
251
|
-
be extracted. It may also be sliced in many ways (excluding certain
|
252
|
-
years, say) to reduce the volume to only what is required.
|
253
|
-
|
254
342
|
parse::
|
255
343
|
|
256
344
|
Data is parsed into Ruby objects and stored.
|
257
345
|
|
258
|
-
|
346
|
+
fix::
|
259
347
|
|
260
348
|
All the parsed data is combined, reconciled, and further processed
|
261
349
|
into a final form.
|
@@ -268,7 +356,7 @@ package::
|
|
268
356
|
Not all datasets
|
269
357
|
|
270
358
|
|
271
|
-
|
359
|
+
= Datasets
|
272
360
|
|
273
361
|
== Tasks & Dependencies
|
274
362
|
|
data/VERSION
CHANGED
@@ -1 +1 @@
|
|
1
|
-
0.2.
|
1
|
+
0.2.5
|
data/lib/imw/formats/delimited.rb
CHANGED
@@ -11,6 +11,8 @@ module IMW
|
|
11
11
|
# @abstract
|
12
12
|
module Delimited
|
13
13
|
|
14
|
+
include Enumerable
|
15
|
+
|
14
16
|
attr_accessor :delimited_settings
|
15
17
|
|
16
18
|
# Return the data in this delimited resource as an array of
|
@@ -25,11 +27,9 @@ module IMW
|
|
25
27
|
FasterCSV.parse(read, delimited_options, &block)
|
26
28
|
end
|
27
29
|
|
28
|
-
#
|
29
|
-
|
30
|
-
|
31
|
-
def map &block
|
32
|
-
load.map(&block)
|
30
|
+
# Call +block+ with each row in this delimited resource.
|
31
|
+
def each &block
|
32
|
+
load(&block)
|
33
33
|
end
|
34
34
|
|
35
35
|
# Dump an array of arrays into this resource.
|
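The change above replaces a hand-written +map+ with +each+ plus <tt>include Enumerable</tt>. A minimal sketch of why that is enough follows; +TinyDelimited+ is a hypothetical class, and <tt>String#split</tt> stands in for FasterCSV, so quoted fields are not handled.

```ruby
# Hypothetical stand-in for a delimited resource.
class TinyDelimited
  include Enumerable

  def initialize(text, delimiter = ',')
    @text = text
    @delimiter = delimiter
  end

  # Return all rows, or yield each row when a block is given.
  def load(&block)
    rows = @text.each_line.map { |line| line.chomp.split(@delimiter) }
    block ? rows.each(&block) : rows
  end

  # Enumerable only needs +each+; map, find_all, &c. come for free.
  def each(&block)
    load(&block)
  end
end

csv = TinyDelimited.new("ID,Genus\n001,Aotus\n009,Hylobates\n")
p csv.find_all { |row| row[1] == 'Hylobates' }
p csv.map(&:first)
```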
data/lib/imw/formats/json.rb
CHANGED
@@ -4,37 +4,29 @@ module IMW
|
|
4
4
|
# Defines methods for reading and writing JSON data.
|
5
5
|
module Json
|
6
6
|
|
7
|
+
include Enumerable
|
8
|
+
|
7
9
|
# Return the content of this resource.
|
8
10
|
#
|
9
|
-
# Will
|
10
|
-
#
|
11
|
-
#
|
12
|
-
# - if the outermost JSON data structure is an array, then
|
13
|
-
# yield each element
|
14
|
-
#
|
15
|
-
# - if the outermost JSON data structure is a mapping, then
|
16
|
-
# yield each key, value pair
|
17
|
-
#
|
18
|
-
# - otherwise just yield the structure
|
11
|
+
# Will pass a block to the outermost JSON data structure's each
|
12
|
+
# method.
|
19
13
|
#
|
20
14
|
# @return [Hash, Array, String, Fixnum] whatever the JSON contained
|
21
15
|
def load &block
|
22
16
|
require 'json'
|
23
17
|
json = JSON.parse(read)
|
24
18
|
if block_given?
|
25
|
-
|
26
|
-
when Array
|
27
|
-
json.each { |obj| yield obj }
|
28
|
-
when Hash
|
29
|
-
json.each_pair { |key, value| yield key, value }
|
30
|
-
else
|
31
|
-
yield json
|
32
|
-
end
|
19
|
+
json.each(&block)
|
33
20
|
else
|
34
21
|
json
|
35
22
|
end
|
36
23
|
end
|
37
24
|
|
25
|
+
# Iterate over the elements in the JSON.
|
26
|
+
def each &block
|
27
|
+
load(&block)
|
28
|
+
end
|
29
|
+
|
38
30
|
# Dump the +data+ into this resource. It must be opened for
|
39
31
|
# writing.
|
40
32
|
#
|
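The simplified +load+ above hands the block straight to the parsed structure's +each+. A free-standing sketch of that logic is shown here; +load_json+ is a hypothetical name. One consequence worth noting is that a parsed value with no +each+ method (a bare number, for instance) would raise NoMethodError when a block is given, whereas the removed code yielded such a value directly.

```ruby
require 'json'

# Hypothetical free-standing version of the new +load+ logic.
def load_json(text, &block)
  json = JSON.parse(text)
  block ? json.each(&block) : json
end

# An Array yields each element.
load_json('[1, 2, 3]') { |obj| puts obj }

# A Hash yields key/value pairs.
load_json('{"genus": "Aotus"}') { |key, value| puts "#{key}=#{value}" }

# Without a block the parsed structure is returned.
p load_json('{"genus": "Aotus"}')
```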
data/lib/imw/formats/yaml.rb
CHANGED
@@ -4,37 +4,29 @@ module IMW
|
|
4
4
|
# Provides methods for reading and writing YAML data.
|
5
5
|
module Yaml
|
6
6
|
|
7
|
+
include Enumerable
|
8
|
+
|
7
9
|
# Return the content of this resource.
|
8
10
|
#
|
9
|
-
# Will
|
10
|
-
#
|
11
|
-
#
|
12
|
-
# - if the outermost YAML data structure is an array, then
|
13
|
-
# yield each element
|
14
|
-
#
|
15
|
-
# - if the outermost YAML data structure is a mapping, then
|
16
|
-
# yield each key, value pair
|
17
|
-
#
|
18
|
-
# - otherwise just yield the structure
|
11
|
+
# Will pass a block to the outermost YAML data structure's each
|
12
|
+
# method.
|
19
13
|
#
|
20
14
|
# @return [Hash, Array, String, Fixnum] whatever the YAML contained
|
21
15
|
def load &block
|
22
16
|
require 'yaml'
|
23
|
-
yaml = YAML.load(
|
17
|
+
yaml = YAML.load(io)
|
24
18
|
if block_given?
|
25
|
-
|
26
|
-
when Array
|
27
|
-
yaml.each { |obj| yield obj }
|
28
|
-
when Hash
|
29
|
-
yaml.each_pair { |key, value| yield key, value }
|
30
|
-
else
|
31
|
-
yield yaml
|
32
|
-
end
|
19
|
+
yaml.each(&block)
|
33
20
|
else
|
34
21
|
yaml
|
35
22
|
end
|
36
23
|
end
|
37
24
|
|
25
|
+
# Iterate over the elements in the YAML.
|
26
|
+
def each &block
|
27
|
+
load(&block)
|
28
|
+
end
|
29
|
+
|
38
30
|
# Dump the +data+ into this resource. It must be opened for
|
39
31
|
# writing.
|
40
32
|
#
|
data/lib/imw/resource.rb
CHANGED
@@ -6,6 +6,31 @@ module IMW
|
|
6
6
|
# URI handlers to IMW.
|
7
7
|
USER_DEFINED_HANDLERS = [] unless defined?(USER_DEFINED_HANDLERS)
|
8
8
|
|
9
|
+
# Register a new resource handler which dynamically extends a new
|
10
|
+
# IMW::Resource with the given module +mod+.
|
11
|
+
#
|
12
|
+
# +handler+ must be one of
|
13
|
+
#
|
14
|
+
# 1. Regexp
|
15
|
+
# 2. Proc
|
16
|
+
# 3. +true+
|
17
|
+
#
|
18
|
+
# In case (1), if the regular expression matches the resource's URI
|
19
|
+
# then the module (+mod+) will be used to extend the resource.
|
20
|
+
#
|
21
|
+
# In case (2), if the Proc returns a value other than +false+ or
|
22
|
+
# +nil+ then the module will be used.
|
23
|
+
#
|
24
|
+
# In case (3), the module will be used.
|
25
|
+
#
|
26
|
+
# @param [String, Module] mod
|
27
|
+
# @param [Regexp, Proc, true] handler
|
28
|
+
def self.register_handler mod, handler
|
29
|
+
raise IMW::ArgumentError.new("Module must be either a Module or String") unless mod.is_a?(Module) || mod.is_a?(String)
|
30
|
+
raise IMW::ArgumentError.new("Handler must be either a Regexp, Proc, or true") unless handler.is_a?(Regexp) || handler.is_a?(Proc) || handler == true
|
31
|
+
self::USER_DEFINED_HANDLERS << [mod, handler]
|
32
|
+
end
|
33
|
+
|
9
34
|
# A resource can be anything addressable via a URI. Examples
|
10
35
|
# include local files, remote files, webpages, &c.
|
11
36
|
#
|
@@ -178,6 +203,7 @@ module IMW
|
|
178
203
|
raise IMW::Error.new([message, "No path defined for #{self.inspect} extended by #{resource_modules.join(' ')}"].compact.join(', ')) unless respond_to?(:path)
|
179
204
|
raise IMW::Error.new([message, "No exist? method defined for #{self.inspect} extended by #{resource_modules.join(' ')}"].compact.join(', ')) unless respond_to?(:exist?)
|
180
205
|
raise IMW::PathError.new([message, "#{path} does not exist"].compact.join(', ')) unless exist?
|
206
|
+
self
|
181
207
|
end
|
182
208
|
|
183
209
|
# Open a copy of this resource.
|
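The handler registration added in this file can be sketched as follows. The validation mirrors +register_handler+ above, but +HandlerRegistry+ and +modules_for+ are hypothetical: the diff does not show how IMW consults the handlers, and this sketch passes a URI string to a Proc where IMW passes the resource itself.

```ruby
# Hypothetical registry; only the validation mirrors the diff above.
class HandlerRegistry
  def initialize
    @handlers = []
  end

  def register_handler(mod, handler)
    unless mod.is_a?(Module) || mod.is_a?(String)
      raise ArgumentError, "Module must be either a Module or String"
    end
    unless handler.is_a?(Regexp) || handler.is_a?(Proc) || handler == true
      raise ArgumentError, "Handler must be either a Regexp, Proc, or true"
    end
    @handlers << [mod, handler]
  end

  # Assumed dispatch: return every module whose handler accepts the URI.
  def modules_for(uri)
    accepted = @handlers.select do |_mod, handler|
      case handler
      when Regexp then handler =~ uri
      when Proc   then handler.call(uri)
      else true
      end
    end
    accepted.map(&:first)
  end
end

module MyModule; end
module Everything; end

registry = HandlerRegistry.new
registry.register_handler MyModule, /\.xxx$/
registry.register_handler "LazyModule", Proc.new { |uri| uri.start_with?("/tmp") }
registry.register_handler Everything, true

p registry.modules_for("/data/file.xxx")
p registry.modules_for("/tmp/file.txt")
```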
data/lib/imw/schemes/local.rb
CHANGED
@@ -65,7 +65,7 @@ module IMW
|
|
65
65
|
def dir
|
66
66
|
IMW.open(dirname)
|
67
67
|
end
|
68
|
-
|
68
|
+
|
69
69
|
end
|
70
70
|
|
71
71
|
# Defines methods appropriate for a local file.
|
@@ -142,6 +142,29 @@ module IMW
|
|
142
142
|
end
|
143
143
|
io.close unless options[:persist]
|
144
144
|
end
|
145
|
+
|
146
|
+
# Return a summary of properties of this local file.
|
147
|
+
#
|
148
|
+
# Returned properties include
|
149
|
+
# - basename
|
150
|
+
# - size
|
151
|
+
# - extension
|
152
|
+
# - snippet
|
153
|
+
def summary
|
154
|
+
{
|
155
|
+
:basename => basename,
|
156
|
+
:size => size,
|
157
|
+
:extension => extension,
|
158
|
+
:snippet => snippet
|
159
|
+
}
|
160
|
+
end
|
161
|
+
|
162
|
+
# Return a 1024-char snippet from this local file.
|
163
|
+
#
|
164
|
+
# @return [String]
|
165
|
+
def snippet
|
166
|
+
io.read(1024)
|
167
|
+
end
|
145
168
|
end
|
146
169
|
|
147
170
|
# Defines methods for manipulating the contents of a local
|
@@ -182,13 +205,6 @@ module IMW
|
|
182
205
|
Dir[File.join(path, selector)]
|
183
206
|
end
|
184
207
|
|
185
|
-
# Return a list of all paths directly within this directory.
|
186
|
-
#
|
187
|
-
# @return [Array]
|
188
|
-
def contents
|
189
|
-
self['*']
|
190
|
-
end
|
191
|
-
|
192
208
|
# Does this directory contain +obj+?
|
193
209
|
#
|
194
210
|
# @param [String, IMW::Resource] obj
|
@@ -202,6 +218,13 @@ module IMW
|
|
202
218
|
false
|
203
219
|
end
|
204
220
|
|
221
|
+
# Return a list of all paths directly within this directory.
|
222
|
+
#
|
223
|
+
# @return [Array<String>]
|
224
|
+
def contents
|
225
|
+
self['*']
|
226
|
+
end
|
227
|
+
|
205
228
|
# Return all paths within this directory, recursively.
|
206
229
|
#
|
207
230
|
# @return [Array<String>]
|
@@ -209,11 +232,17 @@ module IMW
|
|
209
232
|
self['**/*']
|
210
233
|
end
|
211
234
|
|
212
|
-
# Return all resources within this directory
|
213
|
-
# converted to IMW::Resource objects.
|
235
|
+
# Return all resources directly within this directory.
|
214
236
|
#
|
215
237
|
# @return [Array<IMW::Resource>]
|
216
238
|
def resources
|
239
|
+
contents.map { |path| IMW.open(path) }
|
240
|
+
end
|
241
|
+
|
242
|
+
# Return all resources within this directory, recursively.
|
243
|
+
#
|
244
|
+
# @return [Array<IMW::Resource>]
|
245
|
+
def all_resources
|
217
246
|
all_contents.map do |path|
|
218
247
|
IMW.open(path) unless File.directory?(path)
|
219
248
|
end.compact
|
@@ -251,6 +280,26 @@ module IMW
|
|
251
280
|
self
|
252
281
|
end
|
253
282
|
|
283
|
+
# Return a hash summarizing this directory with a key
|
284
|
+
# <tt>:contents</tt> containing an array of hashes summarizing
|
285
|
+
# this directory's contents.
|
286
|
+
#
|
287
|
+
# The directory summary includes the following information
|
288
|
+
# - basename
|
289
|
+
# - size
|
290
|
+
# - num_files
|
291
|
+
# - contents
|
292
|
+
#
|
293
|
+
# @return [Hash]
|
294
|
+
def summary
|
295
|
+
{
|
296
|
+
:basename => basename,
|
297
|
+
:size => size,
|
298
|
+
:num_files => contents.length,
|
299
|
+
:contents => resources.map { |resource| resource.summary }
|
300
|
+
}
|
301
|
+
end
|
302
|
+
|
254
303
|
end
|
255
304
|
end
|
256
305
|
end
|
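The two +summary+ methods added in this file nest: a directory's summary contains the summaries of its contents. A rough standard-library approximation is given below; +summarize+ is a hypothetical helper, and details such as whether IMW's +extension+ includes the leading dot are assumptions.

```ruby
require 'tmpdir'

# Hypothetical helper combining the file and directory summaries.
def summarize(path)
  if File.directory?(path)
    children = Dir[File.join(path, '*')]
    {
      :basename  => File.basename(path),
      :size      => File.size(path),
      :num_files => children.length,
      :contents  => children.map { |child| summarize(child) }
    }
  else
    {
      :basename  => File.basename(path),
      :size      => File.size(path),
      :extension => File.extname(path),
      # Read at most the first 1024 bytes of the file.
      :snippet   => File.open(path) { |io| io.read(1024) }
    }
  end
end

Dir.mktmpdir do |dir|
  File.write(File.join(dir, 'monkeys.csv'), "ID,Name\n001,Gray-bellied Night Monkey\n")
  summary = summarize(dir)
  puts summary[:num_files]
  p summary[:contents].first
end
```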