swineherd-fs 0.0.2

data/Gemfile ADDED
@@ -0,0 +1,2 @@
1
+ source "http://rubygems.org"
2
+ gemspec
data/LICENSE ADDED
@@ -0,0 +1,188 @@
1
+ Copyright 2011 Infochimps, Inc
2
+
3
+ Apache License Version 2.0, January 2004, http://www.apache.org/licenses/
4
+
5
+ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
6
+
7
+ 1. Definitions.
8
+
9
+ "License" shall mean the terms and conditions for use, reproduction,
10
+ and distribution as defined by Sections 1 through 9 of this document.
11
+
12
+ "Licensor" shall mean the copyright owner or entity authorized by
13
+ the copyright owner that is granting the License.
14
+
15
+ "Legal Entity" shall mean the union of the acting entity and all
16
+ other entities that control, are controlled by, or are under common
17
+ control with that entity. For the purposes of this definition,
18
+ "control" means (i) the power, direct or indirect, to cause the
19
+ direction or management of such entity, whether by contract or
20
+ otherwise, or (ii) ownership of fifty percent (50%) or more of the
21
+ outstanding shares, or (iii) beneficial ownership of such entity.
22
+
23
+ "You" (or "Your") shall mean an individual or Legal Entity
24
+ exercising permissions granted by this License.
25
+
26
+ "Source" form shall mean the preferred form for making modifications,
27
+ including but not limited to software source code, documentation
28
+ source, and configuration files.
29
+
30
+ "Object" form shall mean any form resulting from mechanical
31
+ transformation or translation of a Source form, including but
32
+ not limited to compiled object code, generated documentation,
33
+ and conversions to other media types.
34
+
35
+ "Work" shall mean the work of authorship, whether in Source or
36
+ Object form, made available under the License, as indicated by a
37
+ copyright notice that is included in or attached to the work
38
+ (an example is provided in the Appendix below).
39
+
40
+ "Derivative Works" shall mean any work, whether in Source or Object
41
+ form, that is based on (or derived from) the Work and for which the
42
+ editorial revisions, annotations, elaborations, or other modifications
43
+ represent, as a whole, an original work of authorship. For the purposes
44
+ of this License, Derivative Works shall not include works that remain
45
+ separable from, or merely link (or bind by name) to the interfaces of,
46
+ the Work and Derivative Works thereof.
47
+
48
+ "Contribution" shall mean any work of authorship, including
49
+ the original version of the Work and any modifications or additions
50
+ to that Work or Derivative Works thereof, that is intentionally
51
+ submitted to Licensor for inclusion in the Work by the copyright owner
52
+ or by an individual or Legal Entity authorized to submit on behalf of
53
+ the copyright owner. For the purposes of this definition, "submitted"
54
+ means any form of electronic, verbal, or written communication sent
55
+ to the Licensor or its representatives, including but not limited to
56
+ communication on electronic mailing lists, source code control systems,
57
+ and issue tracking systems that are managed by, or on behalf of, the
58
+ Licensor for the purpose of discussing and improving the Work, but
59
+ excluding communication that is conspicuously marked or otherwise
60
+ designated in writing by the copyright owner as "Not a Contribution."
61
+
62
+ "Contributor" shall mean Licensor and any individual or Legal Entity
63
+ on behalf of whom a Contribution has been received by Licensor and
64
+ subsequently incorporated within the Work.
65
+
66
+ 2. Grant of Copyright License. Subject to the terms and conditions of
67
+ this License, each Contributor hereby grants to You a perpetual,
68
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
69
+ copyright license to reproduce, prepare Derivative Works of,
70
+ publicly display, publicly perform, sublicense, and distribute the
71
+ Work and such Derivative Works in Source or Object form.
72
+
73
+ 3. Grant of Patent License. Subject to the terms and conditions of
74
+ this License, each Contributor hereby grants to You a perpetual,
75
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
76
+ (except as stated in this section) patent license to make, have made,
77
+ use, offer to sell, sell, import, and otherwise transfer the Work,
78
+ where such license applies only to those patent claims licensable
79
+ by such Contributor that are necessarily infringed by their
80
+ Contribution(s) alone or by combination of their Contribution(s)
81
+ with the Work to which such Contribution(s) was submitted. If You
82
+ institute patent litigation against any entity (including a
83
+ cross-claim or counterclaim in a lawsuit) alleging that the Work
84
+ or a Contribution incorporated within the Work constitutes direct
85
+ or contributory patent infringement, then any patent licenses
86
+ granted to You under this License for that Work shall terminate
87
+ as of the date such litigation is filed.
88
+
89
+ 4. Redistribution. You may reproduce and distribute copies of the
90
+ Work or Derivative Works thereof in any medium, with or without
91
+ modifications, and in Source or Object form, provided that You
92
+ meet the following conditions:
93
+
94
+ (a) You must give any other recipients of the Work or
95
+ Derivative Works a copy of this License; and
96
+
97
+ (b) You must cause any modified files to carry prominent notices
98
+ stating that You changed the files; and
99
+
100
+ (c) You must retain, in the Source form of any Derivative Works
101
+ that You distribute, all copyright, patent, trademark, and
102
+ attribution notices from the Source form of the Work,
103
+ excluding those notices that do not pertain to any part of
104
+ the Derivative Works; and
105
+
106
+ (d) If the Work includes a "NOTICE" text file as part of its
107
+ distribution, then any Derivative Works that You distribute must
108
+ include a readable copy of the attribution notices contained
109
+ within such NOTICE file, excluding those notices that do not
110
+ pertain to any part of the Derivative Works, in at least one
111
+ of the following places: within a NOTICE text file distributed
112
+ as part of the Derivative Works; within the Source form or
113
+ documentation, if provided along with the Derivative Works; or,
114
+ within a display generated by the Derivative Works, if and
115
+ wherever such third-party notices normally appear. The contents
116
+ of the NOTICE file are for informational purposes only and
117
+ do not modify the License. You may add Your own attribution
118
+ notices within Derivative Works that You distribute, alongside
119
+ or as an addendum to the NOTICE text from the Work, provided
120
+ that such additional attribution notices cannot be construed
121
+ as modifying the License.
122
+
123
+ You may add Your own copyright statement to Your modifications and
124
+ may provide additional or different license terms and conditions
125
+ for use, reproduction, or distribution of Your modifications, or
126
+ for any such Derivative Works as a whole, provided Your use,
127
+ reproduction, and distribution of the Work otherwise complies with
128
+ the conditions stated in this License.
129
+
130
+ 5. Submission of Contributions. Unless You explicitly state otherwise,
131
+ any Contribution intentionally submitted for inclusion in the Work
132
+ by You to the Licensor shall be under the terms and conditions of
133
+ this License, without any additional terms or conditions.
134
+ Notwithstanding the above, nothing herein shall supersede or modify
135
+ the terms of any separate license agreement you may have executed
136
+ with Licensor regarding such Contributions.
137
+
138
+ 6. Trademarks. This License does not grant permission to use the trade
139
+ names, trademarks, service marks, or product names of the Licensor,
140
+ except as required for reasonable and customary use in describing the
141
+ origin of the Work and reproducing the content of the NOTICE file.
142
+
143
+ 7. Disclaimer of Warranty. Unless required by applicable law or
144
+ agreed to in writing, Licensor provides the Work (and each
145
+ Contributor provides its Contributions) on an "AS IS" BASIS,
146
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
147
+ implied, including, without limitation, any warranties or conditions
148
+ of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
149
+ PARTICULAR PURPOSE. You are solely responsible for determining the
150
+ appropriateness of using or redistributing the Work and assume any
151
+ risks associated with Your exercise of permissions under this License.
152
+
153
+ 8. Limitation of Liability. In no event and under no legal theory,
154
+ whether in tort (including negligence), contract, or otherwise,
155
+ unless required by applicable law (such as deliberate and grossly
156
+ negligent acts) or agreed to in writing, shall any Contributor be
157
+ liable to You for damages, including any direct, indirect, special,
158
+ incidental, or consequential damages of any character arising as a
159
+ result of this License or out of the use or inability to use the
160
+ Work (including but not limited to damages for loss of goodwill,
161
+ work stoppage, computer failure or malfunction, or any and all
162
+ other commercial damages or losses), even if such Contributor
163
+ has been advised of the possibility of such damages.
164
+
165
+ 9. Accepting Warranty or Additional Liability. While redistributing
166
+ the Work or Derivative Works thereof, You may choose to offer,
167
+ and charge a fee for, acceptance of support, warranty, indemnity,
168
+ or other liability obligations and/or rights consistent with this
169
+ License. However, in accepting such obligations, You may act only
170
+ on Your own behalf and on Your sole responsibility, not on behalf
171
+ of any other Contributor, and only if You agree to indemnify,
172
+ defend, and hold each Contributor harmless for any liability
173
+ incurred by, or claims asserted against, such Contributor by reason
174
+ of your accepting any such warranty or additional liability.
175
+
176
+ END OF TERMS AND CONDITIONS
177
+
178
+ Licensed under the Apache License, Version 2.0 (the "License");
179
+ you may not use this file except in compliance with the License.
180
+ You may obtain a copy of the License at
181
+
182
+ http://www.apache.org/licenses/LICENSE-2.0
183
+
184
+ Unless required by applicable law or agreed to in writing, software
185
+ distributed under the License is distributed on an "AS IS" BASIS,
186
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
187
+ See the License for the specific language governing permissions and
188
+ limitations under the License.
data/README.textile ADDED
@@ -0,0 +1,66 @@
1
+ h1. Swineherd-fs
2
+
3
+ Swineherd-fs provides a common filesystem abstraction over several backends:
+
+ * @file@ - Local filesystem. Only thoroughly tested on Ubuntu Linux.
4
+ * @hdfs@ - Hadoop distributed file system. Uses the Apache Hadoop 0.20 API. Requires JRuby.
5
+ * @s3@ - Amazon Simple Storage Service (S3).
6
+ * @ftp@ - FTP (Not yet implemented)
7
+
8
+ All filesystem abstractions implement the following core functions, most of them modeled on standard UNIX filesystem commands:
9
+
10
+ * @mv@
11
+ * @cp@
12
+ * @cp_r@
13
+ * @rm@
14
+ * @rm_r@
15
+ * @open@
16
+ * @exists?@
17
+ * @directory?@
18
+ * @ls@
19
+ * @ls_r@
20
+ * @mkdir_p@
21
+
22
+ Note: Since S3 is just a key-value store, there is no real notion of a directory and empty directories cannot exist. @mkdir_p@ therefore only ensures that the bucket exists, and the @directory?@ test only succeeds when the directory is non-empty, which differs from the behavior of a UNIX filesystem. See the example below.
23
+
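+ For example (bucket name hypothetical, AWS credentials assumed to be configured):
+
+ <pre><code>s3 = Swineherd::S3FileSystem.new
+ s3.mkdir_p('my-bucket/some/dir')     # only ensures the bucket 'my-bucket' exists
+ s3.directory?('my-bucket/some/dir')  # false, no keys under this prefix yet
+ s3.open('my-bucket/some/dir/file.txt','w'){|f| f.write('data')}
+ s3.directory?('my-bucket/some/dir')  # true, the prefix now contains a key
+ </code></pre>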
24
+ Additionally, the S3 and HDFS abstractions implement functions for moving files to and from the local filesystem:
25
+
26
+ * @copy_to_local@
27
+ * @copy_from_local@
28
+
29
+ Note: For these methods the local path (the destination for @copy_to_local@ and the source for @copy_from_local@) is assumed to be on the local filesystem, so it does not need to be prefixed with a file scheme. See the example below.
30
+
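+ For example, with the HDFS abstraction (paths hypothetical):
+
+ <pre><code>hdfs = Swineherd::HadoopFileSystem.new
+ hdfs.copy_from_local('/tmp/data.tsv', 'data/data.tsv')  # local -> hdfs
+ hdfs.copy_to_local('data/data.tsv', '/tmp/copy.tsv')    # hdfs  -> local
+ </code></pre>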
31
+ The @Swineherd::FileSystem@ module implements a generic filesystem abstraction using schemed filepaths (hdfs://, s3://, file://).
32
+
33
+ Currently only the following methods are supported by @Swineherd::FileSystem@:
34
+
35
+ * @cp@
36
+ * @exists?@
37
+
38
+ For example, instead of doing the following:<pre><code>hdfs = Swineherd::HadoopFileSystem.new
39
+ localfs = Swineherd::LocalFileSystem.new
40
+ hdfs.copy_to_local('foo/bar/baz.txt', 'foo/bar/baz.txt') unless localfs.exists? 'foo/bar/baz.txt'
41
+ </code></pre>
42
+
43
+ You can do:<pre><code>fs = Swineherd::FileSystem
44
+ fs.cp('hdfs://foo/bar/baz.txt','foo/bar/baz.txt') unless fs.exists?('foo/bar/baz.txt')
45
+ </code></pre>
46
+
47
+ Note: A path without a scheme is treated as a path on the local filesystem; use the explicit file:// scheme if you prefer to be explicit. The following are equivalent:
48
+
49
+ <pre><code>fs.exists?('foo/bar/baz.txt')
50
+ fs.exists?('file://foo/bar/baz.txt')
51
+ </code></pre>
52
+
53
+ h4. Config
54
+
55
+ * In order to use the @S3FileSystem@, Swineherd requires AWS S3 access credentials.
56
+
57
+ * In @~/.swineherd.yaml@ or @/etc/swineherd.yaml@:
58
+
59
+ <pre><code>aws:
60
+ access_key: my_access_key
61
+ secret_key: my_secret_key
62
+ </code></pre>
63
+
64
+ * Or just pass them in when creating the instance:
65
+
66
+ <pre><code>s3 = Swineherd::S3FileSystem.new(:aws_access_key => "my_access_key", :aws_secret_key => "my_secret_key")</code></pre>
data/VERSION ADDED
@@ -0,0 +1 @@
1
+ 0.0.2
data/lib/swineherd-fs/hadoopfilesystem.rb ADDED
@@ -0,0 +1,249 @@
1
+ module Swineherd
2
+
3
+ #
4
+ # Methods for dealing with the Hadoop distributed file system (hdfs). This class
5
+ # requires that you run with JRuby as it makes use of the native Java Hadoop
6
+ # libraries.
7
+ #
8
+ class HadoopFileSystem
9
+
10
+ attr_accessor :conf, :hdfs
11
+
12
+ def initialize *args
13
+ set_hadoop_environment if running_jruby?
14
+
15
+ @conf = Java::org.apache.hadoop.conf.Configuration.new
16
+
17
+ if Swineherd.config[:aws]
18
+ @conf.set("fs.s3.awsAccessKeyId",Swineherd.config[:aws][:access_key])
19
+ @conf.set("fs.s3.awsSecretAccessKey",Swineherd.config[:aws][:secret_key])
20
+
21
+ @conf.set("fs.s3n.awsAccessKeyId",Swineherd.config[:aws][:access_key])
22
+ @conf.set("fs.s3n.awsSecretAccessKey",Swineherd.config[:aws][:secret_key])
23
+ end
24
+
25
+ @hdfs = Java::org.apache.hadoop.fs.FileSystem.get(@conf)
26
+ end
27
+
28
+ def open path, mode="r", &blk
29
+ HadoopFile.new(path,mode,self,&blk)
30
+ end
31
+
32
+ def size path
33
+ ls_r(path).inject(0){|sz,filepath| sz += @hdfs.get_file_status(Path.new(filepath)).get_len}
34
+ end
35
+
36
+ def ls path
37
+ (@hdfs.list_status(Path.new(path)) || []).map{|path| path.get_path.to_s}
38
+ end
39
+
40
+ #list directories recursively, similar to unix 'ls -R'
41
+ def ls_r path
42
+ ls(path).inject([]){|rec_paths,path| rec_paths << path; rec_paths << ls(path) unless file?(path); rec_paths}.flatten
43
+ end
44
+
45
+ def rm path
46
+ begin
47
+ @hdfs.delete(Path.new(path), false)
48
+ rescue java.io.IOException => e
49
+ raise Errno::EISDIR, e.message
50
+ end
51
+ end
52
+
53
+ def rm_r path
54
+ @hdfs.delete(Path.new(path), true)
55
+ end
56
+
57
+ def exists? path
58
+ @hdfs.exists(Path.new(path))
59
+ end
60
+
61
+ def directory? path
62
+ exists?(path) && @hdfs.get_file_status(Path.new(path)).is_dir?
63
+ end
64
+
65
+ def file? path
66
+ exists?(path) && @hdfs.isFile(Path.new(path))
67
+ end
68
+
69
+ def mv srcpath, dstpath
70
+ @hdfs.rename(Path.new(srcpath), Path.new(dstpath))
71
+ end
72
+
73
+ #supports s3://,s3n://,hdfs:// in @srcpath@ and @dstpath@
74
+ def cp srcpath, dstpath
75
+ @src_fs = Java::org.apache.hadoop.fs.FileSystem.get(Java::JavaNet::URI.create(srcpath),@conf)
76
+ @dest_fs = Java::org.apache.hadoop.fs.FileSystem.get(Java::JavaNet::URI.create(dstpath),@conf)
77
+ FileUtil.copy(@src_fs, Path.new(srcpath),@dest_fs, Path.new(dstpath), false, @conf)
78
+ end
79
+
80
+ def cp_r srcpath,dstpath
81
+ cp srcpath,dstpath
82
+ end
83
+
84
+ def mkdir_p path
85
+ @hdfs.mkdirs(Path.new(path))
86
+ end
87
+
88
+ #
89
+ # Copy hdfs file to local filesystem
90
+ #
91
+ def copy_to_local srcfile, dstfile
92
+ @hdfs.copy_to_local_file(Path.new(srcfile), Path.new(dstfile))
93
+ end
94
+ # alias :get :copy_to_local
95
+
96
+ #
97
+ # Copy local file to hdfs filesystem
98
+ #
99
+ def copy_from_local srcfile, dstfile
100
+ @hdfs.copy_from_local_file(Path.new(srcfile), Path.new(dstfile))
101
+ end
102
+ # alias :put :copy_from_local
103
+
104
+
105
+ #
106
+ # Merge all part files in a directory into one file.
107
+ #
108
+ def merge srcdir, dstfile
109
+ FileUtil.copy_merge(@hdfs, Path.new(srcdir), @hdfs, Path.new(dstfile), false, @conf, "")
110
+ end
111
+
112
+ #
113
+ # This is hackety. Use with caution.
114
+ #
115
+ def stream input, output
116
+ input_fs_scheme = (Java::JavaNet::URI.create(input).scheme || "file") + "://"
117
+ output_fs_scheme = (Java::JavaNet::URI.create(output).scheme || "file") + "://"
118
+ system("#{@hadoop_home}/bin/hadoop \\
119
+ jar #{@hadoop_home}/contrib/streaming/hadoop-*streaming*.jar \\
120
+ -D mapred.job.name=\"Stream { #{input_fs_scheme}(#{File.basename(input)}) -> #{output_fs_scheme}(#{File.basename(output)}) }\" \\
121
+ -D mapred.min.split.size=1000000000 \\
122
+ -D mapred.reduce.tasks=0 \\
123
+ -mapper \"/bin/cat\" \\
124
+ -input \"#{input}\" \\
125
+ -output \"#{output}\"")
126
+ end
127
+
128
+ #
129
+ # BZIP
130
+ #
131
+ def bzip input, output
132
+ system("#{@hadoop_home}/bin/hadoop \\
133
+ jar #{@hadoop_home}/contrib/streaming/hadoop-*streaming*.jar \\
134
+ -D mapred.output.compress=true \\
135
+ -D mapred.output.compression.codec=org.apache.hadoop.io.compress.BZip2Codec \\
136
+ -D mapred.reduce.tasks=1 \\
137
+ -mapper \"/bin/cat\" \\
138
+ -reducer \"/bin/cat\" \\
139
+ -input \"#{input}\" \\
140
+ -output \"#{output}\"")
141
+ end
142
+
143
+ #
144
+ # Merges many input files into :reduce_tasks amount of output files
145
+ #
146
+ def dist_merge inputs, output, options = {}
147
+ options[:reduce_tasks] ||= 25
148
+ options[:partition_fields] ||= 2
149
+ options[:sort_fields] ||= 2
150
+ options[:field_separator] ||= '\t'
151
+ names = inputs.map{|inp| File.basename(inp)}.join(',')
152
+ cmd = "#{@hadoop_home}/bin/hadoop \\
153
+ jar #{@hadoop_home}/contrib/streaming/hadoop-*streaming*.jar \\
154
+ -D mapred.job.name=\"Swineherd Merge (#{names} -> #{output})\" \\
155
+ -D num.key.fields.for.partition=\"#{options[:partition_fields]}\" \\
156
+ -D stream.num.map.output.key.fields=\"#{options[:sort_fields]}\" \\
157
+ -D mapred.text.key.partitioner.options=\"-k1,#{options[:partition_fields]}\" \\
158
+ -D stream.map.output.field.separator=\"'#{options[:field_separator]}'\" \\
159
+ -D mapred.min.split.size=1000000000 \\
160
+ -D mapred.reduce.tasks=#{options[:reduce_tasks]} \\
161
+ -partitioner org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner \\
162
+ -mapper \"/bin/cat\" \\
163
+ -reducer \"/usr/bin/uniq\" \\
164
+ -input \"#{inputs.join(',')}\" \\
165
+ -output \"#{output}\""
166
+ puts cmd
167
+ system cmd
168
+ end
169
+
170
+ class HadoopFile
171
+ attr_accessor :handle
172
+
173
+ #
174
+ # In order to open input and output streams we must pass around the hadoop fs object itself
175
+ #
176
+ def initialize path, mode, fs, &blk
177
+ raise Errno::EISDIR,"#{path} is a directory" if fs.directory?(path)
178
+ @path = Path.new(path)
179
+ case mode
180
+ when "r"
181
+ @handle = fs.hdfs.open(@path).to_io(&blk)
182
+ when "w"
183
+ @handle = fs.hdfs.create(@path).to_io.to_outputstream
184
+ if block_given?
185
+ yield self
186
+ self.close
187
+ end
188
+ end
189
+ end
190
+
191
+ def path
192
+ @path.toString()
193
+ end
194
+
195
+ def read
196
+ @handle.read
197
+ end
198
+
199
+ def write string
200
+ @handle.write(string.to_java_string.get_bytes)
201
+ end
202
+
203
+ def close
204
+ @handle.close
205
+ end
206
+
207
+ end
208
+
209
+ private
210
+
211
+ # Check that we are running with jruby, check for hadoop home.
212
+ def running_jruby?
213
+ begin
214
+ require 'java'
215
+ rescue LoadError => e
216
+ raise "\nJava not found, are you sure you're running with JRuby?\n" + e.message
217
+ end
218
+ @hadoop_home = ENV['HADOOP_HOME']
219
+ raise "\nHadoop installation not found, try setting $HADOOP_HOME\n" unless @hadoop_home && (File.exist? @hadoop_home)
220
+ true
221
+ end
222
+
223
+ #
224
+ # Place hadoop jars in class path, require appropriate jars, set hadoop conf
225
+ #
226
+
227
+ def set_classpath
228
+ hadoop_conf = (ENV['HADOOP_CONF_DIR'] || File.join(@hadoop_home, 'conf'))
229
+ hadoop_conf += "/" unless hadoop_conf.end_with? "/"
230
+ $CLASSPATH << hadoop_conf unless $CLASSPATH.include?(hadoop_conf)
231
+ end
232
+
233
+ def import_classes
234
+ Dir["#{@hadoop_home}/hadoop*.jar", "#{@hadoop_home}/lib/*.jar"].each{|jar| require jar}
235
+ ['org.apache.hadoop.fs.Path',
236
+ 'org.apache.hadoop.fs.FileUtil',
237
+ 'org.apache.hadoop.mapreduce.lib.input.FileInputFormat',
238
+ 'org.apache.hadoop.mapreduce.lib.output.FileOutputFormat',
239
+ 'org.apache.hadoop.fs.FSDataOutputStream',
240
+ 'org.apache.hadoop.fs.FSDataInputStream'].map{|j_class| java_import(j_class) }
241
+ end
242
+
243
+ def set_hadoop_environment
244
+ set_classpath
245
+ import_classes
246
+ end
247
+
248
+ end
249
+ end
data/lib/swineherd-fs/localfilesystem.rb ADDED
@@ -0,0 +1,81 @@
1
+ module Swineherd
2
+ class LocalFileSystem
3
+ #include Swineherd::BaseFileSystem
4
+
5
+ def initialize *args
6
+ end
7
+
8
+ def open path, mode="r", &blk
9
+ File.open(path,mode,&blk)
10
+ end
11
+
12
+ #Globs for files at @path@, append '**/*' to glob recursively
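+ #Illustrative (hypothetical paths): size("data/*.tsv") sums the matching files;
+ #size("data/**/*") sums recursively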
13
+ def size path
14
+ Dir[path].inject(0){|s,f|s+=File.size(f)}
15
+ end
16
+
17
+ #A leaky abstraction, should be called rm_rf if it calls rm_rf
18
+ def rm_r path
19
+ FileUtils.rm_rf path
20
+ end
21
+
22
+ def rm path
23
+ FileUtils.rm path
24
+ end
25
+
26
+ def exists? path
27
+ File.exists?(path)
28
+ end
29
+
30
+ def directory? path
31
+ File.directory? path
32
+ end
33
+
34
+ def mv srcpath, dstpath
35
+ FileUtils.mv(srcpath,dstpath)
36
+ end
37
+
38
+ def cp srcpath, dstpath
39
+ FileUtils.cp(srcpath,dstpath)
40
+ end
41
+
42
+ def cp_r srcpath, dstpath
43
+ FileUtils.cp_r(srcpath,dstpath)
44
+ end
45
+
46
+ def mkdir_p path
47
+ FileUtils.mkdir_p path
48
+ end
49
+
50
+ #List directory contents, similar to unix `ls`
51
+ #Dir[@path@/*] to return files in immediate directory of @path@
52
+ def ls path
53
+ if exists?(path)
54
+ if !directory?(path)
55
+ [path]
56
+ else
57
+ path += '/' unless path =~ /\/$/
58
+ Dir[path+'*']
59
+ end
60
+ else
61
+ raise Errno::ENOENT, "No such file or directory - #{path}"
62
+ end
63
+ end
64
+
65
+ #Recursively list directory contents
66
+ #Dir[@path@/**/*], similar to unix `ls -R`
67
+ def ls_r path
68
+ if exists?(path)
69
+ if !directory?(path)
70
+ [path]
71
+ else
72
+ path += '/' unless path =~ /\/$/
73
+ Dir[path+'**/*']
74
+ end
75
+ else
76
+ raise Errno::ENOENT, "No such file or directory - #{path}"
77
+ end
78
+ end
79
+
80
+ end
81
+ end
data/lib/swineherd-fs/s3filesystem.rb ADDED
@@ -0,0 +1,311 @@
1
+ module Swineherd
2
+
3
+ #
4
+ # Methods for interacting with Amazon's Simple Store Service (S3).
5
+ #
6
+ class S3FileSystem
7
+
8
+ attr_accessor :s3
9
+
10
+ def initialize options={}
11
+ aws_access_key = options[:aws_access_key] || (Swineherd.config[:aws] && Swineherd.config[:aws][:access_key])
12
+ aws_secret_key = options[:aws_secret_key] || (Swineherd.config[:aws] && Swineherd.config[:aws][:secret_key])
13
+ raise "Missing AWS keys" unless aws_access_key && aws_secret_key
14
+ @s3 = RightAws::S3.new(aws_access_key, aws_secret_key,:logger => Logger.new(nil)) #FIXME: Just wanted it to shut up
15
+ end
16
+
17
+ def open path, mode="r", &blk
18
+ S3File.new(path,mode,self,&blk)
19
+ end
20
+
21
+ def size path
22
+ if directory?(path)
23
+ ls_r(path).inject(0){|sum,file| sum += filesize(file)}
24
+ else
25
+ filesize(path)
26
+ end
27
+ end
28
+
29
+ def rm path
30
+ bkt,key = split_path(path)
31
+ if key.empty? || directory?(path)
32
+ raise Errno::EISDIR,"#{path} is a directory or bucket, use rm_r or rm_bucket"
33
+ else
34
+ @s3.interface.delete(bkt, key)
35
+ end
36
+ end
37
+
38
+ #rm_r - Remove recursively. Does not delete buckets, use rm_bucket
39
+ #params: @path@ - Path of file or folder to delete
40
+ #returns: Array - Array of paths which were deleted
41
+ def rm_r path
42
+ bkt,key = split_path(path)
43
+ if key.empty?
44
+ # only the bucket was passed in
45
+ else
46
+ if directory?(path)
47
+ @s3.interface.delete_folder(bkt,key).flatten
48
+ else
49
+ @s3.interface.delete(bkt, key)
50
+ [path]
51
+ end
52
+ end
53
+ end
54
+
55
+ def rm_bucket bucket_name
56
+ @s3.interface.force_delete_bucket(bucket_name)
57
+ end
58
+
59
+ def exists? path
60
+ bucket,key = split_path(path)
61
+ begin
62
+ if key.empty? #only a bucket was passed in, check if it exists
63
+ #FIXME: there may be a better way to test, relying on error to be raised here
64
+ @s3.interface.bucket_location(bucket) && true
65
+ elsif file?(path) #simply test for existence of the file
66
+ true
67
+ else #treat as directory and see if there are files beneath it
68
+ #if it's not a file, it is harmless to add '/'.
69
+ #the prefix search may return files with the same root extension,
70
+ #ie. foo.txt and foo.txt.bak, if we leave off the trailing slash
71
+ key+="/" unless key =~ /\/$/
72
+ @s3.interface.list_bucket(bucket,:prefix => key).size > 0
73
+ end
74
+ rescue RightAws::AwsError => error
75
+ if error.message =~ /nosuchbucket/i
76
+ false
77
+ elsif error.message =~ /not found/i
78
+ false
79
+ else
80
+ raise
81
+ end
82
+ end
83
+ end
84
+
85
+ def directory? path
86
+ exists?(path) && !file?(path)
87
+ end
88
+
89
+ def file? path
90
+ bucket,key = split_path(path)
91
+ begin
92
+ return false if (key.nil? || key.empty?) #buckets are not files
93
+ #FIXME: there may be a better way to test, relying on error to be raised
94
+ @s3.interface.head(bucket,key) && true
95
+ rescue RightAws::AwsError => error
96
+ if error.message =~ /nosuchbucket/i
97
+ false
98
+ elsif error.message =~ /not found/i
99
+ false
100
+ else
101
+ raise
102
+ end
103
+ end
104
+ end
105
+
106
+ def mv srcpath, dstpath
107
+ src_bucket,src_key_path = split_path(srcpath)
108
+ dst_bucket,dst_key_path = split_path(dstpath)
109
+ mkdir_p(dstpath) unless exists?(dstpath)
110
+ if directory? srcpath
111
+ paths_to_copy = ls_r(srcpath)
112
+ common_dir = common_directory(paths_to_copy)
113
+ paths_to_copy.each do |path|
114
+ bkt,key = split_path(path)
115
+ src_key = key
116
+ dst_key = File.join(dst_key_path, path.gsub(common_dir, ''))
117
+ @s3.interface.move(src_bucket, src_key, dst_bucket, dst_key)
118
+ end
119
+ else
120
+ @s3.interface.move(src_bucket, src_key_path, dst_bucket, dst_key_path)
121
+ end
122
+ end
123
+
124
+ def cp srcpath, dstpath
125
+ src_bucket,src_key_path = split_path(srcpath)
126
+ dst_bucket,dst_key_path = split_path(dstpath)
127
+ mkdir_p(dstpath) unless exists?(dstpath)
128
+ if src_key_path.empty? || directory?(srcpath)
129
+ raise Errno::EISDIR,"#{srcpath} is a directory or bucket, use cp_r"
130
+ else
131
+ @s3.interface.copy(src_bucket, src_key_path, dst_bucket, dst_key_path)
132
+ end
133
+ end
134
+
135
+ # mv is just a special case of cp_r...this is a waste
136
+ def cp_r srcpath, dstpath
137
+ src_bucket,src_key_path = split_path(srcpath)
138
+ dst_bucket,dst_key_path = split_path(dstpath)
139
+ mkdir_p(dstpath) unless exists?(dstpath)
140
+ if directory? srcpath
141
+ paths_to_copy = ls_r(srcpath)
142
+ common_dir = common_directory(paths_to_copy)
143
+ paths_to_copy.each do |path|
144
+ bkt,key = split_path(path)
145
+ src_key = key
146
+ dst_key = File.join(dst_key_path, path.gsub(common_dir, ''))
147
+ @s3.interface.copy(src_bucket, src_key, dst_bucket, dst_key)
148
+ end
149
+ else
150
+ @s3.interface.copy(src_bucket, src_key_path, dst_bucket, dst_key_path)
151
+ end
152
+ end
153
+
154
+ #This is a bit funny, there's actually no need to create a 'path' since
155
+ #s3 is nothing more than a glorified key-value store. When you create a
156
+ #'file' (key) the 'path' will be created for you. All we do here is create
157
+ #the bucket unless it already exists.
158
+ def mkdir_p path
159
+ bkt,key = split_path(path)
160
+ @s3.interface.create_bucket(bkt) unless exists? path
161
+ end
162
+
163
+ def ls path
164
+ if exists?(path)
165
+ bkt,prefix = split_path(path)
166
+ prefix += '/' if directory?(path) && !(prefix =~ /\/$/) && !prefix.empty?
167
+ contents = []
168
+ @s3.interface.incrementally_list_bucket(bkt, {'prefix' => prefix,:delimiter => '/'}) do |res|
169
+ contents += res[:common_prefixes].map{|c| File.join(bkt,c)}
170
+ contents += res[:contents].map{|c| File.join(bkt, c[:key])}
171
+ end
172
+ contents
173
+ else
174
+ raise Errno::ENOENT, "No such file or directory - #{path}"
175
+ end
176
+ end
177
+
178
+ def ls_r path
179
+ if(file?(path))
180
+ [path]
181
+ else
182
+ ls(path).inject([]){|paths,path| paths << path if directory?(path);paths << ls_r(path)}.flatten
183
+ end
184
+ end
185
+
186
+ # FIXME: Not implemented for directories
187
+ # @srcpath@ is assumed to be on the local filesystem
188
+ def copy_from_local srcpath, destpath
189
+ bucket,key = split_path(destpath)
190
+ if File.exists?(srcpath)
191
+ if File.directory?(srcpath)
192
+ raise "NotYetImplemented"
193
+ else
194
+ @s3.interface.put(bucket, key, File.open(srcpath))
195
+ end
196
+ else
197
+ raise Errno::ENOENT, "No such file or directory - #{srcpath}"
198
+ end
199
+ end
200
+ # alias :put :copy_from_local
201
+
202
+ #FIXME: Not implemented for directories
203
+ def copy_to_local srcpath, dstpath
204
+ src_bucket,src_key_path = split_path(srcpath)
205
+ dstfile = File.new(dstpath, 'w')
206
+ @s3.interface.get(src_bucket, src_key_path) do |chunk|
207
+ dstfile.write(chunk)
208
+ end
209
+ dstfile.close
210
+ end
211
+ # alias :get :copy_to_local
212
+
213
+ def bucket path
214
+ #URI.parse(path).path.split('/').reject{|x| x.empty?}.first
215
+ split_path(path).first
216
+ end
217
+
218
+ def key_for path
219
+ #File.join(URI.parse(path).path.split('/').reject{|x| x.empty?}[1..-1])
220
+ split_path(path).last
221
+ end
222
+
223
+ def split_path path
224
+ uri = URI.parse(path)
225
+ base_uri = ""
226
+ base_uri << uri.host if uri.scheme
227
+ base_uri << uri.path
228
+ path = base_uri.split('/').reject{|x| x.empty?}
229
+ [path[0],path[1..-1].join("/")]
230
+ end
231
+
232
+ private
233
+
234
+ # FIXME: This is dense
235
+ def common_directory paths
236
+ dirs = paths.map{|path| path.split('/')}
237
+ min_size = dirs.map{|splits| splits.size}.min
238
+ dirs = dirs.map{|splits| splits[0...min_size]}
239
+ uncommon_idx = dirs.transpose.each_with_index.find{|dirnames, idx| dirnames.uniq.length > 1}.last
240
+ dirs[0][0...uncommon_idx].join('/')
241
+ end
242
+
243
+ def filesize filepath
244
+ bucket,key = split_path(filepath)
245
+ header = @s3.interface.head(bucket, key)
246
+ header['content-length'].to_i
247
+ end
248
+
249
+ class S3File
250
+ attr_accessor :path, :handle, :fs
251
+
252
+ #
253
+ # In order to open input and output streams we must pass around the s3 fs object itself
254
+ #
255
+ def initialize path, mode, fs, &blk
256
+ @fs = fs
257
+ @path = path
258
+ case mode
259
+ when "r" then
260
+ # raise "#{fs.type(path)} is not a readable file - #{path}" unless fs.type(path) == "file"
261
+ when "w" then
262
+ # raise "Path #{path} is a directory." unless (fs.type(path) == "file") || (fs.type(path) == "unknown")
263
+ @handle = Tempfile.new('s3filestream')
264
+ if block_given?
265
+ yield self
266
+ close
267
+ end
268
+ end
269
+ end
270
+
271
+ #
272
+ # Faster than iterating
273
+ #
274
+ def read
275
+ bucket,key = fs.split_path(path)
276
+ fs.s3.interface.get_object(bucket, key)
277
+ end
278
+
279
+ #
280
+ # This is a little hackety. That is, once you call (.each) on the object the full object starts
281
+ # downloading...
282
+ #
283
+ def readline
284
+ bucket,key = fs.split_path(path)
285
+ @handle ||= fs.s3.interface.get_object(bucket, key).each
286
+ begin
287
+ @handle.next
288
+ rescue StopIteration, NoMethodError
289
+ @handle = nil
290
+ raise EOFError.new("end of file reached")
291
+ end
292
+ end
293
+
294
+ def write string
295
+ @handle.write(string)
296
+ end
297
+
298
+ def close
299
+ bucket,key = fs.split_path(path)
300
+ if @handle
301
+ @handle.read
302
+ fs.s3.interface.put(bucket, key, File.open(@handle.path, 'r'))
303
+ @handle.close
304
+ end
305
+ @handle = nil
306
+ end
307
+
308
+ end
309
+
310
+ end
311
+ end
data/lib/swineherd-fs.rb ADDED
@@ -0,0 +1,91 @@
1
+ require 'configliere' ; Configliere.use(:commandline, :env_var, :define,:config_file)
2
+ require 'logger'
3
+
4
+ require 'fileutils'
5
+ require 'tempfile'
6
+ require 'right_aws'
7
+
8
+ require 'swineherd-fs/localfilesystem'
9
+ require 'swineherd-fs/s3filesystem'
10
+ require 'swineherd-fs/hadoopfilesystem'
11
+
12
+ #Merge in system and user settings
13
+ SYSTEM_CONFIG_PATH = "/etc/swineherd.yaml" unless defined?(SYSTEM_CONFIG_PATH)
14
+ USER_CONFIG_PATH = File.join(ENV['HOME'], '.swineherd.yaml') unless defined?(USER_CONFIG_PATH)
15
+
16
+ module Swineherd
17
+
18
+ def self.config
19
+ return @config if @config
20
+ config = Configliere::Param.new
21
+ config.read SYSTEM_CONFIG_PATH if File.exists? SYSTEM_CONFIG_PATH
22
+ config.read USER_CONFIG_PATH if File.exists? USER_CONFIG_PATH
23
+ @config ||= config
24
+ end
25
+
26
+ def self.logger
27
+ return @log if @log
28
+ @log ||= Logger.new(config[:log_file] || STDOUT)
29
+ @log.formatter = proc { |severity, datetime, progname, msg|
30
+ "[#{severity.upcase}] #{msg}\n"
31
+ }
32
+ @log
33
+ end
34
+
35
+ def self.logger= logger
36
+ @log = logger
37
+ end
38
+
39
+ module FileSystem
40
+
41
+ HDFS_SCHEME_REGEXP = /^hdfs:\/\//
42
+ S3_SCHEME_REGEXP = /^s3n?:\/\//
43
+
44
+ FILESYSTEMS = {
45
+ 'file' => Swineherd::LocalFileSystem,
46
+ 'hdfs' => Swineherd::HadoopFileSystem,
47
+ 's3' => Swineherd::S3FileSystem,
48
+ 's3n' => Swineherd::S3FileSystem
49
+ }
50
+
51
+ # A factory function that returns an instance of the requested class
52
+ def self.get scheme, *args
53
+ begin
54
+ FILESYSTEMS[scheme.to_s].new *args
55
+ rescue NoMethodError => e
56
+ raise "Filesystem with scheme #{scheme} does not exist.\n #{e.message}"
57
+ end
58
+ end
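+ # Hypothetical usage:
+ #   Swineherd::FileSystem.get(:file)  # => a Swineherd::LocalFileSystem instance
+ #   Swineherd::FileSystem.get(:hdfs)  # => a Swineherd::HadoopFileSystem instance
+ #   Swineherd::FileSystem.get(:s3)    # => a Swineherd::S3FileSystem instance (needs AWS credentials configured)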
59
+
60
+ def self.exists?(path)
61
+ fs = self.get(scheme_for(path))
62
+ Swineherd.logger.info "#exists? - #{fs.class} for '#{path}'"
63
+ fs.exists?(path)
64
+ end
65
+
66
+ def self.cp(srcpath,destpath)
67
+ src_fs = scheme_for(srcpath)
68
+ dest_fs = scheme_for(destpath)
69
+ Swineherd.logger.info "#cp - #{src_fs} --> #{dest_fs}"
70
+ if(src_fs.eql?(dest_fs))
71
+ self.get(src_fs).cp(srcpath,destpath)
72
+ elsif src_fs.eql?(:file)
73
+ self.get(dest_fs).copy_from_local(srcpath,destpath)
74
+ elsif dest_fs.eql?(:file)
75
+ self.get(src_fs).copy_to_local(srcpath,destpath)
76
+ else #cp between s3/s3n and hdfs can be handled by Hadoop's FileUtil via HadoopFileSystem
77
+ self.get(:hdfs).cp(srcpath,destpath)
78
+ end
79
+ end
80
+
81
+ private
82
+
83
+ #defaults to local filesystem :file
84
+ def self.scheme_for(path)
85
+ scheme = URI.parse(path).scheme
86
+ (scheme && scheme.to_sym) || :file
87
+ end
88
+
89
+ end
90
+
91
+ end
data/rspec.watchr ADDED
@@ -0,0 +1,19 @@
1
+ # -*- ruby -*-
2
+
3
+ def run_spec(file)
4
+ unless File.exist?(file)
5
+ puts "#{file} does not exist"
6
+ return
7
+ end
8
+ puts "Running #{file}"
9
+ system "rspec #{file}"
10
+ end
11
+
12
+ watch("spec/.*/*_spec\.rb") do |match|
13
+ run_spec match[0]
14
+ end
15
+
16
+ watch("lib/swineherd-fs/(.*)\.rb") do |match|
17
+ file = %{spec/#{match[1]}_spec.rb}
18
+ run_spec file if File.exists?(file)
19
+ end
data/spec/filesystem_spec.rb ADDED
@@ -0,0 +1,186 @@
1
+ require 'spec_helper'
2
+ FS_SPEC_ROOT = File.dirname(__FILE__)
3
+ S3_TEST_BUCKET = 'swineherd-fs-test-bucket' #You'll have to set this to something else
4
+
5
+ shared_examples_for "an abstract filesystem" do
6
+
7
+ let(:test_filename){ File.join(test_dirname,"filename.txt") }
8
+ let(:test_string){ "foobarbaz" }
9
+
10
+ let(:files){ ['d.txt','b/c.txt'].map{|f| File.join(test_dirname,f)} }
11
+ let(:dirs){ %w(b).map{|d| File.join(test_dirname,d)} }
12
+
13
+ it "implements #exists?" do
14
+ fs.mkdir_p(test_dirname)
15
+ expect{ fs.open(test_filename,'w'){|f| f.write(test_string)} }.to change{ fs.exists?(test_filename) }.from(false).to(true)
16
+ end
17
+
18
+ it "implements #directory?" do
19
+ fs.mkdir_p(test_dirname)
20
+ fs.open(test_filename, 'w'){|f| f.write(test_string)}
21
+ fs.directory?(test_filename).should eql false
22
+ fs.directory?(test_dirname).should eql true
23
+ end
24
+
25
+ it "implements #rm on files" do
26
+ fs.mkdir_p(test_dirname)
27
+ fs.open(test_filename, 'w'){|f| f.write(test_string)}
28
+ expect{ fs.rm(test_filename) }.to change{ fs.exists?(test_filename) }.from(true).to(false)
29
+ end
30
+
31
+ it "raises error on #rm of non-empty directory" do
32
+ fs.mkdir_p(test_dirname)
33
+ fs.open(test_filename, 'w'){|f| f.write(test_string)}
34
+ expect{fs.rm(test_dirname)}.to raise_error
35
+ end
36
+
37
+ it "implements #rm_r" do
38
+ fs.mkdir_p(test_dirname)
39
+ fs.open(test_filename,'w'){|f| f.write(test_string)}
40
+ expect{ fs.rm_r(test_dirname) }.to change{ fs.exists?(test_dirname) && fs.exists?(test_filename) }.from(true).to(false)
41
+ end
42
+
43
+ it "implements #ls" do
44
+ dirs.each{ |dir| fs.mkdir_p(dir) }
45
+ files.each{|filename| fs.open(filename,"w"){|f|f.write(test_string) }}
46
+ fs.ls(test_dirname).length.should eql 2
47
+ end
48
+
49
+ it "implements #ls_r" do
50
+ dirs.each{ |dir| fs.mkdir_p(dir) }
51
+ files.each{|filename| fs.open(filename,"w"){|f|f.write(test_string) }}
52
+ fs.ls_r(test_dirname).length.should eql 3
53
+ end
54
+
55
+ it "implements #size" do
56
+ fs.mkdir_p(test_dirname)
57
+ fs.open(test_filename,'w'){|f| f.write(test_string)}
58
+ test_string.length.should eql(fs.size(test_filename))
59
+ end
60
+
61
+ it "implements #mkdir_p" do
62
+ expect{ fs.mkdir_p(test_dirname) }.to change{ fs.directory?(test_dirname) }.from(false).to(true)
63
+ end
64
+
65
+ it "implements #mv" do
66
+ fs.mkdir_p(test_dirname)
67
+ fs.open(test_filename, 'w'){|f| f.write(test_string)}
68
+ filename2 = File.join(test_dirname,"new_file.txt")
69
+ expect{ fs.mv(test_filename, filename2) }.to change{ fs.exists?(filename2) }.from(false).to(true)
70
+ fs.exists?(test_filename).should eql false
71
+ fs.open(filename2,"r").read.should eql test_string
72
+ end
73
+
74
+ it "implements #cp" do
75
+ fs.mkdir_p(test_dirname)
76
+ fs.open(test_filename, 'w'){|f| f.write(test_string)}
77
+ filename2 = File.join(test_dirname,"new_file.txt")
78
+ expect{ fs.cp(test_filename, filename2) }.to change{ fs.exists?(filename2) }.from(false).to(true)
79
+ fs.open(test_filename,"r").read.should eql fs.open(filename2,"r").read
80
+ end
81
+
82
+ it "implements #cp_r"
83
+
84
+ it "implements #open" do
85
+ fs.mkdir_p(test_dirname)
86
+ expect{
87
+ file = fs.open(test_filename, 'w')
88
+ file.write(test_string)
89
+ file.close
90
+ }.to change{ fs.exists?(test_filename) }.from(false).to(true)
91
+ end
92
+
93
+ it "implements #open with &blk" do
94
+ fs.mkdir_p(test_dirname)
95
+ expect{ fs.open(test_filename, 'w'){|f| f.write(test_string)} }.to change{ fs.exists?(test_filename) }.from(false).to(true)
96
+ end
97
+
98
+ describe "with a new file" do
99
+
100
+ it "implements path" do
101
+ fs.mkdir_p(test_dirname)
102
+ file = fs.open(test_filename,'w')
103
+ file.path.should eql test_filename
104
+ end
105
+
106
+ it "implements write" do
107
+ fs.mkdir_p(test_dirname)
108
+ fs.open(test_filename,'w'){|f| f.write(test_string)}
109
+ end
110
+
111
+ it "should not allow write after close" do
112
+ fs.mkdir_p(test_dirname)
113
+ file = fs.open(test_filename,'w')
114
+ file.write(test_string)
115
+ file.close
116
+ lambda{file.write(test_string)}.should raise_error
117
+ end
118
+
119
+ it "implements read" do
120
+ fs.mkdir_p(test_dirname)
121
+ fs.open(test_filename,'w'){|f| f.write(test_string)}
122
+ fs.open(test_filename,'r').read.should eql test_string
123
+ end
124
+
125
+ end
126
+
127
+ after do
128
+ fs.rm_r(test_dirname) if fs.exists?(test_dirname)
129
+ end
130
+
131
+ end
132
+
133
+ describe Swineherd::FileSystem do
134
+ let(:fs){ Swineherd::FileSystem }
135
+ let(:test_dirname){ FS_SPEC_ROOT+"/tmp/test_dir" }
136
+ let(:test_filename){ File.join(test_dirname,"filename.txt") }
137
+ let(:test_string){ "foobarbaz" }
138
+
139
+ it "implements #cp" do
140
+ localfs = Swineherd::LocalFileSystem.new
141
+ s3_fs = Swineherd::S3FileSystem.new
142
+ localfs.mkdir_p(test_dirname)
143
+ localfs.open(test_filename, 'w'){|f| f.write(test_string)}
144
+ s3_filename = "s3://"+S3_TEST_BUCKET+"/new_file.txt"
145
+ expect{ fs.cp(test_filename, s3_filename) }.to change{ fs.exists?(s3_filename) }.from(false).to(true)
146
+ localfs.rm_r(test_dirname) if localfs.exists?(test_dirname)
147
+ s3_fs.rm(s3_filename)
148
+ end
149
+ end
150
+
151
+ describe Swineherd::LocalFileSystem do
152
+
153
+ it_behaves_like "an abstract filesystem" do
154
+ let(:fs){ Swineherd::LocalFileSystem.new }
155
+ let(:test_dirname){ FS_SPEC_ROOT+"/tmp/test_dir" }
156
+ end
157
+
158
+ end
159
+
160
+ describe Swineherd::S3FileSystem do
161
+
162
+ #mkdir_p won't pass because there is no concept of a directory on s3
163
+
164
+ it_behaves_like "an abstract filesystem" do
165
+ let(:fs){ Swineherd::S3FileSystem.new }
166
+ let(:test_dirname){ S3_TEST_BUCKET+"/tmp/test_dir" }
167
+ end
168
+
169
+ describe "an S3FileSystem" do
170
+ let(:fs){ Swineherd::S3FileSystem.new }
171
+
172
+ it "should return false for #file? on a bucket" do
173
+ fs.file?(S3_TEST_BUCKET).should eql false
174
+ end
175
+
176
+ end
177
+ end
178
+
179
+ describe Swineherd::HadoopFileSystem do
180
+
181
+ it_behaves_like "an abstract filesystem" do
182
+ let(:fs){ Swineherd::HadoopFileSystem.new }
183
+ let(:test_dirname){ "/tmp/test_dir" }
184
+ end
185
+
186
+ end
data/spec/spec_helper.rb ADDED
@@ -0,0 +1,2 @@
1
+ $LOAD_PATH << File.expand_path(File.join(File.dirname(__FILE__),'../lib'))
2
+ require 'swineherd-fs'
data/swineherd-fs.gemspec ADDED
@@ -0,0 +1,23 @@
1
+ # -*- encoding: utf-8 -*-
2
+
3
+ Gem::Specification.new do |s|
4
+ s.name = %q{swineherd-fs}
5
+ s.version = "0.0.2"
6
+ s.authors = ["David Snyder","Jacob Perkins"]
7
+ s.date = %q{2012-01-20}
8
+ s.description = %q{A filesystem abstraction for Amazon S3 and Hadoop HDFS}
9
+ s.summary = %q{A filesystem abstraction for Amazon S3 and Hadoop HDFS}
10
+ s.email = %q{"david@infochimps.com"}
11
+ s.homepage = %q{http://github.com/infochimps-labs/swineherd-fs}
12
+
13
+ s.files = ["LICENSE", "VERSION","Gemfile", "swineherd-fs.gemspec", "rspec.watchr", "README.textile", "lib/swineherd-fs.rb","lib/swineherd-fs/localfilesystem.rb", "lib/swineherd-fs/s3filesystem.rb", "lib/swineherd-fs/hadoopfilesystem.rb", "spec/spec_helper.rb", "spec/filesystem_spec.rb"]
14
+ s.test_files = ["spec/spec_helper.rb", "spec/filesystem_spec.rb"]
15
+ s.require_paths = ["lib"]
16
+
17
+ s.add_development_dependency("rspec")
18
+ s.add_development_dependency("watchr")
19
+ s.add_runtime_dependency(%q<configliere>, [">= 0"])
20
+ s.add_runtime_dependency(%q<right_aws>, [">= 0"])
21
+ s.add_runtime_dependency(%q<jruby-openssl>, [">= 0"])
22
+ end
23
+
metadata ADDED
@@ -0,0 +1,121 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: swineherd-fs
3
+ version: !ruby/object:Gem::Version
4
+ prerelease:
5
+ version: 0.0.2
6
+ platform: ruby
7
+ authors:
8
+ - David Snyder
9
+ - Jacob Perkins
10
+ autorequire:
11
+ bindir: bin
12
+ cert_chain: []
13
+
14
+ date: 2012-01-20 00:00:00 Z
15
+ dependencies:
16
+ - !ruby/object:Gem::Dependency
17
+ name: rspec
18
+ prerelease: false
19
+ requirement: &id001 !ruby/object:Gem::Requirement
20
+ none: false
21
+ requirements:
22
+ - - ">="
23
+ - !ruby/object:Gem::Version
24
+ version: "0"
25
+ type: :development
26
+ version_requirements: *id001
27
+ - !ruby/object:Gem::Dependency
28
+ name: watchr
29
+ prerelease: false
30
+ requirement: &id002 !ruby/object:Gem::Requirement
31
+ none: false
32
+ requirements:
33
+ - - ">="
34
+ - !ruby/object:Gem::Version
35
+ version: "0"
36
+ type: :development
37
+ version_requirements: *id002
38
+ - !ruby/object:Gem::Dependency
39
+ name: configliere
40
+ prerelease: false
41
+ requirement: &id003 !ruby/object:Gem::Requirement
42
+ none: false
43
+ requirements:
44
+ - - ">="
45
+ - !ruby/object:Gem::Version
46
+ version: "0"
47
+ type: :runtime
48
+ version_requirements: *id003
49
+ - !ruby/object:Gem::Dependency
50
+ name: right_aws
51
+ prerelease: false
52
+ requirement: &id004 !ruby/object:Gem::Requirement
53
+ none: false
54
+ requirements:
55
+ - - ">="
56
+ - !ruby/object:Gem::Version
57
+ version: "0"
58
+ type: :runtime
59
+ version_requirements: *id004
60
+ - !ruby/object:Gem::Dependency
61
+ name: jruby-openssl
62
+ prerelease: false
63
+ requirement: &id005 !ruby/object:Gem::Requirement
64
+ none: false
65
+ requirements:
66
+ - - ">="
67
+ - !ruby/object:Gem::Version
68
+ version: "0"
69
+ type: :runtime
70
+ version_requirements: *id005
71
+ description: A filesystem abstraction for Amazon S3 and Hadoop HDFS
72
+ email: "\"david@infochimps.com\""
73
+ executables: []
74
+
75
+ extensions: []
76
+
77
+ extra_rdoc_files: []
78
+
79
+ files:
80
+ - LICENSE
81
+ - VERSION
82
+ - Gemfile
83
+ - swineherd-fs.gemspec
84
+ - rspec.watchr
85
+ - README.textile
86
+ - lib/swineherd-fs.rb
87
+ - lib/swineherd-fs/localfilesystem.rb
88
+ - lib/swineherd-fs/s3filesystem.rb
89
+ - lib/swineherd-fs/hadoopfilesystem.rb
90
+ - spec/spec_helper.rb
91
+ - spec/filesystem_spec.rb
92
+ homepage: http://github.com/infochimps-labs/swineherd-fs
93
+ licenses: []
94
+
95
+ post_install_message:
96
+ rdoc_options: []
97
+
98
+ require_paths:
99
+ - lib
100
+ required_ruby_version: !ruby/object:Gem::Requirement
101
+ none: false
102
+ requirements:
103
+ - - ">="
104
+ - !ruby/object:Gem::Version
105
+ version: "0"
106
+ required_rubygems_version: !ruby/object:Gem::Requirement
107
+ none: false
108
+ requirements:
109
+ - - ">="
110
+ - !ruby/object:Gem::Version
111
+ version: "0"
112
+ requirements: []
113
+
114
+ rubyforge_project:
115
+ rubygems_version: 1.8.15
116
+ signing_key:
117
+ specification_version: 3
118
+ summary: A filesystem abstraction for Amazon S3 and Hadoop HDFS
119
+ test_files:
120
+ - spec/spec_helper.rb
121
+ - spec/filesystem_spec.rb