file_pool 0.2.0

data/README.md ADDED
@@ -0,0 +1,90 @@
1
+ # FilePool
2
+
3
+ FilePool helps to manage a large number of files in a Ruby project. It
4
+ takes care of the storage of files in a balanced directory tree and
5
+ generates unique identifiers for all files. It also comes in handy
6
+ when dealing with only a few files.
7
+
8
+ FilePool does not deal with file meta information. Its only purpose
9
+ is to return a file's location given a file identifier, which was
10
+ generated when the file was added to the pool.
11
+
12
+ The identifiers are strings of UUID Type 4 (random), which are also
13
+ used as file names. The directory tree is a 3 level structure using
14
+ the 3 first hexadecimal digits of a UUID as path. For example:
15
+
16
+ 0/d/6/0d6f8dd9-8deb-4500-bb85-2d0796241963
17
+ 0/c/f/0cfb082a-fd57-490c-978b-e47d5948bc8b
18
+ 6/1/d/61ddfe33-13f3-4f71-9234-5fbbf5c4fc2c
19
+
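The layout above can be reproduced in a few lines. The following is a minimal sketch, not FilePool's own code: `pool_path` is a hypothetical helper, and `SecureRandom.uuid` from the standard library stands in for the UUID generator the gem uses.

```ruby
require 'securerandom'

# Hypothetical helper: builds the pool path for an identifier by using its
# first three hexadecimal digits as three nested directory levels.
def pool_path(root, fid)
  File.join(root, fid[0, 1], fid[1, 1], fid[2, 1], fid)
end

fid = SecureRandom.uuid # random (version 4) UUID as a String
puts pool_path('/var/lib/files', fid)
```

Because every hexadecimal digit is equally likely in a random UUID, the files spread evenly over 16 × 16 × 16 = 4096 leaf directories.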
20
+ FilePool is tested with Ruby 1.8.7 and 1.9.3.
21
+
22
+ ## Installation
23
+
24
+ Add this line to your application's Gemfile:
25
+
26
+ gem 'file_pool'
27
+
28
+ And then execute:
29
+
30
+ $ bundle
31
+
32
+ Or install it yourself as:
33
+
34
+ $ gem install file_pool
35
+
36
+ ## Usage
37
+
38
+ ### Setup
39
+ The root path of the pool must be defined before use:
40
+
41
+ FilePool.setup '/var/lib/files'
42
+
43
+ In a Rails project the file pool setup would be placed in an initializer:
44
+
45
+ config/initializers/file_pool.
46
+
47
+ ### Example Usage
48
+
49
+ Adding files (perhaps after completed upload)
50
+
51
+ fid = FilePool.add('/Temp/p348dvhn4')
52
+
53
+ Get location of previously added file
54
+
55
+ path = FilePool.path(fid)
56
+
57
+ Remove a file
58
+
59
+ FilePool.remove(fid)
60
+
61
+ ### Maintenance
62
+
63
+ FilePool has a straightforward way of storing files. It doesn't use
64
+ any form of index. As long as you stick to the directory structure
65
+ outlined above you can:
66
+
67
+ * move the entire pool somewhere else
68
+ * split the pool using symbolic links or mount points to remote file systems
69
+ * merge file pools by copying them into one
70
+
71
+ There is no risk of overwriting, because UUID type 4 file names are
72
+ unique (up to an extremely small collision probability).
73
+
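To put a rough number on that probability: a version 4 UUID carries 122 random bits. Assuming the identifiers are drawn uniformly at random, the birthday bound for a pool (or a merged set of pools) holding n files in total is approximately

```latex
P(\text{collision}) \;\approx\; 1 - \exp\!\left(-\frac{n(n-1)}{2 \cdot 2^{122}}\right) \;\approx\; \frac{n^{2}}{2^{123}}
```

For n = 10^9 files this is on the order of 10^-19.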
74
+ ### Notes
75
+
76
+ Make sure to store the generated file identifiers safely. There is no
77
+ way of identifying a file again when its ID is lost. When in doubt, generate a hash
78
+ value from the file and store it somewhere else.
79
+
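One way to follow this advice is to store a checksum next to the identifier in your own records. The sketch below assumes SHA-256 is an acceptable hash; `file_checksum` is a hypothetical helper and not part of FilePool.

```ruby
require 'digest/sha2'
require 'tempfile'

# Hypothetical helper: returns the SHA-256 hex digest of the file at +path+.
def file_checksum(path)
  Digest::SHA256.file(path).hexdigest
end

# Demonstration on a throwaway file; in practice +path+ would be the file
# that is about to be added to the pool.
demo = Tempfile.new('upload')
demo.write('abc')
demo.close
puts file_checksum(demo.path)
demo.unlink
```

With the checksum stored elsewhere, a lost identifier can in principle be recovered by hashing the files in the pool and comparing, at the cost of reading every file.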
80
+ For large files the pool root should be on the same file system as the files
81
+ added to the pool. Then adding a file returns immediately. Otherwise
82
+ files will be copied, which may take a significant time.
83
+
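The difference between the two cases comes from hard links: on the same file system a link only adds a directory entry, while across file systems the data has to be copied. A minimal sketch of that fallback, under the assumption that a cross-device link fails with `Errno::EXDEV`; `link_or_copy` is a hypothetical helper and the paths are throwaway examples.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper: hard-links +source+ to +target+ and falls back to a
# copy when the two paths are on different file systems (Errno::EXDEV).
def link_or_copy(source, target)
  FileUtils.mkdir_p(File.dirname(target))
  FileUtils.ln(source, target)
  :linked
rescue Errno::EXDEV
  FileUtils.cp(source, target)
  :copied
end

Dir.mktmpdir do |dir|
  source = File.join(dir, 'upload')
  File.open(source, 'w') { |f| f.write('payload') }
  target = File.join(dir, 'pool', '0', 'd', '6', 'stored')
  puts link_or_copy(source, target) # which branch was taken
  puts File.read(target)            # content is available either way
end
```

A hard link also means the pooled file and the original share the same data until one of them is deleted, so no extra disk space is used in the linked case.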
84
+ ## Contributing
85
+
86
+ 1. Fork it
87
+ 2. Create your feature branch (`git checkout -b my-new-feature`)
88
+ 3. Commit your changes (`git commit -am 'Added some feature'`)
89
+ 4. Push to the branch (`git push origin my-new-feature`)
90
+ 5. Create new Pull Request
@@ -0,0 +1,3 @@
1
+ module FilePool
2
+ VERSION = "0.2.0"
3
+ end
data/lib/file_pool.rb ADDED
@@ -0,0 +1,253 @@
1
+ # -*- coding: utf-8 -*-
2
+ require 'file_pool/version'
3
+ require 'uuidtools'
4
+ =begin
5
+ <em>Robert Anniés (2012)</em>
6
+
7
+ == Introduction
8
+
9
+ FilePool helps to manage a large number of files in a Ruby project. It
10
+ takes care of the storage of files in a balanced directory tree and
11
+ generates unique identifiers for all files. It also comes in handy
12
+ when dealing with only a few files.
13
+
14
+ FilePool does not deal with file meta information. Its only purpose
15
+ is to return a file's location given a file identifier, which was
16
+ generated when the file was added to the pool.
17
+
18
+ The identifiers are strings of UUID Type 4 (random), which are also
19
+ used as file names. The directory tree is a 3 level structure using
20
+ the 3 first hexadecimal digits of a UUID as path. For example:
21
+
22
+ 0/d/6/0d6f8dd9-8deb-4500-bb85-2d0796241963
23
+ 0/c/f/0cfb082a-fd57-490c-978b-e47d5948bc8b
24
+ 6/1/d/61ddfe33-13f3-4f71-9234-5fbbf5c4fc2c
25
+
26
+ == Examples
27
+
28
+ === Setup
29
+ The root path of the pool must be defined before use:
30
+
31
+ FilePool.setup '/var/lib/files'
32
+
33
+ In a Rails project the file pool setup should be placed in an initializer:
34
+
35
+ config/initializers/file_pool.rb
36
+
37
+ === Usage
38
+
39
+ Adding files (perhaps after completed upload)
40
+
41
+ fid = FilePool.add('/Temp/p348dvhn4')
42
+
43
+ Get location of previously added file
44
+
45
+ path = FilePool.path(fid)
46
+
47
+ Remove a file
48
+
49
+ FilePool.remove(fid)
50
+
51
+ == Maintenance
52
+
53
+ FilePool has a straightforward way of storing files. It doesn't use
53
+ any form of index. As long as you stick to the directory structure
55
+ outlined above you can:
56
+
57
+ * move the entire pool somewhere else
58
+ * split the pool using symbolic links or mount points to remote file systems
59
+ * merge file pools by copying them into one
60
+
61
+ There is no risk of overwriting, because UUID type 4 file names are
62
+ unique (up to an extremely small collision probability).
63
+
64
+ == Notes
65
+
66
+ Make sure to store the generated file identifiers safely. There is no
67
+ way of identifying a file again when its ID is lost. When in doubt, generate a hash
68
+ value from the file and store it somewhere else.
69
+
70
+ For large files the pool root should be on the same file system as the files
71
+ added to the pool. Then adding a file returns immediately. Otherwise
72
+ files will be copied, which may take a significant time.
73
+
74
+ =end
75
+ module FilePool
76
+
77
+ class InvalidFileId < StandardError; end
78
+
79
+ #
80
+ # Set up the root directory of the file pool. This method takes only the
81
+ # root path; it does not accept a logger.
82
+ #
83
+ # === Parameters:
84
+ #
85
+ # root (String)::
86
+ # absolute path of the file pool's root directory under which all files will be stored.
87
+ def self.setup root
88
+ @@root = root
89
+ end
90
+
91
+ #
92
+ # Add a file to the file pool.
93
+ #
94
+ # Creates hard-links (ln) when file at +path+ is on same file system as
95
+ # pool, otherwise copies it. When dealing with large files the
96
+ # latter should be avoided, because it takes more time and space.
97
+ #
98
+ # Throws standard file exceptions when unable to store the file. See
99
+ # also FilePool.add to avoid it.
100
+ #
101
+ # === Parameters:
102
+ #
103
+ # path (String)::
104
+ # path of the file to add.
105
+ #
106
+ # === Return Value:
107
+ #
108
+ # :: *String* containing a new unique ID for the file added.
109
+ def self.add! path
110
+ newid = uuid
111
+ target = self.path(newid) # explicit receiver: the +path+ parameter shadows FilePool.path
112
+
113
+ FileUtils.mkpath(id2dir newid)
114
+ FileUtils.link(path, target)
115
+
116
+ return newid
117
+
118
+ rescue Errno::EXDEV
119
+ FileUtils.copy(path, target)
120
+ return newid
121
+ end
122
+
123
+ #
124
+ # Add a file to the file pool.
125
+ #
126
+ # Same as FilePool.add!, but doesn't throw exceptions.
127
+ #
128
+ # === Parameters:
129
+ #
130
+ # path (String)::
131
+ # path of the file to add.
132
+ #
133
+ # === Return Value:
134
+ #
135
+ # :: *String* containing a new unique ID for the file added.
136
+ # :: +false+ when the file could not be stored.
137
+ def self.add path
138
+ self.add!(path)
139
+
140
+ rescue Exception => ex
141
+ return false
142
+ end
143
+
144
+ #
145
+ # Return the path of a previously added file by its ID.
146
+ #
147
+ # === Parameters:
148
+ #
149
+ # fid (String)::
150
+ # File ID which was generated by a previous #add operation.
151
+ #
152
+ # === Return Value:
153
+ #
154
+ # :: *String*, absolute path of the file in the pool.
155
+ def self.path fid
156
+ raise InvalidFileId unless valid?(fid)
157
+ id2dir(fid) + "/#{fid}"
158
+ end
159
+
160
+ #
161
+ # Remove a previously added file by its ID. Same as FilePool.remove,
162
+ # but throws exceptions on failure.
163
+ #
164
+ # === Parameters:
165
+ #
166
+ # fid (String)::
167
+ # File ID which was generated by a previous #add operation.
168
+ def self.remove! fid
169
+ FileUtils.rm path(fid)
170
+ end
171
+
172
+ #
173
+ # Remove a previously added file by its ID. Same as FilePool.remove!, but
174
+ # doesn't throw exceptions.
175
+ #
176
+ # === Parameters:
177
+ #
178
+ # fid (String)::
179
+ # File ID which was generated by a previous #add operation.
180
+ #
181
+ # === Return Value:
182
+ #
183
+ # :: *Boolean*, +true+ if file was removed successfully, +false+ otherwise
184
+ def self.remove fid
185
+ self.remove! fid
186
+ rescue Exception => ex
187
+ return false
188
+ end
189
+
190
+ #
191
+ # Returns some statistics about the current pool. (It may be slow if
192
+ # the pool contains very many files as it computes them from scratch.)
193
+ #
194
+ # === Return Value
195
+ #
196
+ # :: *Hash* with keys
197
+ # :file_number (Integer)::
198
+ # Number of files in pool
199
+ # :total_size (Integer)::
200
+ # Total number of bytes of all files
201
+ # :median_size (Float)::
202
+ # Median of file sizes
203
+ # :last_add (Time)::
204
+ # Time and Date of last add operation
205
+
206
+ def self.stat
207
+ all_files = Dir.glob("#{root}/*/*/*/*")
208
+ all_stats = all_files.map{|f| File.stat(f) }
209
+
210
+ {
211
+ :total_size => all_stats.inject(0){|sum,stat| sum + stat.size},
212
+ :median_size => median(all_stats.map{|stat| stat.size}),
213
+ :file_number => all_files.length,
214
+ :last_add => all_stats.map{|stat| stat.ctime}.max
215
+ }
216
+ end
217
+
218
+
219
+ private
220
+
221
+ def self.root
222
+ @@root rescue raise("FilePool: no root directory defined. Use FilePool.setup.")
223
+ end
224
+
225
+ # path from fid without file name
226
+ def self.id2dir fid
227
+ "#{root}/#{fid[0,1]}/#{fid[1,1]}/#{fid[2,1]}"
228
+ end
229
+
230
+ # return a new UUID type 4 (random) as String
231
+ def self.uuid
232
+ UUIDTools::UUID.random_create.to_s
233
+ end
234
+
235
+ # return +true+ if _uuid_ is a valid UUID type 4
236
+ def self.valid? uuid
237
+ begin
238
+ UUIDTools::UUID.parse(uuid).valid?
239
+ rescue TypeError, ArgumentError
240
+ return false
241
+ end
242
+ end
243
+
244
+ # median file size
245
+ def self.median(sizes)
246
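+ return nil if sizes.empty? # an empty pool has no median
247
+ sortedarr = sizes.sort
248
+ medpt1 = (sizes.length - 1) / 2 # lower middle index
249
+ medpt2 = sizes.length / 2 # upper middle index (same as lower for odd lengths)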
250
+ (sortedarr[medpt1] + sortedarr[medpt2]).to_f / 2
251
+ end
252
+
253
+ end
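For reference, the conventional median takes the middle element for an odd count and the mean of the two middle elements for an even count. The standalone sketch below can serve as a cross-check for the private helper above; `median_of` is a hypothetical name and not part of FilePool.

```ruby
# Hypothetical helper: median of a list of numbers, or nil for an empty list.
def median_of(values)
  return nil if values.empty?
  sorted = values.sort
  lower = sorted[(sorted.length - 1) / 2] # lower middle element
  upper = sorted[sorted.length / 2]       # upper middle element (same for odd counts)
  (lower + upper) / 2.0
end

p median_of([5, 1, 3])    # odd count  => 3.0
p median_of([4, 1, 3, 2]) # even count => 2.5
```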
@@ -0,0 +1,77 @@
1
+ require 'rubygems'
2
+ require 'shoulda'
3
+ require 'file_pool'
4
+
5
+ class FilePoolTest < Test::Unit::TestCase
6
+
7
+ def setup
8
+ @test_dir = "#{File.dirname(__FILE__)}/files"
9
+ @pool_root = "#{File.dirname(__FILE__)}/fp_root"
10
+ FilePool.setup @pool_root
11
+ end
12
+
13
+ def teardown
14
+ FileUtils.rm_r(Dir.glob @pool_root+"/*")
15
+ end
16
+
17
+ context "File Pool" do
18
+ should "store files" do
19
+ fid = FilePool.add(@test_dir+"/a")
20
+
21
+ assert UUIDTools::UUID.parse(fid).valid?
22
+
23
+ md5_orig = Digest::MD5.hexdigest(File.open(@test_dir+"/a").read)
24
+ md5_pooled = Digest::MD5.hexdigest(File.open(@pool_root + "/#{fid[0,1]}/#{fid[1,1]}/#{fid[2,1]}/#{fid}").read)
25
+
26
+ assert_equal md5_orig, md5_pooled
27
+ end
28
+
29
+ should "return path from stored files" do
30
+
31
+ fidb = FilePool.add(@test_dir+"/b")
32
+ fidc = FilePool.add(@test_dir+"/c")
33
+ fidd = FilePool.add!(@test_dir+"/d")
34
+
35
+ assert_equal "#{@pool_root}/#{fidb[0,1]}/#{fidb[1,1]}/#{fidb[2,1]}/#{fidb}", FilePool.path(fidb)
36
+ assert_equal "#{@pool_root}/#{fidc[0,1]}/#{fidc[1,1]}/#{fidc[2,1]}/#{fidc}", FilePool.path(fidc)
37
+ assert_equal "#{@pool_root}/#{fidd[0,1]}/#{fidd[1,1]}/#{fidd[2,1]}/#{fidd}", FilePool.path(fidd)
38
+
39
+ end
40
+
41
+ should "remove files from pool" do
42
+
43
+ fidb = FilePool.add(@test_dir+"/b")
44
+ fidc = FilePool.add!(@test_dir+"/c")
45
+ fidd = FilePool.add!(@test_dir+"/d")
46
+
47
+ path_c = FilePool.path(fidc)
48
+ FilePool.remove(fidc)
49
+
50
+ assert !File.exist?(path_c)
51
+ assert File.exist?(FilePool.path(fidb))
52
+ assert File.exist?(FilePool.path(fidd))
53
+
54
+ end
55
+
56
+ should "throw exceptions when using add! and remove! on failure" do
57
+ assert_raises(FilePool::InvalidFileId) do
58
+ FilePool.remove!("invalid-id")
59
+ end
60
+
61
+ assert_raises(Errno::ENOENT) do
62
+ FilePool.remove!("61e9b2d1-1738-440d-9b3d-e3c64876f2b0")
63
+ end
64
+
65
+ assert_raises(Errno::ENOENT) do
66
+ FilePool.add!("/not/here/foo.png")
67
+ end
68
+
69
+ end
70
+
71
+ should "not throw exceptions when using add and remove on failure" do
72
+ assert !FilePool.remove("invalid-id")
73
+ assert !FilePool.remove("61e9b2d1-1738-440d-9b3d-e3c64876f2b0")
74
+ assert !FilePool.add("/not/here/foo.png")
75
+ end
76
+ end
77
+ end
metadata ADDED
@@ -0,0 +1,103 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: file_pool
3
+ version: !ruby/object:Gem::Version
4
+ hash: 23
5
+ prerelease:
6
+ segments:
7
+ - 0
8
+ - 2
9
+ - 0
10
+ version: 0.2.0
11
+ platform: ruby
12
+ authors:
13
+ - "robokopp (Robert Anni\xC3\xA9s)"
14
+ autorequire:
15
+ bindir: bin
16
+ cert_chain: []
17
+
18
+ date: 2012-09-17 00:00:00 Z
19
+ dependencies:
20
+ - !ruby/object:Gem::Dependency
21
+ name: shoulda
22
+ prerelease: false
23
+ requirement: &id001 !ruby/object:Gem::Requirement
24
+ none: false
25
+ requirements:
26
+ - - ">="
27
+ - !ruby/object:Gem::Version
28
+ hash: 3
29
+ segments:
30
+ - 0
31
+ version: "0"
32
+ type: :development
33
+ version_requirements: *id001
34
+ - !ruby/object:Gem::Dependency
35
+ name: uuidtools
36
+ prerelease: false
37
+ requirement: &id002 !ruby/object:Gem::Requirement
38
+ none: false
39
+ requirements:
40
+ - - ~>
41
+ - !ruby/object:Gem::Version
42
+ hash: 15
43
+ segments:
44
+ - 2
45
+ - 1
46
+ - 2
47
+ version: 2.1.2
48
+ type: :runtime
49
+ version_requirements: *id002
50
+ description: |
51
+ FilePool helps to manage a large number of files in a Ruby
52
+ project. It takes care of the storage of files in a balanced directory
53
+ tree and generates unique identifiers for all files.
54
+
55
+ email:
56
+ - robokopp@fernwerk.net
57
+ executables: []
58
+
59
+ extensions: []
60
+
61
+ extra_rdoc_files:
62
+ - README.md
63
+ files:
64
+ - lib/file_pool.rb
65
+ - lib/file_pool/version.rb
66
+ - README.md
67
+ - test/test_file_pool.rb
68
+ homepage: https://github.com/robokopp/file_pool
69
+ licenses: []
70
+
71
+ post_install_message:
72
+ rdoc_options: []
73
+
74
+ require_paths:
75
+ - lib
76
+ required_ruby_version: !ruby/object:Gem::Requirement
77
+ none: false
78
+ requirements:
79
+ - - ">="
80
+ - !ruby/object:Gem::Version
81
+ hash: 3
82
+ segments:
83
+ - 0
84
+ version: "0"
85
+ required_rubygems_version: !ruby/object:Gem::Requirement
86
+ none: false
87
+ requirements:
88
+ - - ">="
89
+ - !ruby/object:Gem::Version
90
+ hash: 3
91
+ segments:
92
+ - 0
93
+ version: "0"
94
+ requirements: []
95
+
96
+ rubyforge_project:
97
+ rubygems_version: 1.8.24
98
+ signing_key:
99
+ specification_version: 3
100
+ summary: Manage a large number of files in a pool
101
+ test_files:
102
+ - test/test_file_pool.rb
103
+ has_rdoc: