file_pool 0.2.0

data/README.md ADDED
@@ -0,0 +1,90 @@
1
+ # FilePool
2
+
3
+ FilePool helps to manage a large number of files in a Ruby project. It
4
+ takes care of the storage of files in a balanced directory tree and
5
+ generates unique identifiers for all files. It also comes in handy
6
+ when dealing with only a few files.
7
+
8
+ FilePool does not deal with file meta information. Its only purpose
9
+ is to return a file's location given a file identifier, which was
10
+ generated when the file was added to the pool.
11
+
12
+ The identifiers are strings of UUID Type 4 (random), which are also
13
+ used as file names. The directory tree is a 3 level structure using
14
+ the 3 first hexadecimal digits of a UUID as path. For example:
15
+
16
+ 0/d/6/0d6f8dd9-8deb-4500-bb85-2d0796241963
17
+ 0/c/f/0cfb082a-fd57-490c-978b-e47d5948bc8b
18
+ 6/1/d/61ddfe33-13f3-4f71-9234-5fbbf5c4fc2c
19
+
20
+ FilePool is tested with Ruby 1.8.7 and 1.9.3.
21
+
22
+ ## Installation
23
+
24
+ Add this line to your application's Gemfile:
25
+
26
+ gem 'file_pool'
27
+
28
+ And then execute:
29
+
30
+ $ bundle
31
+
32
+ Or install it yourself as:
33
+
34
+ $ gem install file_pool
35
+
36
+ ## Usage
37
+
38
+ ### Setup
39
+ The root path must be defined (FilePool.setup in this version takes no Logger argument):
40
+
41
+ FilePool.setup '/var/lib/files'
42
+
43
+ In a Rails project the file pool setup would be placed in an initializer:
44
+
45
+ config/initializers/file_pool.
46
+
47
+ ### Example Usage
48
+
49
+ Adding files (perhaps after completed upload)
50
+
51
+ fid = FilePool.add('/Temp/p348dvhn4')
52
+
53
+ Get location of previously added file
54
+
55
+ path = FilePool.path(fid)
56
+
57
+ Remove a file
58
+
59
+ FilePool.remove(fid)
60
+
61
+ ### Maintenance
62
+
63
+ FilePool has a straightforward way of storing files. It doesn't use
64
+ any form of index. As long as you stick to the directory structure
65
+ outlined above you can:
66
+
67
+ * move the entire pool somewhere else
68
+ * split the pool using symbolic links or mount points to remote file systems
69
+ * merge file pools by copying them into one
70
+
71
+ There is no risk of overwriting, because UUID type 4 file names are
72
+ unique (up to an extremely small collision probability).
73
+
74
+ ### Notes
75
+
76
+ Make sure to store the generated file identifiers safely. There is no
77
+ way of identifying a file again when its ID is lost. If in doubt, generate a hash
78
+ value from the file and store it somewhere else.
79
+
80
+ For large files the pool root should be on the same file system as the files
81
+ added to the pool. Adding a file then only creates a hard link and returns
82
+ immediately. Otherwise the file is copied, which may take significant time.
83
+
84
+ ## Contributing
85
+
86
+ 1. Fork it
87
+ 2. Create your feature branch (`git checkout -b my-new-feature`)
88
+ 3. Commit your changes (`git commit -am 'Added some feature'`)
89
+ 4. Push to the branch (`git push origin my-new-feature`)
90
+ 5. Create new Pull Request
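
The directory layout described in the README can be illustrated with a short sketch. It is not FilePool's own code: `pool_relative_path` is a hypothetical helper, and `SecureRandom.uuid` from the standard library stands in for the `uuidtools` gem that the package itself uses.

```ruby
require 'securerandom'

# Hypothetical helper (not part of FilePool): build the pool-relative path
# of an identifier from its first three hexadecimal digits.
def pool_relative_path(fid)
  File.join(fid[0, 1], fid[1, 1], fid[2, 1], fid)
end

# SecureRandom.uuid returns a random (version 4) UUID string.
fid = SecureRandom.uuid
puts pool_relative_path(fid)

# The first example identifier from the README:
puts pool_relative_path('0d6f8dd9-8deb-4500-bb85-2d0796241963')
# prints "0/d/6/0d6f8dd9-8deb-4500-bb85-2d0796241963"
```

Because the path is derived from the identifier alone, no index is needed, which is what makes moving, splitting, and merging pools possible.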
data/lib/file_pool/version.rb ADDED
@@ -0,0 +1,3 @@
1
+ module FilePool
2
+ VERSION = "0.2.0"
3
+ end
data/lib/file_pool.rb ADDED
@@ -0,0 +1,253 @@
1
+ # -*- coding: utf-8 -*-
2
+ require 'file_pool/version'
3
+ require 'uuidtools'
4
+ =begin
5
+ <em>Robert Anniés (2012)</em>
6
+
7
+ == Introduction
8
+
9
+ FilePool helps to manage a large number of files in a Ruby project. It
10
+ takes care of the storage of files in a balanced directory tree and
11
+ generates unique identifiers for all files. It also comes in handy
12
+ when dealing with only a few files.
13
+
14
+ FilePool does not deal with file meta information. Its only purpose
15
+ is to return a file's location given a file identifier, which was
16
+ generated when the file was added to the pool.
17
+
18
+ The identifiers are strings of UUID Type 4 (random), which are also
19
+ used as file names. The directory tree is a 3 level structure using
20
+ the 3 first hexadecimal digits of a UUID as path. For example:
21
+
22
+ 0/d/6/0d6f8dd9-8deb-4500-bb85-2d0796241963
23
+ 0/c/f/0cfb082a-fd57-490c-978b-e47d5948bc8b
24
+ 6/1/d/61ddfe33-13f3-4f71-9234-5fbbf5c4fc2c
25
+
26
+ == Examples
27
+
28
+ === Setup
29
+ The root path must be defined (FilePool.setup in this version takes no Logger argument):
30
+
31
+ FilePool.setup '/var/lib/files'
32
+
33
+ In a Rails project the file pool setup should be placed in an initializer:
34
+
35
+ config/initializers/file_pool.rb
36
+
37
+ === Usage
38
+
39
+ Adding files (perhaps after completed upload)
40
+
41
+ fid = FilePool.add('/Temp/p348dvhn4')
42
+
43
+ Get location of previously added file
44
+
45
+ path = FilePool.path(fid)
46
+
47
+ Remove a file
48
+
49
+ FilePool.remove(fid)
50
+
51
+ == Maintenance
52
+
53
+ FilePool has a straightforward way of storing files. It doesn't use
54
+ any form of index. As long as you stick to the directory structure
55
+ outlined above you can:
56
+
57
+ * move the entire pool somewhere else
58
+ * split the pool using symbolic links or mount points to remote file systems
59
+ * merge file pools by copying them into one
60
+
61
+ There is no risk of overwriting, because UUID type 4 file names are
62
+ unique (up to an extremely small collision probability).
63
+
64
+ == Notes
65
+
66
+ Make sure to store the generated file identifiers safely. There is no
67
+ way of identifying a file again when its ID is lost. If in doubt, generate a hash
68
+ value from the file and store it somewhere else.
69
+
70
+ For large files the pool root should be on the same file system as the files
71
+ added to the pool. Adding a file then only creates a hard link and returns
72
+ immediately. Otherwise the file is copied, which may take significant time.
73
+
74
+ =end
75
+ module FilePool
76
+
77
+ class InvalidFileId < StandardError; end
78
+
79
+ #
80
+ # Set up the root directory of the file pool. In this version only the
81
+ # root path is accepted; no log destination can be passed.
82
+ #
83
+ # === Parameters:
84
+ #
85
+ # root (String)::
86
+ # absolute path of the file pool's root directory under which all files will be stored.
87
+ def self.setup root
88
+ @@root = root
89
+ end
90
+
91
+ #
92
+ # Add a file to the file pool.
93
+ #
94
+ # Creates hard-links (ln) when file at +path+ is on same file system as
95
+ # pool, otherwise copies it. When dealing with large files the
96
+ # latter should be avoided, because it takes more time and space.
97
+ #
98
+ # Raises standard file exceptions when the file cannot be stored. See
99
+ # FilePool.add for a variant that returns +false+ instead.
100
+ #
101
+ # === Parameters:
102
+ #
103
+ # path (String)::
104
+ # path of the file to add.
105
+ #
106
+ # === Return Value:
107
+ #
108
+ # :: *String* containing a new unique ID for the file added.
109
+ def self.add! path
110
+ newid = uuid
111
+ target = path newid
112
+
113
+ FileUtils.mkpath(id2dir newid)
114
+ FileUtils.link(path, target)
115
+
116
+ return newid
117
+
118
+ rescue Errno::EXDEV
119
+ FileUtils.copy(path, target)
120
+ return newid
121
+ end
122
+
123
+ #
124
+ # Add a file to the file pool.
125
+ #
126
+ # Same as FilePool.add!, but doesn't throw exceptions.
127
+ #
128
+ # === Parameters:
129
+ #
130
+ # path (String)::
131
+ # path of the file to add.
132
+ #
133
+ # === Return Value:
134
+ #
135
+ # :: *String* containing a new unique ID for the file added.
136
+ # :: +false+ when the file could not be stored.
137
+ def self.add path
138
+ self.add!(path)
139
+
140
+ rescue Exception => ex
141
+ return false
142
+ end
143
+
144
+ #
145
+ # Return the path of a previously added file by its ID.
146
+ #
147
+ # === Parameters:
148
+ #
149
+ # fid (String)::
150
+ # File ID which was generated by a previous #add operation.
151
+ #
152
+ # === Return Value:
153
+ #
154
+ # :: *String*, absolute path of the file in the pool.
155
+ def self.path fid
156
+ raise InvalidFileId unless valid?(fid)
157
+ id2dir(fid) + "/#{fid}"
158
+ end
159
+
160
+ #
161
+ # Remove a previously added file by its ID. Same as FilePool.remove,
162
+ # but throws exceptions on failure.
163
+ #
164
+ # === Parameters:
165
+ #
166
+ # fid (String)::
167
+ # File ID which was generated by a previous #add operation.
168
+ def self.remove! fid
169
+ FileUtils.rm path(fid)
170
+ end
171
+
172
+ #
173
+ # Remove a previously added file by its ID. Same as FilePool.remove!, but
174
+ # doesn't throw exceptions.
175
+ #
176
+ # === Parameters:
177
+ #
178
+ # fid (String)::
179
+ # File ID which was generated by a previous #add operation.
180
+ #
181
+ # === Return Value:
182
+ #
183
+ # :: a truthy value if the file was removed successfully, +false+ otherwise
184
+ def self.remove fid
185
+ self.remove! fid
186
+ rescue Exception => ex
187
+ return false
188
+ end
189
+
190
+ #
191
+ # Returns some statistics about the current pool. (It may be slow if
192
+ # the pool contains very many files as it computes them from scratch.)
193
+ #
194
+ # === Return Value
195
+ #
196
+ # :: *Hash* with keys
197
+ # :file_number (Integer)::
198
+ # Number of files in pool
199
+ # :total_size (Integer)::
200
+ # Total number of bytes of all files
201
+ # :median_size (Float)::
202
+ # Median of file sizes (middle value of the sorted sizes)
203
+ # :last_add (Time)::
204
+ # Time and Date of last add operation
205
+
206
+ def self.stat
207
+ all_files = Dir.glob("#{root}/*/*/*/*")
208
+ all_stats = all_files.map{|f| File.stat(f) }
209
+
210
+ {
211
+ :total_size => all_stats.inject(0){|sum,stat| sum + stat.size},
212
+ :median_size => median(all_stats.map{|stat| stat.size}),
213
+ :file_number => all_files.length,
214
+ :last_add => all_stats.map{|stat| stat.ctime}.max
215
+ }
216
+ end
217
+
218
+
219
+ private
220
+
221
+ def self.root
222
+ @@root rescue raise("FilePool: no root directory defined. Use FilePool.setup.")
223
+ end
224
+
225
+ # path from fid without file name
226
+ def self.id2dir fid
227
+ "#{root}/#{fid[0,1]}/#{fid[1,1]}/#{fid[2,1]}"
228
+ end
229
+
230
+ # return a new UUID type 4 (random) as String
231
+ def self.uuid
232
+ UUIDTools::UUID.random_create.to_s
233
+ end
234
+
235
+ # return +true+ if _uuid_ is a valid UUID type 4
236
+ def self.valid? uuid
237
+ begin
238
+ UUIDTools::UUID.parse(uuid).valid?
239
+ rescue TypeError, ArgumentError
240
+ return false
241
+ end
242
+ end
243
+
244
+ # median file size
245
+ def self.median(sizes)
246
+ arr = sizes
247
+ sortedarr = arr.sort
248
+ medpt1 = (arr.length - 1) / 2 # lower middle index (equals medpt2 for odd lengths)
249
+ medpt2 = arr.length / 2 # upper middle index
250
+ (sortedarr[medpt1] + sortedarr[medpt2]).to_f / 2
251
+ end
252
+
253
+ end
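
The link-or-copy behaviour that the comments on `FilePool.add!` describe can be sketched in isolation. This is a minimal illustration under stated assumptions, not the gem's implementation: `link_or_copy`, the file names, and the returned symbols are hypothetical, while `File.link`, `FileUtils.cp`, `FileUtils.mkdir_p`, and `Errno::EXDEV` are standard Ruby.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper: try a hard link first and fall back to a copy when
# the target lives on another file system (Errno::EXDEV).
def link_or_copy(source, target)
  FileUtils.mkdir_p(File.dirname(target))
  begin
    File.link(source, target)
    :linked
  rescue Errno::EXDEV
    FileUtils.cp(source, target)
    :copied
  end
end

Dir.mktmpdir do |dir|
  source = File.join(dir, 'upload.bin')
  File.open(source, 'w') { |f| f.write('example payload') }

  # Assumed target path that mimics the three-level pool layout.
  target = File.join(dir, 'pool', '0', 'd', '6', 'stored-file')

  puts link_or_copy(source, target)
  puts File.read(target)
end
```

A hard link adds a second directory entry for the same data, so it needs no extra space and completes quickly regardless of file size. The copy path is only taken across file systems.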
data/test/test_file_pool.rb ADDED
@@ -0,0 +1,77 @@
1
+ require 'rubygems'
2
+ require 'shoulda'
3
+ require 'file_pool'
4
+
5
+ class FilePoolTest < Test::Unit::TestCase
6
+
7
+ def setup
8
+ @test_dir = "#{File.dirname(__FILE__)}/files"
9
+ @pool_root = "#{File.dirname(__FILE__)}/fp_root"
10
+ FilePool.setup @pool_root
11
+ end
12
+
13
+ def teardown
14
+ FileUtils.rm_r(Dir.glob @pool_root+"/*")
15
+ end
16
+
17
+ context "File Pool" do
18
+ should "store files" do
19
+ fid = FilePool.add(@test_dir+"/a")
20
+
21
+ assert UUIDTools::UUID.parse(fid).valid?
22
+
23
+ md5_orig = Digest::MD5.hexdigest(File.read(@test_dir+"/a"))
24
+ md5_pooled = Digest::MD5.hexdigest(File.read(@pool_root + "/#{fid[0,1]}/#{fid[1,1]}/#{fid[2,1]}/#{fid}"))
25
+
26
+ assert_equal md5_orig, md5_pooled
27
+ end
28
+
29
+ should "return path from stored files" do
30
+
31
+ fidb = FilePool.add(@test_dir+"/b")
32
+ fidc = FilePool.add(@test_dir+"/c")
33
+ fidd = FilePool.add!(@test_dir+"/d")
34
+
35
+ assert_equal "#{@pool_root}/#{fidb[0,1]}/#{fidb[1,1]}/#{fidb[2,1]}/#{fidb}", FilePool.path(fidb)
36
+ assert_equal "#{@pool_root}/#{fidc[0,1]}/#{fidc[1,1]}/#{fidc[2,1]}/#{fidc}", FilePool.path(fidc)
37
+ assert_equal "#{@pool_root}/#{fidd[0,1]}/#{fidd[1,1]}/#{fidd[2,1]}/#{fidd}", FilePool.path(fidd)
38
+
39
+ end
40
+
41
+ should "remove files from pool" do
42
+
43
+ fidb = FilePool.add(@test_dir+"/b")
44
+ fidc = FilePool.add!(@test_dir+"/c")
45
+ fidd = FilePool.add!(@test_dir+"/d")
46
+
47
+ path_c = FilePool.path(fidc)
48
+ FilePool.remove(fidc)
49
+
50
+ assert !File.exist?(path_c)
51
+ assert File.exist?(FilePool.path(fidb))
52
+ assert File.exist?(FilePool.path(fidd))
53
+
54
+ end
55
+
56
+ should "throw exceptions when using add! and remove! on failure" do
57
+ assert_raises(FilePool::InvalidFileId) do
58
+ FilePool.remove!("invalid-id")
59
+ end
60
+
61
+ assert_raises(Errno::ENOENT) do
62
+ FilePool.remove!("61e9b2d1-1738-440d-9b3d-e3c64876f2b0")
63
+ end
64
+
65
+ assert_raises(Errno::ENOENT) do
66
+ FilePool.add!("/not/here/foo.png")
67
+ end
68
+
69
+ end
70
+
71
+ should "not throw exceptions when using add and remove on failure" do
72
+ assert !FilePool.remove("invalid-id")
73
+ assert !FilePool.remove("61e9b2d1-1738-440d-9b3d-e3c64876f2b0")
74
+ assert !FilePool.add("/not/here/foo.png")
75
+ end
76
+ end
77
+ end
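
The "store files" test compares digests of the original and the pooled file. The same idea supports the README's advice to keep a hash value alongside each identifier. The sketch below assumes SHA-256 rather than the MD5 used in the test, and `file_fingerprint` is a hypothetical helper, not part of FilePool.

```ruby
require 'digest/sha2'
require 'tempfile'

# Hypothetical helper: content fingerprint of a file. Digest::SHA256.file
# reads the file itself, so the content need not be loaded by the caller.
def file_fingerprint(path)
  Digest::SHA256.file(path).hexdigest
end

Tempfile.create('fingerprint-demo') do |f|
  f.write('hello')
  f.flush
  puts file_fingerprint(f.path)
end
```

Stored next to the file ID, such a fingerprint would allow a file to be found again by scanning the pool if the ID were ever lost.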
metadata ADDED
@@ -0,0 +1,103 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: file_pool
3
+ version: !ruby/object:Gem::Version
4
+ hash: 23
5
+ prerelease:
6
+ segments:
7
+ - 0
8
+ - 2
9
+ - 0
10
+ version: 0.2.0
11
+ platform: ruby
12
+ authors:
13
+ - "robokopp (Robert Anni\xC3\xA9s)"
14
+ autorequire:
15
+ bindir: bin
16
+ cert_chain: []
17
+
18
+ date: 2012-09-17 00:00:00 Z
19
+ dependencies:
20
+ - !ruby/object:Gem::Dependency
21
+ name: shoulda
22
+ prerelease: false
23
+ requirement: &id001 !ruby/object:Gem::Requirement
24
+ none: false
25
+ requirements:
26
+ - - ">="
27
+ - !ruby/object:Gem::Version
28
+ hash: 3
29
+ segments:
30
+ - 0
31
+ version: "0"
32
+ type: :development
33
+ version_requirements: *id001
34
+ - !ruby/object:Gem::Dependency
35
+ name: uuidtools
36
+ prerelease: false
37
+ requirement: &id002 !ruby/object:Gem::Requirement
38
+ none: false
39
+ requirements:
40
+ - - ~>
41
+ - !ruby/object:Gem::Version
42
+ hash: 15
43
+ segments:
44
+ - 2
45
+ - 1
46
+ - 2
47
+ version: 2.1.2
48
+ type: :runtime
49
+ version_requirements: *id002
50
+ description: |
51
+ FilePool helps to manage a large number of files in a Ruby
52
+ project. It takes care of the storage of files in a balanced directory
53
+ tree and generates unique identifiers for all files.
54
+
55
+ email:
56
+ - robokopp@fernwerk.net
57
+ executables: []
58
+
59
+ extensions: []
60
+
61
+ extra_rdoc_files:
62
+ - README.md
63
+ files:
64
+ - lib/file_pool.rb
65
+ - lib/file_pool/version.rb
66
+ - README.md
67
+ - test/test_file_pool.rb
68
+ homepage: https://github.com/robokopp/file_pool
69
+ licenses: []
70
+
71
+ post_install_message:
72
+ rdoc_options: []
73
+
74
+ require_paths:
75
+ - lib
76
+ required_ruby_version: !ruby/object:Gem::Requirement
77
+ none: false
78
+ requirements:
79
+ - - ">="
80
+ - !ruby/object:Gem::Version
81
+ hash: 3
82
+ segments:
83
+ - 0
84
+ version: "0"
85
+ required_rubygems_version: !ruby/object:Gem::Requirement
86
+ none: false
87
+ requirements:
88
+ - - ">="
89
+ - !ruby/object:Gem::Version
90
+ hash: 3
91
+ segments:
92
+ - 0
93
+ version: "0"
94
+ requirements: []
95
+
96
+ rubyforge_project:
97
+ rubygems_version: 1.8.24
98
+ signing_key:
99
+ specification_version: 3
100
+ summary: Manage a large number of files in a pool
101
+ test_files:
102
+ - test/test_file_pool.rb
103
+ has_rdoc:
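
The runtime dependency above is declared as `uuidtools ~> 2.1.2`. As a small illustration of what that pessimistic constraint admits, the following sketch uses RubyGems' own `Gem::Requirement` and `Gem::Version` classes; the candidate version numbers are arbitrary examples, not releases known to exist.

```ruby
require 'rubygems'

requirement = Gem::Requirement.new('~> 2.1.2')

# "~> 2.1.2" admits versions >= 2.1.2 and < 2.2.
%w[2.1.1 2.1.2 2.1.9 2.2.0].each do |candidate|
  ok = requirement.satisfied_by?(Gem::Version.new(candidate))
  puts "#{candidate}: #{ok}"
end
# prints:
# 2.1.1: false
# 2.1.2: true
# 2.1.9: true
# 2.2.0: false
```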