file_pool 0.2.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- data/README.md +90 -0
- data/lib/file_pool/version.rb +3 -0
- data/lib/file_pool.rb +253 -0
- data/test/test_file_pool.rb +77 -0
- metadata +103 -0
data/README.md
ADDED
@@ -0,0 +1,90 @@
# FilePool

FilePool helps to manage a large number of files in a Ruby project. It
takes care of the storage of files in a balanced directory tree and
generates unique identifiers for all files. It also comes in handy
when dealing with only a few files.

FilePool does not deal with file meta information. Its only purpose
is to return a file's location given a file identifier, which was
generated when the file was added to the pool.

The identifiers are strings of UUID Type 4 (random), which are also
used as file names. The directory tree is a 3-level structure using
the first 3 hexadecimal digits of a UUID as path. For example:

    0/d/6/0d6f8dd9-8deb-4500-bb85-2d0796241963
    0/c/f/0cfb082a-fd57-490c-978b-e47d5948bc8b
    6/1/d/61ddfe33-13f3-4f71-9234-5fbbf5c4fc2c
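
As a rough illustration of the layout above, the sketch below derives a pool-relative path from an identifier. The helper name `pool_relative_path` is hypothetical, and `SecureRandom.uuid` from the Ruby standard library stands in here for the `uuidtools` call the gem itself uses.

```ruby
require 'securerandom'

# Hypothetical helper (not part of FilePool's API): builds the
# pool-relative path for an ID from its first three hexadecimal digits.
def pool_relative_path(fid)
  File.join(fid[0, 1], fid[1, 1], fid[2, 1], fid)
end

# Assumption: any random (type 4) UUID string works as an identifier.
fid = SecureRandom.uuid
puts pool_relative_path(fid)
puts pool_relative_path('0d6f8dd9-8deb-4500-bb85-2d0796241963')
```

With 16 possible values per level, three levels spread the files over at most 4096 leaf directories.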

FilePool is tested with Ruby 1.8.7 and 1.9.3.

## Installation

Add this line to your application's Gemfile:

    gem 'file_pool'

And then execute:

    $ bundle

Or install it yourself as:

    $ gem install file_pool

## Usage

### Setup
The root path must be defined before the pool is used:

    FilePool.setup '/var/lib/files'

In a Rails project the file pool setup would be placed in an initializer:

    config/initializers/file_pool.rb

### Example Usage

Adding files (perhaps after a completed upload):

    fid = FilePool.add('/Temp/p348dvhn4')

Get the location of a previously added file:

    path = FilePool.path(fid)

Remove a file:

    FilePool.remove(fid)
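
To make the add/path/remove round trip concrete without installing the gem, here is a minimal stand-in. `TinyPool` is a hypothetical module written only for this illustration: it copies files instead of hard-linking them, uses `SecureRandom.uuid` for identifiers, and is not FilePool's implementation.

```ruby
require 'fileutils'
require 'securerandom'
require 'tmpdir'

# TinyPool is a hypothetical stand-in that mimics the round trip
# described above; it is not FilePool's implementation.
module TinyPool
  def self.setup(root)
    @root = root
  end

  # Location of a file in the pool, derived from its ID alone.
  def self.path(fid)
    File.join(@root, fid[0, 1], fid[1, 1], fid[2, 1], fid)
  end

  # Copies the file into the pool and returns its new ID,
  # or false when the file cannot be stored.
  def self.add(source)
    fid = SecureRandom.uuid
    FileUtils.mkdir_p(File.dirname(path(fid)))
    FileUtils.cp(source, path(fid))
    fid
  rescue SystemCallError
    false
  end

  # Returns true when the file was removed, false otherwise.
  def self.remove(fid)
    FileUtils.rm(path(fid))
    true
  rescue SystemCallError
    false
  end
end

work_dir = Dir.mktmpdir
TinyPool.setup(File.join(work_dir, 'pool'))

upload = File.join(work_dir, 'upload.tmp')
File.open(upload, 'w') { |f| f.write('hello') }

fid      = TinyPool.add(upload)            # new identifier
location = TinyPool.path(fid)              # where the file now lives
stored   = File.read(location)
removed  = TinyPool.remove(fid)
missing  = TinyPool.add(File.join(work_dir, 'does-not-exist'))
```

The point of the sketch is that the caller only ever keeps the identifier. The location is recomputed from it on every call.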

### Maintenance

FilePool has a straightforward way of storing files. It doesn't use
any form of index. As long as you stick to the directory structure
outlined above you can:

* move the entire pool somewhere else
* split the pool using symbolic links or mount points to remote file systems
* merge file pools by copying them into one

There is no risk of overwriting, because UUID type 4 file names are
unique (up to an extremely small collision probability).
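
Merging by copying can be sketched with the standard library alone. The helpers `merge_pools` and `put` below are hypothetical names for this illustration, and the two pools are throwaway directories created under a temporary folder.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper: merges one pool into another by copying its
# directory tree. 'source/.' copies the contents, not the directory itself.
def merge_pools(source_root, target_root)
  FileUtils.cp_r(File.join(source_root, '.'), target_root)
end

# Hypothetical helper: places a file into a pool using the 3-level layout.
def put(root, fid, content)
  dir = File.join(root, fid[0, 1], fid[1, 1], fid[2, 1])
  FileUtils.mkdir_p(dir)
  File.open(File.join(dir, fid), 'w') { |f| f.write(content) }
end

work_dir = Dir.mktmpdir
pool_a = File.join(work_dir, 'a')
pool_b = File.join(work_dir, 'b')

put(pool_a, '0d6f8dd9-8deb-4500-bb85-2d0796241963', 'first')
put(pool_b, '0cfb082a-fd57-490c-978b-e47d5948bc8b', 'second')
put(pool_b, '61ddfe33-13f3-4f71-9234-5fbbf5c4fc2c', 'third')

merge_pools(pool_b, pool_a)

merged = Dir.glob(File.join(pool_a, '*', '*', '*', '*')).map { |f| File.basename(f) }.sort
puts merged
```

Two of the sample identifiers share the top-level directory `0`, so the example also exercises copying into a directory that already exists in the target pool.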

### Notes

Make sure to store the generated file identifiers safely. There is no
way of identifying a file again when its ID is lost. If in doubt, generate a hash
value from the file and store it somewhere else.
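
One possible way to follow this advice is to record a checksum next to each identifier. In the sketch below, `file_checksum` and the in-memory `catalog` hash are hypothetical. A real application would presumably keep this mapping in its database. SHA-256 is an arbitrary choice of hash function.

```ruby
require 'digest/sha2'
require 'tmpdir'

# Hypothetical helper: returns the SHA-256 hex digest of a file's content.
def file_checksum(path)
  Digest::SHA256.file(path).hexdigest
end

work_dir = Dir.mktmpdir
upload = File.join(work_dir, 'upload.tmp')
File.open(upload, 'w') { |f| f.write('hello') }

# Assumption: this ID stands for the value returned when the file was added.
fid = '0d6f8dd9-8deb-4500-bb85-2d0796241963'

# Store the checksum alongside the ID, outside the pool.
catalog = { file_checksum(upload) => fid }

# Later, a copy of the file can be matched back to its ID.
recovered = catalog[file_checksum(upload)]
puts recovered
```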

For large files the pool root should be on the same file system as the files
added to the pool. Then adding a file returns immediately. Otherwise
files will be copied, which may take a significant time.
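
The difference can be sketched as "hard-link first, copy on `Errno::EXDEV`". The helpers `link_or_copy` and `same_file_system?` are hypothetical names for this illustration and only approximate what the library does.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper: hard-links source to target when both live on the
# same file system, and falls back to a full copy when the link would
# cross file systems (Errno::EXDEV).
def link_or_copy(source, target)
  FileUtils.ln(source, target)
  :linked
rescue Errno::EXDEV
  FileUtils.cp(source, target)
  :copied
end

# Hypothetical helper: compares device numbers to predict which case applies.
def same_file_system?(a, b)
  File.stat(a).dev == File.stat(b).dev
end

work_dir = Dir.mktmpdir
source = File.join(work_dir, 'large.bin')
target = File.join(work_dir, 'pooled.bin')
File.open(source, 'w') { |f| f.write('payload') }

outcome = link_or_copy(source, target)
puts "#{outcome}, same file system: #{same_file_system?(source, work_dir)}"
```

A hard link only adds a directory entry, so its cost does not depend on the file size, whereas the copy path reads and writes every byte.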

## Contributing

1. Fork it
2. Create your feature branch (`git checkout -b my-new-feature`)
3. Commit your changes (`git commit -am 'Added some feature'`)
4. Push to the branch (`git push origin my-new-feature`)
5. Create new Pull Request
data/lib/file_pool.rb
ADDED
@@ -0,0 +1,253 @@
# -*- coding: utf-8 -*-
require 'file_pool/version'
require 'uuidtools'
require 'fileutils'
=begin
<em>Robert Anniés (2012)</em>

== Introduction

FilePool helps to manage a large number of files in a Ruby project. It
takes care of the storage of files in a balanced directory tree and
generates unique identifiers for all files. It also comes in handy
when dealing with only a few files.

FilePool does not deal with file meta information. Its only purpose
is to return a file's location given a file identifier, which was
generated when the file was added to the pool.

The identifiers are strings of UUID Type 4 (random), which are also
used as file names. The directory tree is a 3-level structure using
the first 3 hexadecimal digits of a UUID as path. For example:

  0/d/6/0d6f8dd9-8deb-4500-bb85-2d0796241963
  0/c/f/0cfb082a-fd57-490c-978b-e47d5948bc8b
  6/1/d/61ddfe33-13f3-4f71-9234-5fbbf5c4fc2c

== Examples

=== Setup
The root path must be defined before the pool is used:

  FilePool.setup '/var/lib/files'

In a Rails project the file pool setup should be placed in an initializer:

  config/initializers/file_pool.rb

=== Usage

Adding files (perhaps after a completed upload):

  fid = FilePool.add('/Temp/p348dvhn4')

Get the location of a previously added file:

  path = FilePool.path(fid)

Remove a file:

  FilePool.remove(fid)

== Maintenance

FilePool has a straightforward way of storing files. It doesn't use
any form of index. As long as you stick to the directory structure
outlined above you can:

* move the entire pool somewhere else
* split the pool using symbolic links or mount points to remote file systems
* merge file pools by copying them into one

There is no risk of overwriting, because UUID type 4 file names are
unique (up to an extremely small collision probability).

== Notes

Make sure to store the generated file identifiers safely. There is no
way of identifying a file again when its ID is lost. If in doubt, generate a hash
value from the file and store it somewhere else.

For large files the pool root should be on the same file system as the files
added to the pool. Then adding a file returns immediately. Otherwise
files will be copied, which may take a significant time.

=end
module FilePool

  class InvalidFileId < Exception; end

  #
  # Set up the root directory of the file pool, under which all
  # files are stored.
  #
  # === Parameters:
  #
  # root (String)::
  #   absolute path of the file pool's root directory under which all files will be stored.
  def self.setup root
    @@root = root
  end

  #
  # Add a file to the file pool.
  #
  # Creates a hard link (ln) when the file at +path+ is on the same file
  # system as the pool, otherwise copies it. When dealing with large files
  # the latter should be avoided, because it takes more time and space.
  #
  # Raises standard file exceptions when unable to store the file. See
  # FilePool.add for a variant that returns +false+ instead.
  #
  # === Parameters:
  #
  # path (String)::
  #   path of the file to add.
  #
  # === Return Value:
  #
  # :: *String* containing a new unique ID for the file added.
  def self.add! path
    newid = uuid
    # explicit receiver: +path+ is also the name of the local parameter
    target = self.path(newid)

    FileUtils.mkpath(id2dir newid)
    FileUtils.link(path, target)

    return newid

  rescue Errno::EXDEV
    # source and pool are on different file systems: copy instead of linking
    FileUtils.copy(path, target)
    return newid
  end

  #
  # Add a file to the file pool.
  #
  # Same as FilePool.add!, but doesn't throw exceptions.
  #
  # === Parameters:
  #
  # path (String)::
  #   path of the file to add.
  #
  # === Return Value:
  #
  # :: *String* containing a new unique ID for the file added.
  # :: +false+ when the file could not be stored.
  def self.add path
    self.add!(path)

  rescue Exception => ex
    return false
  end

  #
  # Return the path of a previously added file by its ID.
  #
  # === Parameters:
  #
  # fid (String)::
  #   File ID which was generated by a previous #add operation.
  #
  # === Return Value:
  #
  # :: *String*, absolute path of the file in the pool.
  def self.path fid
    raise InvalidFileId unless valid?(fid)
    id2dir(fid) + "/#{fid}"
  end

  #
  # Remove a previously added file by its ID. Same as FilePool.remove,
  # but throws exceptions on failure.
  #
  # === Parameters:
  #
  # fid (String)::
  #   File ID which was generated by a previous #add operation.
  def self.remove! fid
    FileUtils.rm path(fid)
  end

  #
  # Remove a previously added file by its ID. Same as FilePool.remove!, but
  # doesn't throw exceptions.
  #
  # === Parameters:
  #
  # fid (String)::
  #   File ID which was generated by a previous #add operation.
  #
  # === Return Value:
  #
  # :: *Boolean*, +true+ if file was removed successfully, +false+ otherwise
  def self.remove fid
    self.remove! fid
  rescue Exception => ex
    return false
  end

  #
  # Returns some statistics about the current pool. (It may be slow if
  # the pool contains very many files as it computes them from scratch.)
  #
  # === Return Value
  #
  # :: *Hash* with keys
  #    :file_number (Integer)::
  #      Number of files in pool
  #    :total_size (Integer)::
  #      Total number of bytes of all files
  #    :median_size (Float)::
  #      Median of file sizes
  #    :last_add (Time)::
  #      Time and Date of last add operation

  def self.stat
    all_files = Dir.glob("#{root}/*/*/*/*")
    all_stats = all_files.map{|f| File.stat(f) }

    {
      :total_size => all_stats.inject(0){|sum,stat| sum+=stat.size},
      :median_size => median(all_stats.map{|stat| stat.size}),
      :file_number => all_files.length,
      :last_add => all_stats.map{|stat| stat.ctime}.max
    }
  end


  private

  def self.root
    @@root rescue raise("FilePool: no root directory defined. Use FilePool.setup.")
  end

  # path from fid without file name
  def self.id2dir fid
    "#{root}/#{fid[0,1]}/#{fid[1,1]}/#{fid[2,1]}"
  end

  # return a new UUID type 4 (random) as String
  def self.uuid
    UUIDTools::UUID.random_create.to_s
  end

  # return +true+ if _uuid_ is a valid UUID type 4
  def self.valid? uuid
    begin
      UUIDTools::UUID.parse(uuid).valid?
    rescue TypeError, ArgumentError
      return false
    end
  end

  # median file size (+nil+ for an empty pool)
  def self.median(sizes)
    return nil if sizes.empty?
    sorted = sizes.sort
    lower = sorted[(sorted.length - 1) / 2]  # middle element, or lower of the two middle ones
    upper = sorted[sorted.length / 2]        # middle element, or upper of the two middle ones
    (lower + upper).to_f / 2
  end

end
data/test/test_file_pool.rb
ADDED
@@ -0,0 +1,77 @@
require 'rubygems'
require 'shoulda'
require 'file_pool'

class FilePoolTest < Test::Unit::TestCase

  def setup
    @test_dir = "#{File.dirname(__FILE__)}/files"
    @pool_root = "#{File.dirname(__FILE__)}/fp_root"
    FilePool.setup @pool_root
  end

  def teardown
    FileUtils.rm_r(Dir.glob @pool_root+"/*")
  end

  context "File Pool" do
    should "store files" do
      fid = FilePool.add(@test_dir+"/a")

      assert UUIDTools::UUID.parse(fid).valid?

      md5_orig = Digest::MD5.hexdigest(File.open(@test_dir+"/a").read)
      md5_pooled = Digest::MD5.hexdigest(File.open(@pool_root + "/#{fid[0,1]}/#{fid[1,1]}/#{fid[2,1]}/#{fid}").read)

      assert_equal md5_orig, md5_pooled
    end

    should "return path from stored files" do

      fidb = FilePool.add(@test_dir+"/b")
      fidc = FilePool.add(@test_dir+"/c")
      fidd = FilePool.add!(@test_dir+"/d")

      assert_equal "#{@pool_root}/#{fidb[0,1]}/#{fidb[1,1]}/#{fidb[2,1]}/#{fidb}", FilePool.path(fidb)
      assert_equal "#{@pool_root}/#{fidc[0,1]}/#{fidc[1,1]}/#{fidc[2,1]}/#{fidc}", FilePool.path(fidc)
      assert_equal "#{@pool_root}/#{fidd[0,1]}/#{fidd[1,1]}/#{fidd[2,1]}/#{fidd}", FilePool.path(fidd)

    end

    should "remove files from pool" do

      fidb = FilePool.add(@test_dir+"/b")
      fidc = FilePool.add!(@test_dir+"/c")
      fidd = FilePool.add!(@test_dir+"/d")

      path_c = FilePool.path(fidc)
      FilePool.remove(fidc)

      assert !File.exist?(path_c)
      assert File.exist?(FilePool.path(fidb))
      assert File.exist?(FilePool.path(fidd))

    end

    should "throw exceptions when using add! and remove! on failure" do
      assert_raises(FilePool::InvalidFileId) do
        FilePool.remove!("invalid-id")
      end

      assert_raises(Errno::ENOENT) do
        FilePool.remove!("61e9b2d1-1738-440d-9b3d-e3c64876f2b0")
      end

      assert_raises(Errno::ENOENT) do
        FilePool.add!("/not/here/foo.png")
      end

    end

    should "not throw exceptions when using add and remove on failure" do
      assert !FilePool.remove("invalid-id")
      assert !FilePool.remove("61e9b2d1-1738-440d-9b3d-e3c64876f2b0")
      assert !FilePool.add("/not/here/foo.png")
    end
  end
end
metadata
ADDED
@@ -0,0 +1,103 @@
--- !ruby/object:Gem::Specification
name: file_pool
version: !ruby/object:Gem::Version
  hash: 23
  prerelease:
  segments:
  - 0
  - 2
  - 0
  version: 0.2.0
platform: ruby
authors:
- "robokopp (Robert Anni\xC3\xA9s)"
autorequire:
bindir: bin
cert_chain: []

date: 2012-09-17 00:00:00 Z
dependencies:
- !ruby/object:Gem::Dependency
  name: shoulda
  prerelease: false
  requirement: &id001 !ruby/object:Gem::Requirement
    none: false
    requirements:
    - - ">="
      - !ruby/object:Gem::Version
        hash: 3
        segments:
        - 0
        version: "0"
  type: :development
  version_requirements: *id001
- !ruby/object:Gem::Dependency
  name: uuidtools
  prerelease: false
  requirement: &id002 !ruby/object:Gem::Requirement
    none: false
    requirements:
    - - ~>
      - !ruby/object:Gem::Version
        hash: 15
        segments:
        - 2
        - 1
        - 2
        version: 2.1.2
  type: :runtime
  version_requirements: *id002
description: |
  FilePool helps to manage a large number of files in a Ruby
  project. It takes care of the storage of files in a balanced directory
  tree and generates unique identifiers for all files.

email:
- robokopp@fernwerk.net
executables: []

extensions: []

extra_rdoc_files:
- README.md
files:
- lib/file_pool.rb
- lib/file_pool/version.rb
- README.md
- test/test_file_pool.rb
homepage: https://github.com/robokopp/file_pool
licenses: []

post_install_message:
rdoc_options: []

require_paths:
- lib
required_ruby_version: !ruby/object:Gem::Requirement
  none: false
  requirements:
  - - ">="
    - !ruby/object:Gem::Version
      hash: 3
      segments:
      - 0
      version: "0"
required_rubygems_version: !ruby/object:Gem::Requirement
  none: false
  requirements:
  - - ">="
    - !ruby/object:Gem::Version
      hash: 3
      segments:
      - 0
      version: "0"
requirements: []

rubyforge_project:
rubygems_version: 1.8.24
signing_key:
specification_version: 3
summary: Manage a large number of files in a pool
test_files:
- test/test_file_pool.rb
has_rdoc: