fast_find 0.1.1
- checksums.yaml +7 -0
- data/.gitignore +9 -0
- data/.travis.yml +3 -0
- data/Gemfile +4 -0
- data/LICENSE.txt +21 -0
- data/README.md +87 -0
- data/Rakefile +9 -0
- data/bin/benchmark +28 -0
- data/bin/console +14 -0
- data/bin/setup +7 -0
- data/fast_find.gemspec +37 -0
- data/lib/fast_find.rb +158 -0
- data/lib/fast_find/version.rb +3 -0
- metadata +102 -0
checksums.yaml
ADDED
@@ -0,0 +1,7 @@
+---
+SHA1:
+  metadata.gz: 9c5cd7c541e439cb4bd46acd6c28838e7867a369
+  data.tar.gz: 827db02601353683fd17828a779f71d5a282f80c
+SHA512:
+  metadata.gz: dd2cde72e12d19e762e9a6830f73c12024bab29cecdb7f439501627f0a4c306aafc7130bd2e3ac6344ea7e0b20b4c57c22510f2859cdfd5a2c37dc7935d93913
+  data.tar.gz: af278abe2c42f33cb905e227c389a4a8c0d73aa5c4b30eb7f08b479159a21049e59ba1353776888c72ac1a7dbef1135931b7439060adc07b804f4fd5c8fbadec
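The digests above can be checked against a downloaded member file using Ruby's standard `Digest` library. The following is a minimal sketch, not part of the gem: the helper names `member_digests` and `checksum_ok?` are hypothetical, and the demonstration hashes a throwaway temporary file rather than a real `metadata.gz` or `data.tar.gz`.

```ruby
require 'digest/sha1'
require 'digest/sha2'
require 'tempfile'

# Hypothetical helper: hex digests of a file for the two algorithms
# that appear in checksums.yaml.
def member_digests(path)
  {
    'SHA1'   => Digest::SHA1.file(path).hexdigest,
    'SHA512' => Digest::SHA512.file(path).hexdigest
  }
end

# Hypothetical helper: true when the file's digest equals the expected hex string.
def checksum_ok?(path, algorithm, expected_hex)
  member_digests(path).fetch(algorithm) == expected_hex.downcase
end

# Demonstration on a temporary file containing "abc" (not a real gem member).
Tempfile.create('member') do |f|
  f.write('abc')
  f.flush
  puts checksum_ok?(f.path, 'SHA1', 'a9993e364706816aba3e25717850c26c9cd0d89d')
end
```

To check a real gem, the same helper would be pointed at the extracted `metadata.gz` and `data.tar.gz` files and compared against the values listed in `checksums.yaml`.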
data/.gitignore
ADDED
data/.travis.yml
ADDED
data/Gemfile
ADDED
data/LICENSE.txt
ADDED
@@ -0,0 +1,21 @@
+The MIT License (MIT)
+
+Copyright (c) 2015 Thomas Hurst
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in
+all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+THE SOFTWARE.
data/README.md
ADDED
@@ -0,0 +1,87 @@
+# FastFind
+
+FastFind is a performance-oriented multi-threaded alternative to the standard
+`Find` module that ships with Ruby. It should generally be a drop-in
+replacement.
+
+FastFind is expected to be marginally slower on MRI/YARV, since multithreaded
+`File#lstat` calls there appear to serialize. However, using the
+FastFind-specific second argument to pass in the `File::Stat` object for each
+file may still prove a win.
+
+This code is considered experimental. Beware of dog.
+
+## Installation
+
+Add this line to your application's Gemfile:
+
+    gem 'fast_find', git: 'https://github.com/Freaky/fast_find.git'
+
+And then execute:
+
+    $ bundle
+
+## Usage
+
+Traditional Find-style:
+
+    FastFind.find(dir) {|entry| frob(entry) }
+    FastFind.find(dir, ignore_error: false) { .. } # => explodes in your face
+    FastFind.find(dir) # => Enumerator
+
+Extended style using the second argument to get a `File::Stat`, or `Exception`
+object (if `ignore_error` is false, this will be raised instead).
+
+    FastFind.find(dir) {|entry, stat| frob(entry, stat) }
+
+For increased performance and better scaling behaviour, it is recommended to use
+a single shared `FastFind::Finder` object. Multiple concurrent calls to
+`FastFind::Finder#find` are safe and will share a persistent work pool.
+
+    Finder = FastFind::Finder.new
+    Finder.find(dir) { .. }
+
+You can call `Finder#shutdown` to close the work pool if you're done with the
+instance for the time being. Ensure no other calls to its `#find` are in flight
+beforehand. The pool is restarted the next time `#find` is called.
+
+Use the `concurrency` named argument to change the number of worker threads:
+
+    FastFind.find(dir, concurrency: 4)
+    FastFind::Finder.new(concurrency: 4)
+
+Defaults are `8` for Rubinius and JRuby, `1` for anything else.
+
+Note the yielded blocks are all executed in the parent thread, *not* in workers.
+
+`FastFind.prune` works. So does `Find.prune`.
+
+## Performance
+
+Scanning a cached copy of the NetBSD CVS repository:
+
+jruby 9.0.1.0-SNAPSHOT (2.2.2) 2015-07-23 e88911e OpenJDK 64-Bit Server VM
+25.51-b03 on 1.8.0_51-b16 +jit [FreeBSD-amd64]:
+
+                   user     system      total        real
+    Find      32.890625  27.742188  60.632813 ( 47.518944)
+    FastFind  35.273438  41.742188  77.015625 (  8.140893)
+
+ruby 2.2.2p95 (2015-04-13 revision 50295) [x86_64-freebsd10.1]:
+
+                   user     system      total        real
+    Find      10.187500  22.351562  32.539062 ( 32.545201)
+    FastFind   9.039062  14.226562  23.265625 ( 23.277589)
+
+On MRI `Find` here is penalised because both `Find` and the benchmark code are
+performing a `File#lstat`.
+
+## Development
+
+After checking out the repo, run `bin/setup` to install dependencies. Then, run
+`bin/console` for an interactive prompt that will allow you to experiment.
+
+To install this gem onto your local machine, run `bundle exec rake install`. To
+release a new version, update the version number in `version.rb`, and then run
+`bundle exec rake release` to create a git tag for the version, push git commits
+and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org).
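The README's two-argument block style, and its remark that `Find.prune` keeps working, can be approximated with the standard library alone. The sketch below is an illustration under stated assumptions, not FastFind's implementation: `find_with_stat` is a hypothetical single-threaded helper built on `Find.find`, and it hands the block either a `File::Stat` or the exception raised by `File.lstat`.

```ruby
require 'find'
require 'tmpdir'
require 'fileutils'

# Hypothetical stand-in for the "entry, stat" block style, using stdlib Find.
# Yields each path together with its File::Stat, or with the SystemCallError
# that File.lstat raised for it.
def find_with_stat(*paths)
  return enum_for(__method__, *paths) unless block_given?

  Find.find(*paths) do |entry|
    stat = begin
      File.lstat(entry)
    rescue SystemCallError => e
      e
    end
    yield entry, stat
  end
end

# Demonstration: skip everything below a directory named "skip".
Dir.mktmpdir do |root|
  FileUtils.mkdir_p(File.join(root, 'keep'))
  FileUtils.mkdir_p(File.join(root, 'skip'))
  File.write(File.join(root, 'keep', 'a.txt'), 'a')
  File.write(File.join(root, 'skip', 'b.txt'), 'b')

  find_with_stat(root) do |entry, stat|
    # Find.prune throws :prune, which Find.find catches around this block.
    Find.prune if File.basename(entry) == 'skip'
    puts entry if stat.is_a?(File::Stat) && stat.file?
  end
end
```

Because the block receives the stat object directly, the caller does not need a second `File.lstat` per entry, which is the saving the README describes.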
data/Rakefile
ADDED
data/bin/benchmark
ADDED
@@ -0,0 +1,28 @@
+#!/usr/bin/env ruby
+
+require 'find'
+require 'benchmark'
+
+require "bundler/setup"
+require 'fast_find'
+
+FastFinder = FastFind::Finder.new
+
+test_dirs = ARGV
+abort("Usage: #{$0} [dir1 [dir2[ ..]]]") if test_dirs.empty?
+
+Benchmark.bmbm do |b|
+  b.report("Find") do
+    files = Set.new
+    Find.find(*test_dirs) do |f|
+      files << [f, File.lstat(f)]
+    end
+  end
+
+  b.report("FastFind") do
+    files = Set.new
+    FastFinder.find(*test_dirs) do |f, stat|
+      files << [f, stat]
+    end
+  end
+end
data/bin/console
ADDED
@@ -0,0 +1,14 @@
+#!/usr/bin/env ruby
+
+require "bundler/setup"
+require "fast_find"
+
+# You can add fixtures and/or initialization code here to make experimenting
+# with your gem easier. You can also use a different console, if you like.
+
+# (If you use this, don't forget to add pry to your Gemfile!)
+# require "pry"
+# Pry.start
+
+require "irb"
+IRB.start
data/bin/setup
ADDED
data/fast_find.gemspec
ADDED
@@ -0,0 +1,37 @@
+# coding: utf-8
+lib = File.expand_path('../lib', __FILE__)
+$LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
+require 'fast_find/version'
+
+Gem::Specification.new do |spec|
+  spec.name          = "fast_find"
+  spec.version       = FastFind::VERSION
+  spec.authors       = ["Thomas Hurst"]
+  spec.email         = ["tom@hur.st"]
+
+  spec.summary       = %q{High performance 'find' alternative.}
+  spec.description   = %q{FastFind is a find workalike which optionally passes
+                          in the File::Stat to the block, and can use multiple
+                          threads to walk directories concurrently.}
+  spec.homepage      = "https://github.com/Freaky/fast_find"
+  spec.license       = "MIT"
+
+  # Prevent pushing this gem to RubyGems.org by setting 'allowed_push_host', or
+  # delete this section to allow pushing this gem to any host.
+  if spec.respond_to?(:metadata)
+    spec.metadata['allowed_push_host'] = "https://rubygems.org"
+  else
+    raise "RubyGems 2.0 or newer is required to protect against public gem pushes."
+  end
+
+  spec.files         = `git ls-files -z`.split("\x0").reject { |f| f.match(%r{^(test|spec|features)/}) }
+  spec.bindir        = "exe"
+  spec.executables   = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
+  spec.require_paths = ["lib"]
+
+  spec.required_ruby_version = ">= 2.0"
+
+  spec.add_development_dependency "bundler", "~> 1.9"
+  spec.add_development_dependency "rake", "~> 10.0"
+  spec.add_development_dependency "minitest"
+end
data/lib/fast_find.rb
ADDED
@@ -0,0 +1,158 @@
+#
+# fast_find.rb: A Find workalike optimized for performance.
+#
+
+require 'set'
+require 'thread'
+require 'fast_find/version'
+
+module FastFind
+  DEFAULT_CONCURRENCY = %w(jruby rbx).include?(RUBY_ENGINE) ? 8 : 1
+
+  def self.find(*paths, concurrency: DEFAULT_CONCURRENCY, ignore_error: true,
+                &block)
+    Finder.new(concurrency: concurrency, one_shot: true)
+          .find(*paths, ignore_error: ignore_error, &block)
+  end
+
+  def self.prune
+    throw :prune
+  end
+
+  class Finder
+    def initialize(concurrency: DEFAULT_CONCURRENCY, one_shot: false)
+      @mutex = Mutex.new
+      @queue = Queue.new
+      @one_shot = one_shot
+      @concurrency = concurrency
+      @walkers = nil
+    end
+
+    def startup
+      @mutex.synchronize do
+        return if @walkers
+
+        @walkers = @concurrency.times.map { Walker.new.spawn(@queue) }
+      end
+    end
+
+    def shutdown
+      @mutex.synchronize do
+        return unless @walkers
+
+        @queue.clear
+        @walkers.each { @queue << nil }
+        @walkers.each(&:join)
+
+        @walkers = nil
+      end
+    end
+
+    def find(*paths, ignore_error: true, &block)
+      block or return enum_for(__method__, *paths, ignore_error: ignore_error)
+
+      results = Queue.new
+      pending = Set.new
+
+      startup
+
+      paths.map!(&:dup).each do |path|
+        path = path.to_path if path.respond_to? :to_path
+        results << [path, Util.safe_stat(path)]
+      end
+      results << [:initial, :finished]
+      pending << [:initial, :initial.encoding]
+
+      while result = results.deq
+        path, stat = result
+
+        if stat == :finished
+          pending.delete([path, path.encoding])
+
+          if pending.empty?
+            break
+          else
+            next
+          end
+        end
+
+        catch(:prune) do
+          yield_entry(result, block) if path.is_a? String
+
+          case stat
+          when Exception then raise stat unless ignore_error
+          when File::Stat
+            if stat.directory? and !pending.include?(pe = [path, path.encoding])
+              pending << pe
+              @queue << [path, results]
+            end
+          end
+        end
+      end
+    ensure
+      if one_shot?
+        @queue.clear
+        shutdown
+      end
+    end
+
+    private
+
+    def one_shot?() !!@one_shot end
+
+    def yield_entry(entry, block)
+      if block.arity == 2
+        block.call(entry[0].dup.taint, entry[1])
+      else
+        block.call entry[0].dup.taint
+      end
+    end
+  end
+
+  class Walker
+    def spawn(queue)
+      Thread.new do
+        @encoding = Encoding.find("filesystem")
+        while job = queue.deq
+          walk(job[0], job[1])
+        end
+      end
+    end
+
+    def walk(path, results)
+      enc = path.encoding == Encoding::US_ASCII ? @encoding : path.encoding
+
+      Dir.entries(path, encoding: enc).each do |entry|
+        next if entry == '.' or entry == '..'
+
+        stat(File.join(path, entry), results)
+      end
+    rescue Errno::ENOENT, Errno::EACCES, Errno::ENOTDIR, Errno::ELOOP,
+           Errno::ENAMETOOLONG => e
+      error(e, results)
+    ensure
+      finish(path, results)
+    end
+
+    def stat(entry, results)
+      results << [entry, Util.safe_stat(entry)]
+    end
+
+    def finish(path, results)
+      results << [path, :finished]
+    end
+
+    def error(e, results)
+      results << [:exception, e]
+    end
+  end
+
+  module Util
+    def self.safe_stat(path)
+      File.lstat(path)
+    rescue Errno::ENOENT, Errno::EACCES, Errno::ENOTDIR, Errno::ELOOP,
+           Errno::ENAMETOOLONG => e
+      e
+    end
+  end
+end
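The coordination pattern in `lib/fast_find.rb` (a shared job queue, a per-call results queue, a `:finished` marker per directory, and `nil` as the shutdown sentinel) can be illustrated in a reduced form. The sketch below is a simplified stand-in and not the gem's code: `walk_concurrently` is a hypothetical name, it takes a single root, returns an array instead of yielding, ignores encodings and pruning, and uses `Dir.children`, which assumes Ruby 2.5 or newer.

```ruby
require 'set'
require 'tmpdir'
require 'fileutils'

# Hypothetical, simplified walk using the same queue/sentinel idea.
def walk_concurrently(root, workers: 2)
  jobs    = Queue.new # directories waiting to be listed
  results = Queue.new # [path, stat] pairs and [dir, :finished] markers

  threads = Array.new(workers) do
    Thread.new do
      while (dir = jobs.deq) # a nil job tells the worker to exit
        begin
          Dir.children(dir).each do |name|
            path = File.join(dir, name)
            results << [path, File.lstat(path)]
          end
        rescue SystemCallError
          # In this sketch an error abandons the rest of the directory.
        ensure
          results << [dir, :finished]
        end
      end
    end
  end

  found   = [root]
  pending = Set[root] # directories queued but not yet reported finished
  jobs << root

  until pending.empty?
    path, stat = results.deq
    if stat == :finished
      pending.delete(path)
    else
      found << path
      if stat.directory?
        pending << path
        jobs << path
      end
    end
  end
  found
ensure
  # One sentinel per worker, then wait for all of them to exit.
  threads&.each { jobs << nil }
  threads&.each(&:join)
end

Dir.mktmpdir do |root|
  FileUtils.mkdir_p(File.join(root, 'a', 'b'))
  File.write(File.join(root, 'a', 'b', 'c.txt'), 'c')
  puts walk_concurrently(root).sort
end
```

The consumer only stops once every queued directory has reported `:finished`. Because each worker pushes a directory's entries before its marker, any subdirectory is added to `pending` before its parent is removed, so the set cannot empty prematurely.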
metadata
ADDED
@@ -0,0 +1,102 @@
+--- !ruby/object:Gem::Specification
+name: fast_find
+version: !ruby/object:Gem::Version
+  version: 0.1.1
+platform: ruby
+authors:
+- Thomas Hurst
+autorequire:
+bindir: exe
+cert_chain: []
+date: 2015-08-01 00:00:00.000000000 Z
+dependencies:
+- !ruby/object:Gem::Dependency
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '1.9'
+  name: bundler
+  prerelease: false
+  type: :development
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '1.9'
+- !ruby/object:Gem::Dependency
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '10.0'
+  name: rake
+  prerelease: false
+  type: :development
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '10.0'
+- !ruby/object:Gem::Dependency
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+  name: minitest
+  prerelease: false
+  type: :development
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+description: |-
+  FastFind is a find workalike which optionally passes
+  in the File::Stat to the block, and can use multiple
+  threads to walk directories concurrently.
+email:
+- tom@hur.st
+executables: []
+extensions: []
+extra_rdoc_files: []
+files:
+- ".gitignore"
+- ".travis.yml"
+- Gemfile
+- LICENSE.txt
+- README.md
+- Rakefile
+- bin/benchmark
+- bin/console
+- bin/setup
+- fast_find.gemspec
+- lib/fast_find.rb
+- lib/fast_find/version.rb
+homepage: https://github.com/Freaky/fast_find
+licenses:
+- MIT
+metadata:
+  allowed_push_host: https://rubygems.org
+post_install_message:
+rdoc_options: []
+require_paths:
+- lib
+required_ruby_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: '2.0'
+required_rubygems_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: '0'
+requirements: []
+rubyforge_project:
+rubygems_version: 2.4.8
+signing_key:
+specification_version: 4
+summary: High performance 'find' alternative.
+test_files: []