fast_find 0.1.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +7 -0
- data/.gitignore +9 -0
- data/.travis.yml +3 -0
- data/Gemfile +4 -0
- data/LICENSE.txt +21 -0
- data/README.md +87 -0
- data/Rakefile +9 -0
- data/bin/benchmark +28 -0
- data/bin/console +14 -0
- data/bin/setup +7 -0
- data/fast_find.gemspec +37 -0
- data/lib/fast_find.rb +158 -0
- data/lib/fast_find/version.rb +3 -0
- metadata +102 -0
checksums.yaml
ADDED
@@ -0,0 +1,7 @@
+---
+SHA1:
+  metadata.gz: 9c5cd7c541e439cb4bd46acd6c28838e7867a369
+  data.tar.gz: 827db02601353683fd17828a779f71d5a282f80c
+SHA512:
+  metadata.gz: dd2cde72e12d19e762e9a6830f73c12024bab29cecdb7f439501627f0a4c306aafc7130bd2e3ac6344ea7e0b20b4c57c22510f2859cdfd5a2c37dc7935d93913
+  data.tar.gz: af278abe2c42f33cb905e227c389a4a8c0d73aa5c4b30eb7f08b479159a21049e59ba1353776888c72ac1a7dbef1135931b7439060adc07b804f4fd5c8fbadec
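The digests recorded in `checksums.yaml` can be compared against the archive members of a downloaded `.gem`. The following is a minimal sketch, assuming the member file (for example `data.tar.gz`) has already been extracted to disk. The helper name `digest_matches?` is hypothetical, and only Ruby's standard `digest` library is used.

```ruby
require 'digest'

# Hypothetical helper: true when the SHA512 hex digest of the file at +path+
# equals +expected_hex+ (for example, a value copied from checksums.yaml).
def digest_matches?(path, expected_hex)
  Digest::SHA512.file(path).hexdigest == expected_hex.strip.downcase
end
```

The same shape should work for the SHA1 entries by swapping in `Digest::SHA1`.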
data/.gitignore
ADDED
data/.travis.yml
ADDED
data/Gemfile
ADDED
data/LICENSE.txt
ADDED
@@ -0,0 +1,21 @@
+The MIT License (MIT)
+
+Copyright (c) 2015 Thomas Hurst
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in
+all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+THE SOFTWARE.
data/README.md
ADDED
@@ -0,0 +1,87 @@
+# FastFind
+
+FastFind is a performance-oriented multi-threaded alternative to the standard
+`Find` module that ships with Ruby. It should generally be a drop-in
+replacement.
+
+FastFind is expected to be marginally slower on MRI/YARV, since multithreaded
+`File#lstat` calls there appear to serialize. However, using the FastFind-
+specific second argument to pass in the `File::Stat` object for each file may
+still prove a win.
+
+This code is considered experimental. Beware of dog.
+
+## Installation
+
+Add this line to your application's Gemfile:
+
+    gem 'fast_find', git: 'https://github.com/Freaky/fast_find.git'
+
+And then execute:
+
+    $ bundle
+
+## Usage
+
+Traditional Find-style:
+
+    FastFind.find(dir) {|entry| frob(entry) }
+    FastFind.find(dir, ignore_errors: false) { .. } # => explodes in your face
+    FastFind.find(dir) # => Enumerator
+
+Extended style using the second argument to get a `File::Stat`, or `Exception`
+object (if `ignore_errors` is false, this will be raised instead).
+
+    FastFind.find(dir) {|entry, stat| frob(entry, stat) }
+
+For increased performance and better scaling behaviour, it is recommended to use
+a single shared FastFind object. Multiple concurrent calls to
+`FastFind::Finder#find` are safe and will share a persistant work pool.
+
+    Finder = FastFind::Finder.new
+    Finder.find(dir) { .. }
+
+You can call `Finder#shutdown` to close the work pool if you're done with the
+instance for the time being. Ensure no other calls to its `#find` are in flight
+beforehand. The pool is restarted the next time `#find` is called.
+
+Use the `concurrency` named argument to change the number of worker threads:
+
+    FastFind.find(dir, concurrency: 4)
+    FastFind::Finder.new(concurrency: 4)
+
+Defaults are `8` for Rubinius and JRuby, `1` for anything else.
+
+Note the yielded blocks are all executed in the parent thread, *not* in workers.
+
+`FastFind#prune` works. So does `Find#prune`.
+
+## Performance
+
+Scanning a cached copy of the NetBSD CVS repository:
+
+    jruby 9.0.1.0-SNAPSHOT (2.2.2) 2015-07-23 e88911e OpenJDK 64-Bit Server VM
+    25.51-b03 on 1.8.0_51-b16 +jit [FreeBSD-amd64]:
+
+                   user     system      total        real
+    Find      32.890625  27.742188  60.632813 ( 47.518944)
+    FastFind  35.273438  41.742188  77.015625 (  8.140893)
+
+    ruby 2.2.2p95 (2015-04-13 revision 50295) [x86_64-freebsd10.1]:
+
+                   user     system      total        real
+    Find      10.187500  22.351562  32.539062 ( 32.545201)
+    FastFind   9.039062  14.226562  23.265625 ( 23.277589)
+
+On MRI `Find` here is penalised because both `Find` and the benchmark code is
+performing a `File#lstat`.
+
+## Development
+
+After checking out the repo, run `bin/setup` to install dependencies. Then, run
+`bin/console` for an interactive prompt that will allow you to experiment.
+
+To install this gem onto your local machine, run `bundle exec rake install`. To
+release a new version, update the version number in `version.rb`, and then run
+`bundle exec rake release` to create a git tag for the version, push git commits
+and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org).
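The README describes FastFind as a drop-in replacement for `Find` and says pruning works the same way. The sketch below shows that traditional pattern using only the standard `Find` module, so it runs without the gem installed. Swapping `Find.find` for `FastFind.find` rests on the README's drop-in claim and is not verified here. The helper name `list_files` and the skipped directory name are hypothetical. Note also that the README writes the keyword as `ignore_errors:`, while `lib/fast_find.rb` in this diff declares it as `ignore_error:`.

```ruby
require 'find'

# Hypothetical helper: collect [path, File::Stat] pairs for regular files
# under +root+, skipping any directory whose basename is listed in +skip+.
def list_files(root, skip: ['.git'])
  found = []
  Find.find(root) do |path|
    stat = File.lstat(path)   # stdlib Find yields only the path
    if stat.directory? && skip.include?(File.basename(path))
      Find.prune              # throws :prune, so this directory is not descended
    end
    found << [path, stat] if stat.file?
  end
  found
end
```

With the gem, the two-argument block form shown in the README (`{|entry, stat| ... }`) would presumably make the explicit `File.lstat` call unnecessary.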
data/Rakefile
ADDED
data/bin/benchmark
ADDED
@@ -0,0 +1,28 @@
+#!/usr/bin/env ruby
+
+require 'find'
+require 'benchmark'
+
+require "bundler/setup"
+require 'fast_find'
+
+FastFinder = FastFind::Finder.new
+
+test_dirs = ARGV
+abort("Usage: #{$0} [dir1 [dir2[ ..]]]") if test_dirs.empty?
+
+Benchmark.bmbm do |b|
+  b.report("Find") do
+    files = Set.new
+    Find.find(*test_dirs) do |f|
+      files << [f, File.lstat(f)]
+    end
+  end
+
+  b.report("FastFind") do
+    files = Set.new
+    FastFinder.find(*test_dirs) do |f, stat|
+      files << [f, stat]
+    end
+  end
+end
data/bin/console
ADDED
@@ -0,0 +1,14 @@
+#!/usr/bin/env ruby
+
+require "bundler/setup"
+require "fast_find"
+
+# You can add fixtures and/or initialization code here to make experimenting
+# with your gem easier. You can also use a different console, if you like.
+
+# (If you use this, don't forget to add pry to your Gemfile!)
+# require "pry"
+# Pry.start
+
+require "irb"
+IRB.start
data/bin/setup
ADDED
data/fast_find.gemspec
ADDED
@@ -0,0 +1,37 @@
+# coding: utf-8
+lib = File.expand_path('../lib', __FILE__)
+$LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
+require 'fast_find/version'
+
+Gem::Specification.new do |spec|
+  spec.name = "fast_find"
+  spec.version = FastFind::VERSION
+  spec.authors = ["Thomas Hurst"]
+  spec.email = ["tom@hur.st"]
+
+  spec.summary = %q{High performance 'find' alternative.}
+  spec.description = %q{FastFind is a find workalike which optionally passes
+                        in the File::Stat to the block, and can use multile
+                        threads to walk directories concurrently.}
+  spec.homepage = "https://github.com/Freaky/fast_find"
+  spec.license = "MIT"
+
+  # Prevent pushing this gem to RubyGems.org by setting 'allowed_push_host', or
+  # delete this section to allow pushing this gem to any host.
+  if spec.respond_to?(:metadata)
+    spec.metadata['allowed_push_host'] = "https://rubygems.org"
+  else
+    raise "RubyGems 2.0 or newer is required to protect against public gem pushes."
+  end
+
+  spec.files = `git ls-files -z`.split("\x0").reject { |f| f.match(%r{^(test|spec|features)/}) }
+  spec.bindir = "exe"
+  spec.executables = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
+  spec.require_paths = ["lib"]
+
+  spec.required_ruby_version = ">= 2.0"
+
+  spec.add_development_dependency "bundler", "~> 1.9"
+  spec.add_development_dependency "rake", "~> 10.0"
+  spec.add_development_dependency "minitest"
+end
data/lib/fast_find.rb
ADDED
@@ -0,0 +1,158 @@
+#
+# fast_find.rb: A Find workalike optimized for performance.
+#
+
+require 'set'
+require 'thread'
+require 'fast_find/version'
+
+module FastFind
+  DEFAULT_CONCURRENCY = %w(jruby rbx).include?(RUBY_ENGINE) ? 8 : 1
+
+  def self.find(*paths, concurrency: DEFAULT_CONCURRENCY, ignore_error: true,
+                &block)
+    Finder.new(concurrency: concurrency, one_shot: true)
+      .find(*paths, ignore_error: ignore_error, &block)
+  end
+
+  def self.prune
+    throw :prune
+  end
+
+  class Finder
+    def initialize(concurrency: DEFAULT_CONCURRENCY, one_shot: false)
+      @mutex = Mutex.new
+      @queue = Queue.new
+      @one_shot = one_shot
+      @concurrency = concurrency
+      @walkers = nil
+    end
+
+    def startup
+      @mutex.synchronize do
+        return if @walkers
+
+        @walkers = @concurrency.times.map { Walker.new.spawn(@queue) }
+      end
+    end
+
+    def shutdown
+      @mutex.synchronize do
+        return unless @walkers
+
+        @queue.clear
+        @walkers.each { @queue << nil }
+        @walkers.each(&:join)
+
+        @walkers = nil
+      end
+    end
+
+    def find(*paths, ignore_error: true, &block)
+      block or return enum_for(__method__, *paths, ignore_error: ignore_error)
+
+      results = Queue.new
+      pending = Set.new
+
+      startup
+
+      paths.map!(&:dup).each do |path|
+        path = path.to_path if path.respond_to? :to_path
+        results << [path, Util.safe_stat(path)]
+      end
+      results << [:initial, :finished]
+      pending << [:initial, :initial.encoding]
+
+      while result = results.deq
+        path, stat = result
+
+        if stat == :finished
+          pending.delete([path, path.encoding])
+
+          if pending.empty?
+            break
+          else
+            next
+          end
+        end
+
+        catch(:prune) do
+          yield_entry(result, block) if path.is_a? String
+
+          case stat
+          when Exception then raise stat unless ignore_error
+          when File::Stat
+            if stat.directory? and !pending.include?(pe = [path, path.encoding])
+              pending << pe
+              @queue << [path, results]
+            end
+          end
+        end
+      end
+    ensure
+      if one_shot?
+        @queue.clear
+        shutdown
+      end
+    end
+
+    private
+
+    def one_shot?() !!@one_shot end
+
+    def yield_entry(entry, block)
+      if block.arity == 2
+        block.call(entry[0].dup.taint, entry[1])
+      else
+        block.call entry[0].dup.taint
+      end
+    end
+  end
+
+  class Walker
+    def spawn(queue)
+      Thread.new do
+        @encoding = Encoding.find("filesystem")
+        while job = queue.deq
+          walk(job[0], job[1])
+        end
+      end
+    end
+
+    def walk(path, results)
+      enc = path.encoding == Encoding::US_ASCII ? @encoding : path.encoding
+
+      Dir.entries(path, encoding: enc).each do |entry|
+        next if entry == '.' or entry == '..'
+
+        stat(File.join(path, entry), results)
+      end
+    rescue Errno::ENOENT, Errno::EACCES, Errno::ENOTDIR, Errno::ELOOP,
+           Errno::ENAMETOOLONG => e
+      error(e, results)
+    ensure
+      finish(path, results)
+    end
+
+    def stat(entry, results)
+      results << [entry, Util.safe_stat(entry)]
+    end
+
+    def finish(path, results)
+      results << [path, :finished]
+    end
+
+    def error(e, results)
+      results << [:exception, e]
+    end
+  end
+
+  module Util
+    def self.safe_stat(path)
+      File.lstat(path)
+    rescue Errno::ENOENT, Errno::EACCES, Errno::ENOTDIR, Errno::ELOOP,
+           Errno::ENAMETOOLONG => e
+      e
+    end
+  end
+end
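The coordination scheme in `lib/fast_find.rb` pairs a shared job queue with a per-call results queue: workers list directories and push `[path, stat]` entries followed by a `[path, :finished]` marker, while the calling thread tracks a `pending` set and stops once it is empty. The following is a simplified, stdlib-only sketch of that pattern, not the gem's code. The method name `walk_tree` and its `workers:` parameter are hypothetical, it returns an array instead of yielding, and it leaves out encoding handling, pruning, and error reporting.

```ruby
require 'set'

# Simplified, hypothetical re-creation of the job-queue / results-queue pattern.
def walk_tree(root, workers: 2)
  jobs    = Queue.new  # directories waiting to be listed by a worker
  results = Queue.new  # [path, File::Stat] pairs and [dir, :finished] markers
  pending = Set.new    # directories queued but not yet reported as finished

  threads = Array.new(workers) do
    Thread.new do
      while (dir = jobs.deq)   # a nil job tells the worker to exit
        begin
          Dir.children(dir).each do |name|
            path = File.join(dir, name)
            results << [path, File.lstat(path)]
          end
        rescue SystemCallError
          # Unreadable or vanished entry: this sketch skips the rest of the directory.
        ensure
          results << [dir, :finished]   # always sent after the directory's entries
        end
      end
    end
  end

  found = [root]
  pending << root
  jobs << root

  until pending.empty?
    path, stat = results.deq
    if stat == :finished
      pending.delete(path)
    else
      found << path            # entries are consumed in the calling thread
      if stat.directory?
        pending << path
        jobs << path
      end
    end
  end
  found
ensure
  threads&.each { jobs << nil }  # one shutdown signal per worker
  threads&.each(&:join)
end
```

Because each worker pushes a directory's entries before its `:finished` marker, the caller has already queued every subdirectory by the time it removes the parent from `pending`, so an empty set means the walk is complete.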
metadata
ADDED
@@ -0,0 +1,102 @@
+--- !ruby/object:Gem::Specification
+name: fast_find
+version: !ruby/object:Gem::Version
+  version: 0.1.1
+platform: ruby
+authors:
+- Thomas Hurst
+autorequire:
+bindir: exe
+cert_chain: []
+date: 2015-08-01 00:00:00.000000000 Z
+dependencies:
+- !ruby/object:Gem::Dependency
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '1.9'
+  name: bundler
+  prerelease: false
+  type: :development
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '1.9'
+- !ruby/object:Gem::Dependency
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '10.0'
+  name: rake
+  prerelease: false
+  type: :development
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '10.0'
+- !ruby/object:Gem::Dependency
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+  name: minitest
+  prerelease: false
+  type: :development
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+description: |-
+  FastFind is a find workalike which optionally passes
+  in the File::Stat to the block, and can use multile
+  threads to walk directories concurrently.
+email:
+- tom@hur.st
+executables: []
+extensions: []
+extra_rdoc_files: []
+files:
+- ".gitignore"
+- ".travis.yml"
+- Gemfile
+- LICENSE.txt
+- README.md
+- Rakefile
+- bin/benchmark
+- bin/console
+- bin/setup
+- fast_find.gemspec
+- lib/fast_find.rb
+- lib/fast_find/version.rb
+homepage: https://github.com/Freaky/fast_find
+licenses:
+- MIT
+metadata:
+  allowed_push_host: https://rubygems.org
+post_install_message:
+rdoc_options: []
+require_paths:
+- lib
+required_ruby_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: '2.0'
+required_rubygems_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: '0'
+requirements: []
+rubyforge_project:
+rubygems_version: 2.4.8
+signing_key:
+specification_version: 4
+summary: High performance 'find' alternative.
+test_files: []