fast_find 0.1.1

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA1:
3
+ metadata.gz: 9c5cd7c541e439cb4bd46acd6c28838e7867a369
4
+ data.tar.gz: 827db02601353683fd17828a779f71d5a282f80c
5
+ SHA512:
6
+ metadata.gz: dd2cde72e12d19e762e9a6830f73c12024bab29cecdb7f439501627f0a4c306aafc7130bd2e3ac6344ea7e0b20b4c57c22510f2859cdfd5a2c37dc7935d93913
7
+ data.tar.gz: af278abe2c42f33cb905e227c389a4a8c0d73aa5c4b30eb7f08b479159a21049e59ba1353776888c72ac1a7dbef1135931b7439060adc07b804f4fd5c8fbadec
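The digests above can be checked against a downloaded archive. A minimal sketch, assuming the archive is already on disk; `sha512_matches?` is a hypothetical helper name and not part of the gem or of RubyGems:

```ruby
require 'digest/sha2'

# Hypothetical helper: returns true when the SHA512 hex digest of the file at
# `path` equals `expected` (compared case-insensitively).
def sha512_matches?(path, expected)
  Digest::SHA512.file(path).hexdigest == expected.downcase
end
```

Called with the path to `data.tar.gz` and the `data.tar.gz` value listed under `SHA512`, it should return `true` for an unmodified archive.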
data/.gitignore ADDED
@@ -0,0 +1,9 @@
1
+ /.bundle/
2
+ /.yardoc
3
+ /Gemfile.lock
4
+ /_yardoc/
5
+ /coverage/
6
+ /doc/
7
+ /pkg/
8
+ /spec/reports/
9
+ /tmp/
data/.travis.yml ADDED
@@ -0,0 +1,3 @@
1
+ language: ruby
2
+ rvm:
3
+ - 2.2.2
data/Gemfile ADDED
@@ -0,0 +1,4 @@
1
+ source 'https://rubygems.org'
2
+
3
+ # Specify your gem's dependencies in fast_find.gemspec
4
+ gemspec
data/LICENSE.txt ADDED
@@ -0,0 +1,21 @@
1
+ The MIT License (MIT)
2
+
3
+ Copyright (c) 2015 Thomas Hurst
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in
13
+ all copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
21
+ THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,87 @@
1
+ # FastFind
2
+
3
+ FastFind is a performance-oriented multi-threaded alternative to the standard
4
+ `Find` module that ships with Ruby. It should generally be a drop-in
5
+ replacement.
6
+
7
+ FastFind is expected to be marginally slower on MRI/YARV, since multithreaded
8
+ `File#lstat` calls there appear to serialize. However, using the FastFind-
9
+ specific second argument to pass in the `File::Stat` object for each file may
10
+ still prove a win.
11
+
12
+ This code is considered experimental. Beware of dog.
13
+
14
+ ## Installation
15
+
16
+ Add this line to your application's Gemfile:
17
+
18
+ gem 'fast_find', git: 'https://github.com/Freaky/fast_find.git'
19
+
20
+ And then execute:
21
+
22
+ $ bundle
23
+
24
+ ## Usage
25
+
26
+ Traditional Find-style:
27
+
28
+ FastFind.find(dir) {|entry| frob(entry) }
29
+ FastFind.find(dir, ignore_error: false) { .. } # => explodes in your face
30
+ FastFind.find(dir) # => Enumerator
31
+
32
+ Extended style, using the second block argument to receive a `File::Stat` or an
33
+ `Exception` object (if `ignore_error` is false, the exception is raised instead):
34
+
35
+ FastFind.find(dir) {|entry, stat| frob(entry, stat) }
36
+
37
+ For increased performance and better scaling behaviour, it is recommended to use
38
+ a single shared `FastFind::Finder` object. Multiple concurrent calls to
39
+ `FastFind::Finder#find` are safe and will share a persistent work pool.
40
+
41
+ Finder = FastFind::Finder.new
42
+ Finder.find(dir) { .. }
43
+
44
+ You can call `Finder#shutdown` to close the work pool if you're done with the
45
+ instance for the time being. Ensure no other calls to its `#find` are in flight
46
+ beforehand. The pool is restarted the next time `#find` is called.
47
+
48
+ Use the `concurrency` named argument to change the number of worker threads:
49
+
50
+ FastFind.find(dir, concurrency: 4)
51
+ FastFind::Finder.new(concurrency: 4)
52
+
53
+ Defaults are `8` for Rubinius and JRuby, `1` for anything else.
54
+
55
+ Note the yielded blocks are all executed in the parent thread, *not* in workers.
56
+
57
+ `FastFind.prune` works. So does `Find.prune`.
58
+
59
+ ## Performance
60
+
61
+ Scanning a cached copy of the NetBSD CVS repository:
62
+
63
+ jruby 9.0.1.0-SNAPSHOT (2.2.2) 2015-07-23 e88911e OpenJDK 64-Bit Server VM
64
+ 25.51-b03 on 1.8.0_51-b16 +jit [FreeBSD-amd64]:
65
+
66
+ user system total real
67
+ Find 32.890625 27.742188 60.632813 ( 47.518944)
68
+ FastFind 35.273438 41.742188 77.015625 ( 8.140893)
69
+
70
+ ruby 2.2.2p95 (2015-04-13 revision 50295) [x86_64-freebsd10.1]:
71
+
72
+ user system total real
73
+ Find 10.187500 22.351562 32.539062 ( 32.545201)
74
+ FastFind 9.039062 14.226562 23.265625 ( 23.277589)
75
+
76
+ On MRI, `Find` is penalised here because both `Find` and the benchmark code
77
+ perform a `File.lstat` on every entry.
78
+
79
+ ## Development
80
+
81
+ After checking out the repo, run `bin/setup` to install dependencies. Then, run
82
+ `bin/console` for an interactive prompt that will allow you to experiment.
83
+
84
+ To install this gem onto your local machine, run `bundle exec rake install`. To
85
+ release a new version, update the version number in `version.rb`, and then run
86
+ `bundle exec rake release` to create a git tag for the version, push git commits
87
+ and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org).
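A note on the pruning claim in the README: `FastFind.prune` in `lib/fast_find.rb` is simply `throw :prune`, the same symbol the standard library's `Find.prune` throws, which is why either call works inside a block. The sketch below uses only the stdlib `Find` module, so it runs without the gem installed; `visible_entries`, `prune_demo`, and the directory names are hypothetical.

```ruby
require 'find'
require 'tmpdir'
require 'fileutils'

# Hypothetical helper: walk `root`, skip anything named `skip_name`, and
# return the visited paths relative to `root`, sorted.
def visible_entries(root, skip_name)
  seen = []
  Find.find(root) do |path|
    # Throws :prune: the rest of this block is skipped for this entry and,
    # if the entry is a directory, it is not descended into.
    Find.prune if File.basename(path) == skip_name
    seen << path[root.length..-1]
  end
  seen.sort
end

# Builds a small throwaway tree and walks it with the `skip` directory pruned.
def prune_demo
  Dir.mktmpdir do |root|
    FileUtils.mkdir_p(File.join(root, 'keep'))
    FileUtils.mkdir_p(File.join(root, 'skip'))
    FileUtils.touch(File.join(root, 'keep', 'a.txt'))
    FileUtils.touch(File.join(root, 'skip', 'b.txt'))
    visible_entries(root, 'skip')
  end
end
```

With the gem installed, the same block body should behave the same way under `FastFind.find`, since its loop wraps each entry in `catch(:prune)`.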
data/Rakefile ADDED
@@ -0,0 +1,9 @@
1
+ require "bundler/gem_tasks"
2
+ require 'rake/testtask'
3
+
4
+ Rake::TestTask.new do |t|
5
+ t.libs.push 'test'
6
+ t.pattern = 'test/test_*.rb'
7
+ t.warning = true
8
+ t.verbose = true
9
+ end
data/bin/benchmark ADDED
@@ -0,0 +1,28 @@
1
+ #!/usr/bin/env ruby
2
+
3
+ require 'find'
4
+ require 'benchmark'
5
+
6
+ require "bundler/setup"
7
+ require 'fast_find'
8
+
9
+ FastFinder = FastFind::Finder.new
10
+
11
+ test_dirs = ARGV
12
+ abort("Usage: #{$0} dir1 [dir2 [..]]") if test_dirs.empty?
13
+
14
+ Benchmark.bmbm do |b|
15
+ b.report("Find") do
16
+ files = Set.new
17
+ Find.find(*test_dirs) do |f|
18
+ files << [f, File.lstat(f)]
19
+ end
20
+ end
21
+
22
+ b.report("FastFind") do
23
+ files = Set.new
24
+ FastFinder.find(*test_dirs) do |f, stat|
25
+ files << [f, stat]
26
+ end
27
+ end
28
+ end
data/bin/console ADDED
@@ -0,0 +1,14 @@
1
+ #!/usr/bin/env ruby
2
+
3
+ require "bundler/setup"
4
+ require "fast_find"
5
+
6
+ # You can add fixtures and/or initialization code here to make experimenting
7
+ # with your gem easier. You can also use a different console, if you like.
8
+
9
+ # (If you use this, don't forget to add pry to your Gemfile!)
10
+ # require "pry"
11
+ # Pry.start
12
+
13
+ require "irb"
14
+ IRB.start
data/bin/setup ADDED
@@ -0,0 +1,7 @@
1
+ #!/bin/bash
2
+ set -euo pipefail
3
+ IFS=$'\n\t'
4
+
5
+ bundle install
6
+
7
+ # Do any other automated setup that you need to do here
data/fast_find.gemspec ADDED
@@ -0,0 +1,37 @@
1
+ # coding: utf-8
2
+ lib = File.expand_path('../lib', __FILE__)
3
+ $LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
4
+ require 'fast_find/version'
5
+
6
+ Gem::Specification.new do |spec|
7
+ spec.name = "fast_find"
8
+ spec.version = FastFind::VERSION
9
+ spec.authors = ["Thomas Hurst"]
10
+ spec.email = ["tom@hur.st"]
11
+
12
+ spec.summary = %q{High performance 'find' alternative.}
13
+ spec.description = %q{FastFind is a find workalike which optionally passes
14
+ in the File::Stat to the block, and can use multiple
15
+ threads to walk directories concurrently.}
16
+ spec.homepage = "https://github.com/Freaky/fast_find"
17
+ spec.license = "MIT"
18
+
19
+ # Prevent pushing this gem to RubyGems.org by setting 'allowed_push_host', or
20
+ # delete this section to allow pushing this gem to any host.
21
+ if spec.respond_to?(:metadata)
22
+ spec.metadata['allowed_push_host'] = "https://rubygems.org"
23
+ else
24
+ raise "RubyGems 2.0 or newer is required to protect against public gem pushes."
25
+ end
26
+
27
+ spec.files = `git ls-files -z`.split("\x0").reject { |f| f.match(%r{^(test|spec|features)/}) }
28
+ spec.bindir = "exe"
29
+ spec.executables = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
30
+ spec.require_paths = ["lib"]
31
+
32
+ spec.required_ruby_version = ">= 2.0"
33
+
34
+ spec.add_development_dependency "bundler", "~> 1.9"
35
+ spec.add_development_dependency "rake", "~> 10.0"
36
+ spec.add_development_dependency "minitest"
37
+ end
data/lib/fast_find.rb ADDED
@@ -0,0 +1,158 @@
1
+ #
2
+ # fast_find.rb: A Find workalike optimized for performance.
3
+ #
4
+
5
+ require 'set'
6
+ require 'thread'
7
+ require 'fast_find/version'
8
+
9
+ module FastFind
10
+ DEFAULT_CONCURRENCY = %w(jruby rbx).include?(RUBY_ENGINE) ? 8 : 1
11
+
12
+ def self.find(*paths, concurrency: DEFAULT_CONCURRENCY, ignore_error: true,
13
+ &block)
14
+ Finder.new(concurrency: concurrency, one_shot: true)
15
+ .find(*paths, ignore_error: ignore_error, &block)
16
+ end
17
+
18
+ def self.prune
19
+ throw :prune
20
+ end
21
+
22
+ class Finder
23
+ def initialize(concurrency: DEFAULT_CONCURRENCY, one_shot: false)
24
+ @mutex = Mutex.new
25
+ @queue = Queue.new
26
+ @one_shot = one_shot
27
+ @concurrency = concurrency
28
+ @walkers = nil
29
+ end
30
+
31
+ def startup
32
+ @mutex.synchronize do
33
+ return if @walkers
34
+
35
+ @walkers = @concurrency.times.map { Walker.new.spawn(@queue) }
36
+ end
37
+ end
38
+
39
+ def shutdown
40
+ @mutex.synchronize do
41
+ return unless @walkers
42
+
43
+ @queue.clear
44
+ @walkers.each { @queue << nil }
45
+ @walkers.each(&:join)
46
+
47
+ @walkers = nil
48
+ end
49
+ end
50
+
51
+ def find(*paths, ignore_error: true, &block)
52
+ block or return enum_for(__method__, *paths, ignore_error: ignore_error)
53
+
54
+ results = Queue.new
55
+ pending = Set.new
56
+
57
+ startup
58
+
59
+ paths.map!(&:dup).each do |path|
60
+ path = path.to_path if path.respond_to? :to_path
61
+ results << [path, Util.safe_stat(path)]
62
+ end
63
+ results << [:initial, :finished]
64
+ pending << [:initial, :initial.encoding]
65
+
66
+ while result = results.deq
67
+ path, stat = result
68
+
69
+ if stat == :finished
70
+ pending.delete([path, path.encoding])
71
+
72
+ if pending.empty?
73
+ break
74
+ else
75
+ next
76
+ end
77
+ end
78
+
79
+ catch(:prune) do
80
+ yield_entry(result, block) if path.is_a? String
81
+
82
+ case stat
83
+ when Exception then raise stat unless ignore_error
84
+ when File::Stat
85
+ if stat.directory? and !pending.include?(pe = [path, path.encoding])
86
+ pending << pe
87
+ @queue << [path, results]
88
+ end
89
+ end
90
+ end
91
+ end
92
+ ensure
93
+ if one_shot?
94
+ @queue.clear
95
+ shutdown
96
+ end
97
+ end
98
+
99
+ private
100
+
101
+ def one_shot?() !!@one_shot end
102
+
103
+ def yield_entry(entry, block)
104
+ if block.arity == 2
105
+ block.call(entry[0].dup.taint, entry[1])
106
+ else
107
+ block.call entry[0].dup.taint
108
+ end
109
+ end
110
+ end
111
+
112
+ class Walker
113
+ def spawn(queue)
114
+ Thread.new do
115
+ @encoding = Encoding.find("filesystem")
116
+ while job = queue.deq
117
+ walk(job[0], job[1])
118
+ end
119
+ end
120
+ end
121
+
122
+ def walk(path, results)
123
+ enc = path.encoding == Encoding::US_ASCII ? @encoding : path.encoding
124
+
125
+ Dir.entries(path, encoding: enc).each do |entry|
126
+ next if entry == '.' or entry == '..'
127
+
128
+ stat(File.join(path, entry), results)
129
+ end
130
+ rescue Errno::ENOENT, Errno::EACCES, Errno::ENOTDIR, Errno::ELOOP,
131
+ Errno::ENAMETOOLONG => e
132
+ error(e, results)
133
+ ensure
134
+ finish(path, results)
135
+ end
136
+
137
+ def stat(entry, results)
138
+ results << [entry, Util.safe_stat(entry)]
139
+ end
140
+
141
+ def finish(path, results)
142
+ results << [path, :finished]
143
+ end
144
+
145
+ def error(e, results)
146
+ results << [:exception, e]
147
+ end
148
+ end
149
+
150
+ module Util
151
+ def self.safe_stat(path)
152
+ File.lstat(path)
153
+ rescue Errno::ENOENT, Errno::EACCES, Errno::ENOTDIR, Errno::ELOOP,
154
+ Errno::ENAMETOOLONG => e
155
+ e
156
+ end
157
+ end
158
+ end
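The pool handling in `Finder` and `Walker` follows a common pattern: workers block on a job queue, push results onto a second queue, and exit when they dequeue a `nil` sentinel, while the caller's block runs only in the calling thread. The following is a simplified sketch of that pattern, not the gem's implementation: it stats a flat list of paths instead of walking directories, and `parallel_lstat` is a hypothetical name.

```ruby
require 'thread'

# Hypothetical helper: lstat each path on a small pool of worker threads and
# yield [path, File::Stat-or-exception] pairs in the calling thread.
def parallel_lstat(paths, workers: 2)
  jobs = Queue.new
  results = Queue.new

  pool = Array.new(workers) do
    Thread.new do
      # A nil job is the shutdown sentinel and ends the loop.
      while (path = jobs.deq)
        stat = begin
          File.lstat(path)
        rescue SystemCallError => e
          e # hand the error back as data instead of killing the worker
        end
        results << [path, stat]
      end
    end
  end

  paths.each { |path| jobs << path }

  # Exactly one result arrives per job, so dequeue that many; the block is
  # invoked here, in the caller's thread, never inside a worker.
  paths.size.times { yield(*results.deq) }
ensure
  if pool
    workers.times { jobs << nil } # one sentinel per worker
    pool.each(&:join)
  end
end
```

Returning errors as values mirrors what `Util.safe_stat` does in the source, although this sketch rescues the broader `SystemCallError` rather than the specific `Errno` list used there. Results may arrive in any order.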
data/lib/fast_find/version.rb ADDED
@@ -0,0 +1,3 @@
1
+ module FastFind
2
+ VERSION = "0.1.1"
3
+ end
metadata ADDED
@@ -0,0 +1,102 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: fast_find
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.1.1
5
+ platform: ruby
6
+ authors:
7
+ - Thomas Hurst
8
+ autorequire:
9
+ bindir: exe
10
+ cert_chain: []
11
+ date: 2015-08-01 00:00:00.000000000 Z
12
+ dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ requirement: !ruby/object:Gem::Requirement
15
+ requirements:
16
+ - - "~>"
17
+ - !ruby/object:Gem::Version
18
+ version: '1.9'
19
+ name: bundler
20
+ prerelease: false
21
+ type: :development
22
+ version_requirements: !ruby/object:Gem::Requirement
23
+ requirements:
24
+ - - "~>"
25
+ - !ruby/object:Gem::Version
26
+ version: '1.9'
27
+ - !ruby/object:Gem::Dependency
28
+ requirement: !ruby/object:Gem::Requirement
29
+ requirements:
30
+ - - "~>"
31
+ - !ruby/object:Gem::Version
32
+ version: '10.0'
33
+ name: rake
34
+ prerelease: false
35
+ type: :development
36
+ version_requirements: !ruby/object:Gem::Requirement
37
+ requirements:
38
+ - - "~>"
39
+ - !ruby/object:Gem::Version
40
+ version: '10.0'
41
+ - !ruby/object:Gem::Dependency
42
+ requirement: !ruby/object:Gem::Requirement
43
+ requirements:
44
+ - - ">="
45
+ - !ruby/object:Gem::Version
46
+ version: '0'
47
+ name: minitest
48
+ prerelease: false
49
+ type: :development
50
+ version_requirements: !ruby/object:Gem::Requirement
51
+ requirements:
52
+ - - ">="
53
+ - !ruby/object:Gem::Version
54
+ version: '0'
55
+ description: |-
56
+ FastFind is a find workalike which optionally passes
57
+ in the File::Stat to the block, and can use multiple
58
+ threads to walk directories concurrently.
59
+ email:
60
+ - tom@hur.st
61
+ executables: []
62
+ extensions: []
63
+ extra_rdoc_files: []
64
+ files:
65
+ - ".gitignore"
66
+ - ".travis.yml"
67
+ - Gemfile
68
+ - LICENSE.txt
69
+ - README.md
70
+ - Rakefile
71
+ - bin/benchmark
72
+ - bin/console
73
+ - bin/setup
74
+ - fast_find.gemspec
75
+ - lib/fast_find.rb
76
+ - lib/fast_find/version.rb
77
+ homepage: https://github.com/Freaky/fast_find
78
+ licenses:
79
+ - MIT
80
+ metadata:
81
+ allowed_push_host: https://rubygems.org
82
+ post_install_message:
83
+ rdoc_options: []
84
+ require_paths:
85
+ - lib
86
+ required_ruby_version: !ruby/object:Gem::Requirement
87
+ requirements:
88
+ - - ">="
89
+ - !ruby/object:Gem::Version
90
+ version: '2.0'
91
+ required_rubygems_version: !ruby/object:Gem::Requirement
92
+ requirements:
93
+ - - ">="
94
+ - !ruby/object:Gem::Version
95
+ version: '0'
96
+ requirements: []
97
+ rubyforge_project:
98
+ rubygems_version: 2.4.8
99
+ signing_key:
100
+ specification_version: 4
101
+ summary: High performance 'find' alternative.
102
+ test_files: []