fast_find 0.1.1

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA1:
3
+ metadata.gz: 9c5cd7c541e439cb4bd46acd6c28838e7867a369
4
+ data.tar.gz: 827db02601353683fd17828a779f71d5a282f80c
5
+ SHA512:
6
+ metadata.gz: dd2cde72e12d19e762e9a6830f73c12024bab29cecdb7f439501627f0a4c306aafc7130bd2e3ac6344ea7e0b20b4c57c22510f2859cdfd5a2c37dc7935d93913
7
+ data.tar.gz: af278abe2c42f33cb905e227c389a4a8c0d73aa5c4b30eb7f08b479159a21049e59ba1353776888c72ac1a7dbef1135931b7439060adc07b804f4fd5c8fbadec
data/.gitignore ADDED
@@ -0,0 +1,9 @@
1
+ /.bundle/
2
+ /.yardoc
3
+ /Gemfile.lock
4
+ /_yardoc/
5
+ /coverage/
6
+ /doc/
7
+ /pkg/
8
+ /spec/reports/
9
+ /tmp/
data/.travis.yml ADDED
@@ -0,0 +1,3 @@
1
+ language: ruby
2
+ rvm:
3
+ - 2.2.2
data/Gemfile ADDED
@@ -0,0 +1,4 @@
1
+ source 'https://rubygems.org'
2
+
3
+ # Specify your gem's dependencies in fast_find.gemspec
4
+ gemspec
data/LICENSE.txt ADDED
@@ -0,0 +1,21 @@
1
+ The MIT License (MIT)
2
+
3
+ Copyright (c) 2015 Thomas Hurst
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in
13
+ all copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
21
+ THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,87 @@
1
+ # FastFind
2
+
3
+ FastFind is a performance-oriented multi-threaded alternative to the standard
4
+ `Find` module that ships with Ruby. It should generally be a drop-in
5
+ replacement.
6
+
7
+ FastFind is expected to be marginally slower on MRI/YARV, since multithreaded
8
+ `File.lstat` calls there appear to serialize. However, using the FastFind-
9
+ specific second argument to pass in the `File::Stat` object for each file may
10
+ still prove a win.
11
+
12
+ This code is considered experimental. Beware of dog.
13
+
14
+ ## Installation
15
+
16
+ Add this line to your application's Gemfile:
17
+
18
+ gem 'fast_find', git: 'https://github.com/Freaky/fast_find.git'
19
+
20
+ And then execute:
21
+
22
+ $ bundle
23
+
24
+ ## Usage
25
+
26
+ Traditional Find-style:
27
+
28
+ FastFind.find(dir) {|entry| frob(entry) }
29
+ FastFind.find(dir, ignore_error: false) { .. } # => explodes in your face
30
+ FastFind.find(dir) # => Enumerator
31
+
32
+ Extended style: a block taking a second argument also receives the entry's
33
+ `File::Stat`, or the `Exception` raised by `File.lstat` (with `ignore_error: false`, that exception is re-raised after the block returns).
34
+
35
+ FastFind.find(dir) {|entry, stat| frob(entry, stat) }
36
+
37
+ For increased performance and better scaling behaviour, it is recommended to use
38
+ a single shared `FastFind::Finder` object. Multiple concurrent calls to
39
+ `FastFind::Finder#find` are safe and will share a persistent work pool.
40
+
41
+ Finder = FastFind::Finder.new
42
+ Finder.find(dir) { .. }
43
+
44
+ You can call `Finder#shutdown` to close the work pool if you're done with the
45
+ instance for the time being. Ensure no other calls to its `#find` are in flight
46
+ beforehand. The pool is restarted the next time `#find` is called.
47
+
48
+ Use the `concurrency` named argument to change the number of worker threads:
49
+
50
+ FastFind.find(dir, concurrency: 4)
51
+ FastFind::Finder.new(concurrency: 4)
52
+
53
+ Defaults are `8` for Rubinius and JRuby, `1` for anything else.
54
+
55
+ Note that the block is always executed in the calling thread, *not* in the workers.
56
+
57
+ `FastFind.prune` works, and so does `Find.prune`: either one, called from the block, skips the current directory's contents.
58
+
59
+ ## Performance
60
+
61
+ Scanning a cached copy of the NetBSD CVS repository:
62
+
63
+ jruby 9.0.1.0-SNAPSHOT (2.2.2) 2015-07-23 e88911e OpenJDK 64-Bit Server VM
64
+ 25.51-b03 on 1.8.0_51-b16 +jit [FreeBSD-amd64]:
65
+
66
+ user system total real
67
+ Find 32.890625 27.742188 60.632813 ( 47.518944)
68
+ FastFind 35.273438 41.742188 77.015625 ( 8.140893)
69
+
70
+ ruby 2.2.2p95 (2015-04-13 revision 50295) [x86_64-freebsd10.1]:
71
+
72
+ user system total real
73
+ Find 10.187500 22.351562 32.539062 ( 32.545201)
74
+ FastFind 9.039062 14.226562 23.265625 ( 23.277589)
75
+
76
+ On MRI, `Find` is penalised here because both `Find` itself and the benchmark
77
+ code perform a `File.lstat` on every entry, while FastFind stats each entry once.
78
+
79
+ ## Development
80
+
81
+ After checking out the repo, run `bin/setup` to install dependencies. Then, run
82
+ `bin/console` for an interactive prompt that will allow you to experiment.
83
+
84
+ To install this gem onto your local machine, run `bundle exec rake install`. To
85
+ release a new version, update the version number in `version.rb`, and then run
86
+ `bundle exec rake release` to create a git tag for the version, push git commits
87
+ and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org).
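The Usage section of the README describes a block contract: one-argument blocks receive a path, two-argument blocks also receive a `File::Stat` or an `Exception`, and `throw :prune` (which is all `Find.prune` does) skips a directory. The following is a single-threaded sketch of that contract, assuming only the behaviour the README states. `SketchFind` is a hypothetical name, not part of the gem; it rescues `SystemCallError` broadly where the gem rescues a fixed list of `Errno` classes, and it uses `Dir.children` (Ruby 2.5+).

```ruby
require 'find' # only so callers can use Find.prune, which is `throw :prune`

# Hypothetical single-threaded reference for the README's block contract.
module SketchFind
  def self.find(*paths, ignore_error: true, &block)
    return enum_for(:find, *paths, ignore_error: ignore_error) unless block

    # Accept Pathname-like objects via #to_path, as the gem does.
    stack = paths.map { |p| p.respond_to?(:to_path) ? p.to_path : p.dup }.reverse

    until stack.empty?
      path = stack.pop
      stat = begin
        File.lstat(path)
      rescue SystemCallError => e
        e # hand the error to the block instead of stopping the walk
      end

      catch(:prune) do
        # One-argument blocks get the path; two-argument blocks also get
        # the File::Stat or the exception.
        if block.arity == 2
          block.call(path.dup, stat)
        else
          block.call(path.dup)
        end

        raise stat if stat.is_a?(Exception) && !ignore_error

        if stat.is_a?(File::Stat) && stat.directory?
          children = begin
            Dir.children(path).sort
          rescue SystemCallError => e
            raise e unless ignore_error
            [] # unreadable directory: skip its contents
          end
          # Push in reverse so entries are visited in sorted order.
          children.reverse_each { |name| stack.push(File.join(path, name)) }
        end
      end
    end
    nil
  end
end
```

Because the block runs inside `catch(:prune)`, a prune thrown from it unwinds past the code that would push the directory's children, which is why `Find.prune` and a gem-specific `prune` can be interchangeable.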
data/Rakefile ADDED
@@ -0,0 +1,9 @@
1
+ require "bundler/gem_tasks"
2
+ require 'rake/testtask'
3
+
4
+ Rake::TestTask.new do |t|
5
+ t.libs.push 'test'
6
+ t.pattern = 'test/test_*.rb'
7
+ t.warning = true
8
+ t.verbose = true
9
+ end
data/bin/benchmark ADDED
@@ -0,0 +1,28 @@
1
+ #!/usr/bin/env ruby
2
+
3
+ require 'find'
4
+ require 'benchmark'
5
+
6
+ require "bundler/setup"
7
+ require 'fast_find'
8
+
9
+ FastFinder = FastFind::Finder.new
10
+
11
+ test_dirs = ARGV
12
+ abort("Usage: #{$0} dir1 [dir2 ...]") if test_dirs.empty?
13
+
14
+ Benchmark.bmbm do |b|
15
+ b.report("Find") do
16
+ files = Set.new
17
+ Find.find(*test_dirs) do |f|
18
+ files << [f, File.lstat(f)]
19
+ end
20
+ end
21
+
22
+ b.report("FastFind") do
23
+ files = Set.new
24
+ FastFinder.find(*test_dirs) do |f, stat|
25
+ files << [f, stat]
26
+ end
27
+ end
28
+ end
data/bin/console ADDED
@@ -0,0 +1,14 @@
1
+ #!/usr/bin/env ruby
2
+
3
+ require "bundler/setup"
4
+ require "fast_find"
5
+
6
+ # You can add fixtures and/or initialization code here to make experimenting
7
+ # with your gem easier. You can also use a different console, if you like.
8
+
9
+ # (If you use this, don't forget to add pry to your Gemfile!)
10
+ # require "pry"
11
+ # Pry.start
12
+
13
+ require "irb"
14
+ IRB.start
data/bin/setup ADDED
@@ -0,0 +1,7 @@
1
+ #!/bin/bash
2
+ set -euo pipefail
3
+ IFS=$'\n\t'
4
+
5
+ bundle install
6
+
7
+ # Do any other automated setup that you need to do here
data/fast_find.gemspec ADDED
@@ -0,0 +1,37 @@
1
+ # coding: utf-8
2
+ lib = File.expand_path('../lib', __FILE__)
3
+ $LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
4
+ require 'fast_find/version'
5
+
6
+ Gem::Specification.new do |spec|
7
+ spec.name = "fast_find"
8
+ spec.version = FastFind::VERSION
9
+ spec.authors = ["Thomas Hurst"]
10
+ spec.email = ["tom@hur.st"]
11
+
12
+ spec.summary = %q{High performance 'find' alternative.}
13
+ spec.description = %q{FastFind is a find workalike which optionally passes
14
+ in the File::Stat to the block, and can use multiple
15
+ threads to walk directories concurrently.}
16
+ spec.homepage = "https://github.com/Freaky/fast_find"
17
+ spec.license = "MIT"
18
+
19
+ # Restrict pushes to RubyGems.org via 'allowed_push_host'; delete this
20
+ # section to allow pushing this gem to any host.
21
+ if spec.respond_to?(:metadata)
22
+ spec.metadata['allowed_push_host'] = "https://rubygems.org"
23
+ else
24
+ raise "RubyGems 2.0 or newer is required to protect against public gem pushes."
25
+ end
26
+
27
+ spec.files = `git ls-files -z`.split("\x0").reject { |f| f.match(%r{^(test|spec|features)/}) }
28
+ spec.bindir = "exe"
29
+ spec.executables = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
30
+ spec.require_paths = ["lib"]
31
+
32
+ spec.required_ruby_version = ">= 2.0"
33
+
34
+ spec.add_development_dependency "bundler", "~> 1.9"
35
+ spec.add_development_dependency "rake", "~> 10.0"
36
+ spec.add_development_dependency "minitest"
37
+ end
data/lib/fast_find.rb ADDED
@@ -0,0 +1,158 @@
1
+ #
2
+ # fast_find.rb: A Find workalike optimized for performance.
3
+ #
4
+
5
+ require 'set'
6
+ require 'thread'
7
+ require 'fast_find/version'
8
+
9
+ module FastFind
10
+ DEFAULT_CONCURRENCY = %w(jruby rbx).include?(RUBY_ENGINE) ? 8 : 1
11
+
12
+ def self.find(*paths, concurrency: DEFAULT_CONCURRENCY, ignore_error: true,
13
+ &block)
14
+ Finder.new(concurrency: concurrency, one_shot: true)
15
+ .find(*paths, ignore_error: ignore_error, &block)
16
+ end
17
+
18
+ def self.prune
19
+ throw :prune
20
+ end
21
+
22
+ class Finder
23
+ def initialize(concurrency: DEFAULT_CONCURRENCY, one_shot: false)
24
+ @mutex = Mutex.new
25
+ @queue = Queue.new
26
+ @one_shot = one_shot
27
+ @concurrency = concurrency
28
+ @walkers = nil
29
+ end
30
+
31
+ def startup
32
+ @mutex.synchronize do
33
+ return if @walkers
34
+
35
+ @walkers = @concurrency.times.map { Walker.new.spawn(@queue) }
36
+ end
37
+ end
38
+
39
+ def shutdown
40
+ @mutex.synchronize do
41
+ return unless @walkers
42
+
43
+ @queue.clear
44
+ @walkers.each { @queue << nil }
45
+ @walkers.each(&:join)
46
+
47
+ @walkers = nil
48
+ end
49
+ end
50
+
51
+ def find(*paths, ignore_error: true, &block)
52
+ block or return enum_for(__method__, *paths, ignore_error: ignore_error)
53
+
54
+ results = Queue.new
55
+ pending = Set.new
56
+
57
+ startup
58
+
59
+ paths.map!(&:dup).each do |path|
60
+ path = path.to_path if path.respond_to? :to_path
61
+ results << [path, Util.safe_stat(path)]
62
+ end
63
+ results << [:initial, :finished]  # marks the end of the initial paths
64
+ pending << [:initial, :initial.encoding]  # keeps the loop alive until that marker arrives
65
+
66
+ while result = results.deq
67
+ path, stat = result
68
+
69
+ if stat == :finished
70
+ pending.delete([path, path.encoding])
71
+
72
+ if pending.empty?
73
+ break
74
+ else
75
+ next
76
+ end
77
+ end
78
+
79
+ catch(:prune) do
80
+ yield_entry(result, block) if path.is_a? String
81
+
82
+ case stat
83
+ when Exception then raise stat unless ignore_error
84
+ when File::Stat
85
+ if stat.directory? and !pending.include?(pe = [path, path.encoding])
86
+ pending << pe
87
+ @queue << [path, results]  # a worker will list this directory
88
+ end
89
+ end
90
+ end
91
+ end
92
+ ensure
93
+ if one_shot?
94
+ @queue.clear
95
+ shutdown
96
+ end
97
+ end
98
+
99
+ private
100
+
101
+ def one_shot?() !!@one_shot end
102
+
103
+ def yield_entry(entry, block)
104
+ if block.arity == 2
105
+ block.call(entry[0].dup.taint, entry[1])
106
+ else
107
+ block.call entry[0].dup.taint
108
+ end
109
+ end
110
+ end
111
+
112
+ class Walker
113
+ def spawn(queue)
114
+ Thread.new do
115
+ @encoding = Encoding.find("filesystem")
116
+ while job = queue.deq
117
+ walk(job[0], job[1])
118
+ end
119
+ end
120
+ end
121
+
122
+ def walk(path, results)
123
+ enc = path.encoding == Encoding::US_ASCII ? @encoding : path.encoding
124
+
125
+ Dir.entries(path, encoding: enc).each do |entry|
126
+ next if entry == '.' or entry == '..'
127
+
128
+ stat(File.join(path, entry), results)
129
+ end
130
+ rescue Errno::ENOENT, Errno::EACCES, Errno::ENOTDIR, Errno::ELOOP,
131
+ Errno::ENAMETOOLONG => e
132
+ error(e, results)
133
+ ensure
134
+ finish(path, results)
135
+ end
136
+
137
+ def stat(entry, results)
138
+ results << [entry, Util.safe_stat(entry)]
139
+ end
140
+
141
+ def finish(path, results)
142
+ results << [path, :finished]
143
+ end
144
+
145
+ def error(e, results)
146
+ results << [:exception, e]
147
+ end
148
+ end
149
+
150
+ module Util
151
+ def self.safe_stat(path)
152
+ File.lstat(path)
153
+ rescue Errno::ENOENT, Errno::EACCES, Errno::ENOTDIR, Errno::ELOOP,
154
+ Errno::ENAMETOOLONG => e
155
+ e
156
+ end
157
+ end
158
+ end
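In `lib/fast_find.rb` above, the calling thread decides when a walk is complete: each directory handed to a worker is recorded in `pending`, the worker answers with a `[path, :finished]` marker once it has listed that directory, and the loop stops when `pending` is empty. The following is a minimal, self-contained sketch of that protocol, not the gem's code. `parallel_walk` is a hypothetical name; it uses `Dir.children` (Ruby 2.5+) where the gem uses `Dir.entries`, and it leaves out the encoding handling, error reporting and `prune` support.

```ruby
require 'set'

# Hypothetical sketch: workers list directories and lstat their entries,
# the calling thread consumes results and decides what to enqueue next.
def parallel_walk(root, workers: 4)
  jobs    = Queue.new
  results = Queue.new

  threads = Array.new(workers) do
    Thread.new do
      while (dir = jobs.pop) # nil is the shutdown sentinel
        begin
          Dir.children(dir).each do |name|
            path = File.join(dir, name)
            stat = begin
              File.lstat(path)
            rescue SystemCallError => e
              e
            end
            results << [path, stat]
          end
        rescue SystemCallError
          # Unreadable directory: nothing to report in this sketch.
        ensure
          results << [dir, :finished] # always sent after the children
        end
      end
    end
  end

  pending = Set.new # only ever touched by the calling thread
  seen    = []
  results << [root, File.lstat(root)]
  results << [:initial, :finished]
  pending << :initial

  loop do
    path, stat = results.pop
    if stat == :finished
      pending.delete(path)
      break if pending.empty?
      next
    end

    seen << path
    if stat.is_a?(File::Stat) && stat.directory? && !pending.include?(path)
      pending << path # mark before enqueueing so the set cannot drain early
      jobs << path
    end
  end

  seen
ensure
  threads&.each { jobs << nil }
  threads&.each(&:join)
end
```

A worker pushes a directory's children before its `:finished` marker, and the queue preserves that order, so every subdirectory is added to `pending` before its parent is removed; the set therefore only empties once the whole tree has been listed.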
data/lib/fast_find/version.rb ADDED
@@ -0,0 +1,3 @@
1
+ module FastFind
2
+ VERSION = "0.1.1"
3
+ end
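`Finder#startup` and `Finder#shutdown` in `lib/fast_find.rb` implement the restartable pool the README mentions: workers are spawned lazily under a mutex, stopped by pushing one `nil` per worker, and spawned again on the next `#find`. A generic sketch of that lifecycle follows; `RestartablePool`, `submit` and `running?` are hypothetical names, and jobs here are plain blocks rather than the gem's `[path, results]` pairs.

```ruby
# Hypothetical sketch of a lazily started, restartable worker pool.
class RestartablePool
  def initialize(size)
    @size    = size
    @mutex   = Mutex.new
    @queue   = Queue.new
    @workers = nil
  end

  # Spawn workers on first use; a no-op while they are already running.
  def startup
    @mutex.synchronize do
      return if @workers

      @workers = Array.new(@size) do
        Thread.new do
          while (job = @queue.pop) # nil tells the worker to exit
            job.call
          end
        end
      end
    end
  end

  def submit(&job)
    startup
    @queue << job
  end

  # Drop queued jobs, send one nil per worker, and wait for them to exit.
  def shutdown
    @mutex.synchronize do
      return unless @workers

      @queue.clear
      @workers.each { @queue << nil }
      @workers.each(&:join)
      @workers = nil
    end
  end

  def running?
    @mutex.synchronize { !@workers.nil? }
  end
end
```

As in the gem, `shutdown` discards queued work, so it is only safe once no caller is still waiting on results.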
metadata ADDED
@@ -0,0 +1,102 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: fast_find
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.1.1
5
+ platform: ruby
6
+ authors:
7
+ - Thomas Hurst
8
+ autorequire:
9
+ bindir: exe
10
+ cert_chain: []
11
+ date: 2015-08-01 00:00:00.000000000 Z
12
+ dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ requirement: !ruby/object:Gem::Requirement
15
+ requirements:
16
+ - - "~>"
17
+ - !ruby/object:Gem::Version
18
+ version: '1.9'
19
+ name: bundler
20
+ prerelease: false
21
+ type: :development
22
+ version_requirements: !ruby/object:Gem::Requirement
23
+ requirements:
24
+ - - "~>"
25
+ - !ruby/object:Gem::Version
26
+ version: '1.9'
27
+ - !ruby/object:Gem::Dependency
28
+ requirement: !ruby/object:Gem::Requirement
29
+ requirements:
30
+ - - "~>"
31
+ - !ruby/object:Gem::Version
32
+ version: '10.0'
33
+ name: rake
34
+ prerelease: false
35
+ type: :development
36
+ version_requirements: !ruby/object:Gem::Requirement
37
+ requirements:
38
+ - - "~>"
39
+ - !ruby/object:Gem::Version
40
+ version: '10.0'
41
+ - !ruby/object:Gem::Dependency
42
+ requirement: !ruby/object:Gem::Requirement
43
+ requirements:
44
+ - - ">="
45
+ - !ruby/object:Gem::Version
46
+ version: '0'
47
+ name: minitest
48
+ prerelease: false
49
+ type: :development
50
+ version_requirements: !ruby/object:Gem::Requirement
51
+ requirements:
52
+ - - ">="
53
+ - !ruby/object:Gem::Version
54
+ version: '0'
55
+ description: |-
56
+ FastFind is a find workalike which optionally passes
57
+ in the File::Stat to the block, and can use multiple
58
+ threads to walk directories concurrently.
59
+ email:
60
+ - tom@hur.st
61
+ executables: []
62
+ extensions: []
63
+ extra_rdoc_files: []
64
+ files:
65
+ - ".gitignore"
66
+ - ".travis.yml"
67
+ - Gemfile
68
+ - LICENSE.txt
69
+ - README.md
70
+ - Rakefile
71
+ - bin/benchmark
72
+ - bin/console
73
+ - bin/setup
74
+ - fast_find.gemspec
75
+ - lib/fast_find.rb
76
+ - lib/fast_find/version.rb
77
+ homepage: https://github.com/Freaky/fast_find
78
+ licenses:
79
+ - MIT
80
+ metadata:
81
+ allowed_push_host: https://rubygems.org
82
+ post_install_message:
83
+ rdoc_options: []
84
+ require_paths:
85
+ - lib
86
+ required_ruby_version: !ruby/object:Gem::Requirement
87
+ requirements:
88
+ - - ">="
89
+ - !ruby/object:Gem::Version
90
+ version: '2.0'
91
+ required_rubygems_version: !ruby/object:Gem::Requirement
92
+ requirements:
93
+ - - ">="
94
+ - !ruby/object:Gem::Version
95
+ version: '0'
96
+ requirements: []
97
+ rubyforge_project:
98
+ rubygems_version: 2.4.8
99
+ signing_key:
100
+ specification_version: 4
101
+ summary: High performance 'find' alternative.
102
+ test_files: []