findler 0.0.3 → 0.0.4

data/.travis.yml ADDED
@@ -0,0 +1,6 @@
1
+ language: ruby
2
+ script: bundle exec rake
3
+ rvm:
4
+ - 1.8.7
5
+ - 1.9.3
6
+ - rbx-18mode
data/Gemfile CHANGED
@@ -1,6 +1,3 @@
1
1
  source "http://rubygems.org"
2
2
  gemspec
3
3
 
4
- gem "rake"
5
- gem "yard"
6
- gem "rspec", '~> 2.7.0'
data/README.md CHANGED
@@ -1,5 +1,7 @@
1
1
  # Findler: Filesystem Iteration with Persistable State
2
2
 
3
+ [![Build Status](https://secure.travis-ci.org/mceachen/findler.png?branch=master)](http://travis-ci.org/mceachen/findler)
4
+
3
5
  Findler is a Ruby library for iterating over a filtered set of files from a given
4
6
  path, written to be suitable with concurrent workers and very large
5
7
  filesystem hierarchies.
@@ -8,9 +10,9 @@ filesystem hierarchies.
8
10
 
9
11
  ```ruby
10
12
  f = Findler.new "/Users/mrm"
11
- f.append_extension ".jpg", ".jpeg"
13
+ f.add_extensions ".jpg", ".jpeg"
12
14
  iterator = f.iterator
13
- iterator.next
15
+ iterator.next_file
14
16
  # => "/Users/mrm/Photos/img_1000.jpg"
15
17
  ```
16
18
 
@@ -31,7 +33,7 @@ To resume iteration:
31
33
 
32
34
  ```ruby
33
35
  iterator = Marshal.load(File.open('iterator.state', 'rb'))
34
- iterator.next
36
+ iterator.next_file
35
37
  # => "/Users/mrm/Photos/img_1001.jpg"
36
38
  ```
37
39
 
@@ -39,13 +41,89 @@ To re-check a directory hierarchy for files that you haven't visited yet:
39
41
 
40
42
  ```ruby
41
43
  iterator.rescan!
42
- iterator.next
44
+ iterator.next_file
43
45
  # => "/Users/mrm/Photos/img_1002.jpg"
44
46
  ```
45
47
 
48
+ External synchronization between the serialized state of the
49
+ iterator and the other processes will have to be done by you, of course.
50
+ The ```load```, ```next_file```, and ```dump``` calls should be done while holding
51
+ an iteration mutex of some sort.
52
+
53
+ ## Filtering and ordering
54
+
55
+ Filters provide custom exclusion and ordering criteria, so you don't
56
+ have to do that logic in the code that consumes from your iterator.
57
+
58
+ Filters can't be procs or lambdas because those aren't safely serializable.
59
+
60
+ What you provide to ```add_filter``` is a symbolized name of a class method
61
+ on ```Findler::Filters```:
62
+
63
+ ```ruby
64
+ f = Findler.new(".")
65
+ f.add_filter :order_by_name
66
+ ```
67
+
68
+ Note that the last filter added will be last to order the children, so it will be the
69
+ "primary" sort criterion. Note also that the ordering is only done in
70
+ the context of a given directory.
71
+
72
+ ### Implementing your own filter
73
+
74
+ Filter methods receive an array of ```Pathname``` instances. Those pathnames will:
75
+
76
+ 1. have the same parent
77
+ 2. not have been enumerated by ```next_file()``` already
78
+ 3. satisfy the settings given to the parent Findler instance, like ```include_hidden```
79
+ and added patterns.
80
+
81
+ Note that the last filter added will be last to order the children, so it will be the
82
+ "primary" sort criterion.
83
+
84
+ The returned values from the class method will be the final set of elements (both files
85
+ and directories) that Findler will return from ```next_file()```.
86
+
87
+ ### Example
88
+
89
+ To find files that have valid EXIF headers, using the *most* excellent
90
+ [exiftoolr](https://github.com/mceachen/exiftoolr) gem, you'd do this:
91
+
92
+ ```ruby
93
+ require 'findler'
94
+ require 'exiftoolr'
95
+
96
+ # Monkey-patch Filters to add our custom filter:
97
+ class Findler::Filters
98
+ def self.exif_only(children)
99
+ child_files = children.select{|ea|ea.file?}
100
+ child_dirs = children.select{|ea|ea.directory?}
101
+ e = Exiftoolr.new(child_files)
102
+ e.files_with_results + child_dirs
103
+ end
104
+ end
105
+
106
+ f = Findler.new "/Users/mrm"
107
+ f.add_extensions ".jpg", ".jpeg", ".cr2", ".nef"
108
+ f.case_insensitive!
109
+ f.add_filter(:exif_only)
110
+ ```
111
+
112
+ ### Filter implementation notes
113
+
114
+ * The array of ```Pathname``` instances can be assumed to be absolute.
115
+ * Only child files that satisfy the ```extension``` and ```pattern``` filters will be seen by the filter class method.
116
+ * If a directory doesn't have any relevant files, the filter method may be called several times (once per directory visited) during a single call to ```next_file()```.
117
+ * If you want to be notified when new directories are walked into, and you want to do a bulk operation within that directory,
118
+ a filter gives you that hook. Just remember to return the children array at the end of your filter method.
119
+
120
+ ### Why can't a filter be a proc?
121
+
122
+ Because procs and lambdas aren't Marshal-able, and I didn't want to use something scary like ruby2ruby and eval.
46
123
 
47
124
  ## Changelog
48
125
 
49
- * 0.0.1 First `find`
126
+ * 0.0.4 Added custom filters for ```next_file()``` and singular aliases for ```add_extension``` and ```add_pattern```
127
+ * 0.0.3 Fixed gemfile packaging
50
128
  * 0.0.2 Added scalable Bloom filter so ```Iterator#rescan``` is possible
51
- * 0.0.3 Fixed gemfile packaging
129
+ * 0.0.1 First `find`
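Reviewer note on the README's synchronization advice: the load, `next_file`, dump cycle can be wrapped in an advisory file lock. The following is only a sketch under stated assumptions. `StandInIterator` and `next_file_with_lock` are hypothetical names, and the stand-in class replaces `Findler::Iterator` so the snippet runs without the gem. The lock is taken with `File#flock` on a separate `.lock` file, which is one possible choice of "iteration mutex", not something the gem prescribes.

```ruby
require 'tmpdir'

# Hypothetical stand-in for Findler::Iterator: walks a fixed list and is
# Marshal-able, so the sketch runs without the gem installed.
class StandInIterator
  def initialize(items)
    @items = items.dup
  end

  def next_file
    @items.shift
  end
end

# Load the iterator, take one file, and persist the new state, all while
# holding an exclusive advisory lock on a separate lock file.
def next_file_with_lock(state_path)
  File.open("#{state_path}.lock", File::RDWR | File::CREAT, 0644) do |lock|
    lock.flock(File::LOCK_EX)
    iterator = File.open(state_path, 'rb') { |io| Marshal.load(io) }
    nxt = iterator.next_file
    File.open(state_path, 'wb') { |io| Marshal.dump(iterator, io) }
    nxt # the lock is released when the lock file is closed
  end
end

# Usage: seed a state file, then pull files one at a time.
Dir.mktmpdir do |dir|
  state = File.join(dir, 'iterator.state')
  File.open(state, 'wb') { |io| Marshal.dump(StandInIterator.new(%w(a.jpg b.jpg)), io) }
  puts next_file_with_lock(state)
end
```

Advisory locks only protect workers that all go through the same helper; a process that reads the state file directly is not blocked.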
data/Rakefile CHANGED
@@ -4,7 +4,13 @@ YARD::Rake::YardocTask.new do |t|
4
4
  t.files = ['lib/**/*.rb', 'README.md']
5
5
  end
6
6
 
7
- require "rspec/core/rake_task"
8
- RSpec::Core::RakeTask.new(:spec)
7
+ require 'rake/testtask'
9
8
 
10
- task :default => :spec
9
+ Rake::TestTask.new do |t|
10
+ t.libs.push "lib"
11
+ t.libs.push "test"
12
+ t.pattern = 'test/**/*_test.rb'
13
+ t.verbose = true
14
+ end
15
+
16
+ task :default => :test
@@ -0,0 +1,27 @@
1
+ #!/usr/bin/env ruby
2
+
3
+ require 'rubygems'
4
+ require 'exiftoolr'
5
+ require 'findler'
6
+
7
+ class Findler::Filters
8
+ def self.with_exif(children)
9
+ child_files = children.select { |ea| ea.file? }
10
+ child_dirs = children.select { |ea| ea.directory? }
11
+ e = Exiftoolr.new(child_files)
12
+ good = e.files_with_results
13
+ bad = child_files - good
14
+ puts "Files missing EXIF:\n #{(bad).join("\n ")}" unless bad.empty?
15
+ good + child_dirs
16
+ end
17
+ end
18
+
19
+ f = Findler.new(ENV['HOME'])
20
+ f.add_extensions ".jpg", ".jpeg", ".cr2", ".nef"
21
+ f.add_filter :with_exif
22
+ f.case_insensitive!
23
+ iter = f.iterator
24
+
25
+ while nxt = iter.next_file
26
+ puts "next file: #{nxt}"
27
+ end
data/findler.gemspec CHANGED
@@ -16,11 +16,12 @@ Gem::Specification.new do |s|
16
16
  s.rubyforge_project = "findler"
17
17
 
18
18
  s.files = `git ls-files`.split("\n")
19
- s.test_files = `git ls-files -- {test,spec,features}/*`.split("\n")
19
+ s.test_files = `git ls-files -- {test,features}/*`.split("\n")
20
20
  s.executables = `git ls-files -- bin/*`.split("\n").map { |f| File.basename(f) }
21
21
  s.require_paths = ["lib"]
22
22
  s.add_development_dependency "rake"
23
23
  s.add_development_dependency "yard"
24
- s.add_development_dependency "rspec", "~> 2.7.0"
24
+ s.add_development_dependency "minitest"
25
+ s.add_development_dependency "minitest-reporters"
25
26
  s.add_dependency "bloomer"
26
27
  end
data/lib/findler.rb CHANGED
@@ -1,25 +1,46 @@
1
1
  class Findler
2
- IGNORE_CASE = 1
3
- INCLUDE_HIDDEN = 2
4
2
 
5
3
  autoload :Iterator, "findler/iterator"
4
+ autoload :Error, "findler/error"
5
+ require "findler/filters"
6
6
 
7
- def initialize path
7
+ IGNORE_CASE = 1
8
+ INCLUDE_HIDDEN = 2
9
+
10
+ def initialize(path)
8
11
  @path = path
9
12
  @flags = 0
10
13
  end
11
14
 
12
- # These are File.fnmatch patterns. If any pattern matches, it will be returned by Iterator#next.
15
+ # These are File.fnmatch patterns.
16
+ # If any pattern matches, it will be returned by Iterator#next_file.
13
17
  # (see File.fnmatch?)
14
- def add_pattern *patterns
15
- patterns.each { |ea| (@patterns ||= []) << ea }
18
+ def patterns
19
+ @patterns ||= []
20
+ end
21
+
22
+ def add_patterns(*patterns)
23
+ self.patterns.concat(patterns)
16
24
  end
17
25
 
18
- def append_extension *extensions
19
- extensions.each { |ea| add_pattern "*#{normalize_extension(ea)}" }
26
+ def add_pattern(pattern)
27
+ self.patterns << pattern
28
+ end
29
+
30
+ def add_extension(extension)
31
+ add_pattern "*#{normalize_extension(extension)}"
32
+ end
33
+
34
+ def add_extensions(*extensions)
35
+ extensions.each { |ea| add_extension(ea) }
36
+ end
37
+
38
+ # Should patterns be interpreted in a case-sensitive manner? The default is case sensitive,
39
+ # but if your local filesystem is not case sensitive, this flag is a no-op.
40
+ def case_sensitive!
41
+ @flags &= ~IGNORE_CASE
20
42
  end
21
43
 
22
- # Should patterns be interpreted in a case-insensitive manor? (default is case sensitive)
23
44
  def case_insensitive!
24
45
  @flags |= IGNORE_CASE
25
46
  end
@@ -30,18 +51,60 @@ class Findler
30
51
  @flags |= INCLUDE_HIDDEN
31
52
  end
32
53
 
54
+ def exclude_hidden!
55
+ @flags &= ~INCLUDE_HIDDEN
56
+ end
57
+
58
+ def filter_class
59
+ (@filter_class ||= Filters)
60
+ end
61
+
62
+ def filter_class=(new_filter_class)
63
+ raise Error unless new_filter_class.is_a? Class
64
+ filters.each{|ea|new_filter_class.method(ea)} # verify the filters class has those methods defined
65
+ @filter_class = new_filter_class
66
+ end
67
+
68
+ # Accepts symbols whose names are class methods on Findler::Filters.
69
+ #
70
+ # Filter methods receive an array of Pathname instances, and are in charge of ordering
71
+ # and filtering the array. The returned array of pathnames will be used by the iterator.
72
+ #
73
+ # Those pathnames will:
74
+ # a) have the same parent
75
+ # b) will not have been enumerated by next_file() already
76
+ # c) will satisfy the hidden flag and patterns preferences
77
+ #
78
+ # Note that the last filter added will be last to order the children, so it will be the
79
+ # "primary" sort criterion.
80
+ def add_filter(filter_symbol)
81
+ filter_class.method(filter_symbol)
82
+ filters << filter_symbol
83
+ end
84
+
85
+ def filters
86
+ (@filters ||= [])
87
+ end
88
+
89
+ def add_filters(*filter_symbols)
90
+ filter_symbols.each { |ea| add_filter(ea) }
91
+ end
92
+
33
93
  def iterator
34
- Iterator.new(:path => @path, :patterns => @patterns, :flags => @flags)
94
+ Iterator.new(:path => @path,
95
+ :patterns => @patterns,
96
+ :flags => @flags,
97
+ :filters => @filters)
35
98
  end
36
99
 
37
100
  private
38
101
 
39
- def normalize_extension extension
102
+ def normalize_extension(extension)
40
103
  if extension.nil? || extension.empty? || extension.start_with?(".")
41
104
  extension
42
105
  else
43
106
  ".#{extension}"
44
107
  end
45
108
  end
46
-
47
109
  end
110
+
@@ -0,0 +1,2 @@
1
+ class Findler::Error < StandardError
2
+ end
@@ -0,0 +1,38 @@
1
+ class Findler::Filters
2
+ # files first, then directories
3
+ def self.files_first(paths)
4
+ preserve_sort_by(paths) { |ea| ea.file? ? -1 : 1 }
5
+ end
6
+
7
+ # directories first, then files
8
+ def self.directories_first(paths)
9
+ preserve_sort_by(paths) { |ea| ea.directory? ? -1 : 1 }
10
+ end
11
+
12
+ # order by the mtime of each file. Oldest files first.
13
+ def self.order_by_mtime_asc(paths)
14
+ preserve_sort_by(paths) { |ea| ea.mtime }
15
+ end
16
+
17
+ # reverse order by the mtime of each file. Newest files first.
18
+ def self.order_by_mtime_desc(paths)
19
+ preserve_sort_by(paths) { |ea| -ea.mtime.to_f }
20
+ end
21
+
22
+ # order by the name of each file.
23
+ def self.order_by_name(paths)
24
+ preserve_sort_by(paths) { |ea| ea.basename.to_s }
25
+ end
26
+
27
+ # reverse the order of the sort
28
+ def self.reverse(paths)
29
+ paths.reverse
30
+ end
31
+
32
+ def self.preserve_sort_by(array, &block)
33
+ ea_to_index = Hash[array.zip((0..array.size-1).to_a)]
34
+ array.sort_by do |ea|
35
+ [yield(ea), ea_to_index[ea]]
36
+ end
37
+ end
38
+ end
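Reviewer note on `preserve_sort_by`: it makes `sort_by` stable by appending each element's original position to the sort key, so ties keep their incoming order. A minimal sketch of the same idea follows. The name `stable_sort_by` is hypothetical, and it uses `each_with_index` rather than the hash lookup in the diff, which is a swapped-in variant and not the gem's implementation.

```ruby
# Sort by the block's value, breaking ties by original position so that
# elements with equal keys keep their incoming order.
def stable_sort_by(array)
  array.each_with_index
       .sort_by { |ea, index| [yield(ea), index] }
       .map { |ea, _index| ea }
end

values = [3.5, 3.1, 1.1, 2.0, 1.5, 1.3]
p stable_sort_by(values) { |ea| ea.to_i }
```

Carrying the index alongside each element also behaves predictably when the array contains duplicate values, since no lookup by value is needed.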
@@ -1,56 +1,50 @@
1
1
  require 'bloomer'
2
+ require 'pathname'
2
3
 
3
4
  class Findler
5
+
4
6
  class Iterator
5
7
 
6
- attr_reader :path, :parent, :patterns, :flags, :visited_dirs, :visited_files
8
+ attr_reader :path, :parent, :patterns, :flags, :filters_class, :filters, :visited_dirs, :visited_files
7
9
 
8
10
  def initialize(attrs, parent = nil)
9
11
  @path = attrs[:path]
10
12
  @path = Pathname.new(@path) unless @path.is_a? Pathname
13
+ @path = @path.expand_path unless @path.absolute?
11
14
  @parent = parent
12
15
 
13
- set_ivar(:visited_dirs, attrs) { Bloomer::Scalable.new(256, 1.0/1_000_000) }
14
- set_ivar(:visited_files, attrs) { Bloomer::Scalable.new(256, 1.0/1_000_000) }
15
- set_ivar(:patterns, attrs) { nil }
16
- set_ivar(:flags, attrs) { 0 }
16
+ set_inheritable_ivar(:visited_dirs, attrs) { self.class.new_presence_collection }
17
+ set_inheritable_ivar(:visited_files, attrs) { self.class.new_presence_collection }
18
+ set_inheritable_ivar(:patterns, attrs) { nil }
19
+ set_inheritable_ivar(:flags, attrs) { 0 }
20
+ set_inheritable_ivar(:filters, attrs) { [] }
21
+ set_inheritable_ivar(:filters_class, attrs) { Filters }
22
+ set_inheritable_ivar(:sort_with, attrs) { nil }
17
23
 
18
24
  @sub_iter = self.class.new(attrs[:sub_iter], self) if attrs[:sub_iter]
19
25
  end
20
26
 
21
27
  # Visit this directory and all sub directories, and check for unseen files. Only call on the root iterator.
22
28
  def rescan!
23
- raise "Only invoke on root" unless @parent.nil?
24
- @visited_dirs = Bloomer::Scalable.new(256, 1.0/1_000_000)
29
+ raise Error, "Only invoke on root" unless @parent.nil?
30
+ @visited_dirs = self.class.new_presence_collection
25
31
  @children = nil
26
32
  @sub_iter = nil
27
33
  end
28
34
 
29
- #def to_hash
30
- # {:path => @path, :visited_dirs:patterns => @patterns, :flags => @flags, :sub_iter => @sub_iter && @sub_iter.to_hash}
31
- #end
32
- #
33
- #def _dump(depth)
34
- # Marshal.dump(to_hash)
35
- #end
36
- #
37
- #def self._load(data)
38
- # new(Marshal.load(data))
39
- #end
40
-
41
- def case_insensitive?
42
- (Findler::IGNORE_CASE | flags) != 0
35
+ def ignore_case?
36
+ (Findler::IGNORE_CASE & flags) > 0
43
37
  end
44
38
 
45
- def skip_hidden?
46
- (Findler::INCLUDE_HIDDEN | flags) == 0
39
+ def include_hidden?
40
+ (Findler::INCLUDE_HIDDEN & flags) > 0
47
41
  end
48
42
 
49
43
  def fnmatch_flags
50
- @_fnflags ||= (@parent && @parent.fnmatch_flags) || begin
44
+ @fnmatch_flags ||= (@parent && @parent.fnmatch_flags) || begin
51
45
  f = 0
52
- f |= File::FNM_CASEFOLD if case_insensitive?
53
- f |= File::FNM_DOTMATCH if !skip_hidden?
46
+ f |= File::FNM_CASEFOLD if ignore_case?
47
+ f |= File::FNM_DOTMATCH if include_hidden?
54
48
  f
55
49
  end
56
50
  end
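Reviewer note on the flag predicates in this hunk: the removed `case_insensitive?` and `skip_hidden?` combined the flag with `|`, which cannot test whether a bit is set. The sketch below reuses the two constant values from `lib/findler.rb`; the helper names `ignore_case_buggy?` and `ignore_case?` are hypothetical and exist only to contrast the two expressions.

```ruby
IGNORE_CASE    = 1
INCLUDE_HIDDEN = 2

# Old form: OR-ing in a non-zero constant is never zero, so this is always true.
def ignore_case_buggy?(flags)
  (IGNORE_CASE | flags) != 0
end

# New form: AND masks out every other bit, so this tests only IGNORE_CASE.
def ignore_case?(flags)
  (IGNORE_CASE & flags) > 0
end

flags = 0
p ignore_case_buggy?(flags) # true, even though no flag is set
p ignore_case?(flags)       # false

flags |= IGNORE_CASE        # set the bit, as case_insensitive! does
p ignore_case?(flags)       # true

flags &= ~IGNORE_CASE       # clear the bit, as case_sensitive! does
p ignore_case?(flags)       # false
```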
@@ -59,11 +53,11 @@ class Findler
59
53
  @path
60
54
  end
61
55
 
62
- def next
56
+ def next_file
63
57
  return nil unless @path.exist?
64
-
58
+
65
59
  if @sub_iter
66
- nxt = @sub_iter.next
60
+ nxt = @sub_iter.next_file
67
61
  return nxt unless nxt.nil?
68
62
  @visited_dirs.add @sub_iter.path.to_s
69
63
  @sub_iter = nil
@@ -71,10 +65,13 @@ class Findler
71
65
 
72
66
  # If someone touches the directory while we iterate, redo the @children.
73
67
  @children = nil if @path.ctime != @ctime || @path.mtime != @mtime
74
- @children ||= begin
68
+
69
+ if @children.nil?
75
70
  @mtime = @path.mtime
76
71
  @ctime = @path.ctime
77
- @path.children.delete_if { |ea| skip?(ea) }
72
+ children = @path.children.delete_if { |ea| skip?(ea) }
73
+ filtered_children = @filters.inject(children){ |c, f| filter(c, f) }
74
+ @children = filtered_children
78
75
  end
79
76
 
80
77
  nxt = @children.shift
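Reviewer note on the `inject` call in this hunk: filters are applied in the order they were added, each receiving the previous filter's output, which is why the last filter added ends up as the primary sort criterion. The sketch below illustrates that chaining with plain hashes instead of `Pathname` instances. `DemoFilters` and `apply_filters` are hypothetical names, not part of the gem.

```ruby
# Hypothetical filter holder; entries are hashes rather than Pathnames so the
# sketch runs without touching the filesystem.
module DemoFilters
  def self.order_by_name(entries)
    entries.sort_by { |ea| ea[:name] }
  end

  # Stable partition: files before directories, ties keep incoming order.
  def self.files_first(entries)
    entries.each_with_index
           .sort_by { |ea, index| [ea[:dir] ? 1 : -1, index] }
           .map { |ea, _index| ea }
  end
end

# Feed each filter the output of the previous one, in the order given.
def apply_filters(entries, filter_symbols)
  filter_symbols.inject(entries) { |acc, sym| DemoFilters.send(sym, acc) }
end

entries = [
  { :name => "b.txt", :dir => false },
  { :name => "a",     :dir => true  },
  { :name => "a.txt", :dir => false }
]
p apply_filters(entries, [:order_by_name, :files_first]).map { |ea| ea[:name] }
```

Because `files_first` runs last and is stable, the name ordering survives only within the file group and within the directory group.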
@@ -82,7 +79,7 @@ class Findler
82
79
 
83
80
  if nxt.directory?
84
81
  @sub_iter = Iterator.new({:path => nxt}, self)
85
- self.next
82
+ self.next_file
86
83
  else
87
84
  @visited_files.add nxt.to_s
88
85
  nxt
@@ -91,12 +88,30 @@ class Findler
91
88
 
92
89
  private
93
90
 
94
- def set_ivar(field, attrs, &block)
95
- sym = "@#{field}".to_sym
96
- v = attrs[field]
97
- v ||= begin
98
- (p = instance_variable_get(:@parent)) && p.instance_variable_get(sym)
91
+ def self.new_presence_collection
92
+ Bloomer::Scalable.new(256, 1.0/1_000_000)
93
+ end
94
+
95
+ def filter(children, filter_symbol)
96
+ filtered_children = filters_class.send(filter_symbol, children)
97
+ unless filtered_children.respond_to? :collect
98
+ raise Error, "#{path.to_s}: filter_with, must return an Enumerable"
99
+ end
100
+ children_as_pathnames = filtered_children.collect { |ea| ea.is_a?(Pathname) ? ea : Pathname.new(ea) }
101
+ illegal_children = children_as_pathnames - children
102
+ unless illegal_children.empty?
103
+ raise Error, "#{path.to_s}: filter_with returned unexpected paths: #{illegal_children.join(",")}"
99
104
  end
105
+ children_as_pathnames
106
+ end
107
+
108
+ # Sets the instance variable to the value in attrs[field].
109
+ # If attrs is missing a value, pull the value from the parent.
110
+ # If the parent doesn't have a value, use the block to generate a default.
111
+ def set_inheritable_ivar(field, attrs, &block)
112
+ v = attrs[field]
113
+ sym = "@#{field}".to_sym
114
+ v ||= parent.instance_variable_get(sym)
100
115
  v ||= yield
101
116
  instance_variable_set(sym, v)
102
117
  end
@@ -107,13 +122,13 @@ class Findler
107
122
 
108
123
  def skip? pathname
109
124
  s = pathname.to_s
110
- return true if hidden?(pathname) && skip_hidden?
111
- return @visited_dirs.include?(s) if pathname.directory?
112
- return true if @visited_files.include?(s)
125
+ return true if !include_hidden? && hidden?(pathname)
126
+ return visited_dirs.include?(s) if pathname.directory?
127
+ return true if visited_files.include?(s)
113
128
  unless patterns.nil?
114
129
  return true if patterns.none? { |p| pathname.fnmatch(p, fnmatch_flags) }
115
130
  end
116
131
  return false
117
132
  end
118
133
  end
119
- end
134
+ end
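Reviewer note on `skip?`: pattern matching is delegated to `fnmatch` with the flags computed by `fnmatch_flags`. The sketch below shows the effect of the two standard flags involved, `File::FNM_CASEFOLD` and `File::FNM_DOTMATCH`, using `File.fnmatch` directly. The file names are made up for illustration.

```ruby
# By default, fnmatch is case sensitive and "*" does not match a leading dot.
p File.fnmatch("*.jpg", "IMG_1000.JPG")                     # false
p File.fnmatch("*.jpg", "IMG_1000.JPG", File::FNM_CASEFOLD) # true

p File.fnmatch("*", ".profile")                             # false
p File.fnmatch("*", ".profile", File::FNM_DOTMATCH)         # true

# Flags combine with bitwise OR, as fnmatch_flags does.
both = File::FNM_CASEFOLD | File::FNM_DOTMATCH
p File.fnmatch("*.jpg", ".HIDDEN.JPG", both)                # true
```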
@@ -1,3 +1,3 @@
1
1
  class Findler
2
- VERSION = "0.0.3"
2
+ VERSION = "0.0.4"
3
3
  end
@@ -0,0 +1,15 @@
1
+ require 'test_helper'
2
+
3
+ describe Findler::Filters do
4
+ it "should preserve sort order with no result" do
5
+ a = %w(h a p p y)
6
+ b = Findler::Filters.preserve_sort_by(a) { 0 }
7
+ b.must_equal a
8
+ end
9
+
10
+ it "should preserve sort order" do
11
+ a = [3.5, 3.1, 1.1, 2.0, 1.5, 1.3]
12
+ b = Findler::Filters.preserve_sort_by(a) { |ea| ea.to_i }
13
+ b.must_equal [1.1, 1.5, 1.3, 2.0, 3.5, 3.1]
14
+ end
15
+ end
@@ -0,0 +1,224 @@
1
+ require 'test_helper'
2
+
3
+ class Findler::Filters
4
+ def self.non_empty_files(children)
5
+ (children.select { |ea| ea.directory? || ea.size > 0 })
6
+ end
7
+
8
+ def self.no_return(children)
9
+ end
10
+
11
+ def self.invalid_return(children)
12
+ "/invalid/file"
13
+ end
14
+
15
+ def self.files_to_s(children)
16
+ (children).collect { |ea| ea.to_s }
17
+ end
18
+ end
19
+
20
+ describe Findler do
21
+
22
+ def touch_secrets
23
+ `mkdir .hide ; touch .outer-hide dir-0/.hide .hide/normal.txt .hide/.secret`
24
+ end
25
+
26
+ it "should detect hidden files properly" do
27
+ i = Findler::Iterator.new(:path => "/tmp")
28
+ i.send(:hidden?, Pathname.new("/a/b")).must_equal false
29
+ i.send(:hidden?, Pathname.new("/a/.b")).must_equal true
30
+ i.send(:hidden?, Pathname.new("/.a/.b")).must_equal true
31
+ i.send(:hidden?, Pathname.new("/.a/b")).must_equal false
32
+ end
33
+
34
+ it "should skip hidden files by default" do
35
+ i = Findler.new("/tmp").iterator
36
+ i.send(:skip?, Pathname.new("/tmp/not-hidden")).must_equal false
37
+ i.send(:skip?, Pathname.new("/tmp/.hidden")).must_equal true
38
+ end
39
+
40
+ it "should not skip hidden files when set" do
41
+ f = Findler.new("/tmp")
42
+ f.include_hidden!
43
+ i = f.iterator
44
+ i.send(:skip?, Pathname.new("/tmp/not-hidden")).must_equal false
45
+ i.send(:skip?, Pathname.new("/tmp/.hidden")).must_equal false
46
+ end
47
+
48
+ it "should find all non-hidden files by default" do
49
+ with_tree(%W(.jpg .txt)) do |dir|
50
+ touch_secrets
51
+ f = Findler.new(dir)
52
+ collect_files(f.iterator).sort.must_equal `find * -type f -not -name '.*'`.split.sort
53
+ f.exclude_hidden! # should be a no-op
54
+ collect_files(f.iterator).sort.must_equal `find * -type f -not -name '.*'`.split.sort
55
+ f.include_hidden!
56
+ collect_files(f.iterator).sort.must_equal `find . -type f | sed -e 's/^\\.\\///'`.split.sort
57
+ end
58
+ end
59
+
60
+ it "should find only .jpg files when constrained" do
61
+ with_tree(%W(.jpg .txt .JPG)) do |dir|
62
+ f = Findler.new(dir)
63
+ f.add_extension ".jpg"
64
+ if fs_case_sensitive?
65
+ f.case_sensitive!
66
+ collect_files(f.iterator).sort.must_equal `find * -type f -name \\*.jpg`.split.sort
67
+ end
68
+ f.case_insensitive!
69
+ collect_files(f.iterator).sort.must_equal `find * -type f -iname \\*.jpg`.split.sort
70
+ end
71
+ end
72
+
73
+ it "should find .jpg or .JPG files when constrained" do
74
+ with_tree(%W(.jpg .txt .JPG)) do |dir|
75
+ f = Findler.new(dir)
76
+ f.add_extension ".jpg"
77
+ f.case_insensitive!
78
+ iter = f.iterator
79
+ collect_files(iter).sort.must_equal `find * -type f -iname \\*.jpg`.split.sort
80
+ end
81
+ end
82
+
83
+ it "should find files added after iteration started" do
84
+ with_tree(%W(.txt)) do |dir|
85
+ f = Findler.new(dir)
86
+ iter = f.iterator
87
+ iter.next_file.wont_be_nil
88
+
89
+ # cheating with mtime on the touch doesn't properly update the parent directory ctime,
90
+ # so we have to deal with the second-granularity resolution of the filesystem.
91
+ sleep(1.1)
92
+
93
+ FileUtils.touch(dir + "new.txt")
94
+ collect_files(iter).must_include("new.txt")
95
+ end
96
+ end
97
+
98
+ it "should find new files after a rescan" do
99
+ with_tree([".txt", ".no"]) do |dir|
100
+ f = Findler.new(dir)
101
+ f.add_extension ".txt"
102
+ iter = f.iterator
103
+ collect_files(iter).sort.must_equal `find * -type f -iname \\*.txt`.split.sort
104
+ FileUtils.touch(dir + "dir-0" + "dir-1" + "new-0.txt")
105
+ FileUtils.touch(dir + "dir-1" + "dir-0" + "new-1.txt")
106
+ FileUtils.touch(dir + "dir-2" + "dir-2" + "new-2.txt")
107
+ collect_files(iter).must_be_empty
108
+ iter.rescan!
109
+ collect_files(iter).sort.must_equal ["dir-0/dir-1/new-0.txt", "dir-1/dir-0/new-1.txt", "dir-2/dir-2/new-2.txt"]
110
+ end
111
+ end
112
+
113
+ it "should not return files removed after iteration started" do
114
+ with_tree([".txt"]) do |dir|
115
+ f = Findler.new(dir)
116
+ iter = f.iterator
117
+ iter.next_file.wont_be_nil
118
+ sleep(1.1) # see above for hand-wringing-defense of this atrocity
119
+
120
+ (dir + "tmp-1.txt").unlink
121
+ collect_files(iter).wont_include("tmp-1.txt")
122
+ end
123
+ end
124
+
125
+ it "should dump/load in the middle of iterating" do
126
+ with_tree(%W(.jpg .txt .JPG)) do |dir|
127
+ all_files = `find * -type f -iname \\*.jpg`.split
128
+ all_files.size.times do |i|
129
+ f = Findler.new(dir)
130
+ f.add_extension ".jpg"
131
+ f.case_insensitive!
132
+ iter_a = f.iterator
133
+ files_a = i.times.collect { relative_path(iter_a.path, iter_a.next_file) }
134
+ iter_b = Marshal.load(Marshal.dump(iter_a))
135
+ files_b = collect_files(iter_b)
136
+
137
+ iter_c = Marshal.load(Marshal.dump(iter_b))
138
+ collect_files(iter_c)
139
+ iter_c.next_file.must_be_nil
140
+
141
+ (files_a + files_b).sort.must_equal all_files.sort
142
+ end
143
+ end
144
+ end
145
+
146
+ it "should create an iterator even for a non-existent directory" do
147
+ tmpdir = nil
148
+ Dir.mktmpdir do |dir|
149
+ tmpdir = Pathname.new dir
150
+ end
151
+ tmpdir.exist?.must_equal false
152
+ f = Findler.new(tmpdir)
153
+ collect_files(f.iterator).must_be_empty
154
+ end
155
+
156
+ it "should raise an error if the block given to next_file returns nil" do
157
+ Dir.mktmpdir do |dir|
158
+ f = Findler.new(dir)
159
+ f.add_filter :no_return
160
+ i = f.iterator
161
+ lambda { i.next_file }.must_raise(Findler::Error)
162
+ end
163
+ end
164
+
165
+ it "should raise an error if the block returns non-children" do
166
+ with_tree(%W(.txt)) do |dir|
167
+ f = Findler.new(dir)
168
+ f.add_filter :invalid_return
169
+ i = f.iterator
170
+ lambda { i.next_file }.must_raise(Findler::Error)
171
+ end
172
+ end
173
+
174
+ it "should support filter_with against global/Kernel methods" do
175
+ with_tree(%W(.txt)) do |dir|
176
+ f = Findler.new(dir)
177
+ f.add_filter :files_to_s
178
+ iter = f.iterator
179
+ files = collect_files(iter)
180
+ files.sort.must_equal `find * -type f`.split.sort
181
+ end
182
+ end
183
+
184
+ it "should support next_file blocks properly" do
185
+ with_tree(%W(.a .b)) do |dir|
186
+ Dir["**/*.a"].each { |ea| File.open(ea, 'w') { |f| f.write("hello") } }
187
+ f = Findler.new(dir)
188
+ f.add_filter :non_empty_files
189
+ iter = f.iterator
190
+ files = collect_files(iter)
191
+ files.sort.must_equal `find * -type f -name \\*.a`.split.sort
192
+ end
193
+ end
194
+
195
+ it "should support files_first ordering" do
196
+ with_tree(%W(.a), {
197
+ :depth => 2,
198
+ :files_per_dir => 2,
199
+ :subdirs_per_dir => 1,
200
+ }) do |dir|
201
+ f = Findler.new(dir)
202
+ f.add_filters :order_by_name, :files_first
203
+ expected = %W(tmp-0.a tmp-1.a dir-0/tmp-0.a dir-0/tmp-1.a)
204
+ collect_files(f.iterator).must_equal expected
205
+ f.add_filter :reverse
206
+ collect_files(f.iterator).must_equal expected.reverse
207
+ end
208
+ end
209
+
210
+ it "should support directory_first ordering" do
211
+ with_tree(%W(.a), {
212
+ :depth => 2,
213
+ :files_per_dir => 2,
214
+ :subdirs_per_dir => 1,
215
+ }) do |dir|
216
+ f = Findler.new(dir)
217
+ f.add_filters :order_by_name, :directories_first
218
+ expected = %W(dir-0/tmp-0.a dir-0/tmp-1.a tmp-0.a tmp-1.a)
219
+ collect_files(f.iterator).must_equal expected
220
+ f.add_filter :reverse
221
+ collect_files(f.iterator).must_equal expected.reverse
222
+ end
223
+ end
224
+ end
@@ -0,0 +1,83 @@
1
+ require 'minitest/spec'
2
+ require 'minitest/reporters'
3
+ require 'minitest/autorun'
4
+ require 'tmpdir'
5
+ require 'fileutils'
6
+ require 'findler'
7
+
8
+ MiniTest::Unit.runner = MiniTest::SuiteRunner.new
9
+ if ENV["RM_INFO"] || ENV["TEAMCITY_VERSION"]
10
+ MiniTest::Unit.runner.reporters << MiniTest::Reporters::RubyMineReporter.new
11
+ elsif ENV['TM_PID']
12
+ MiniTest::Unit.runner.reporters << MiniTest::Reporters::RubyMateReporter.new
13
+ else
14
+ MiniTest::Unit.runner.reporters << MiniTest::Reporters::ProgressReporter.new
15
+ end
16
+
17
+ def with_tmp_dir(&block)
18
+ cwd = Dir.pwd
19
+ Dir.mktmpdir do |dir|
20
+ Dir.chdir(dir)
21
+ yield(Pathname.new dir)
22
+ Dir.chdir(cwd) # jruby needs us to cd out of the tmpdir so it can remove it
23
+ end
24
+ ensure
25
+ Dir.chdir(cwd)
26
+ end
27
+
28
+ def with_tree(suffixes, options = {}, &block)
29
+ with_tmp_dir do |dir|
30
+ suffixes.each { |suffix| mk_tree dir, options.merge(:suffix => suffix) }
31
+ yield(dir)
32
+ end
33
+ end
34
+
35
+ def mk_tree(target_dir, options)
36
+ opts = {
37
+ :depth => 3,
38
+ :files_per_dir => 3,
39
+ :subdirs_per_dir => 3,
40
+ :prefix => "tmp",
41
+ :suffix => "",
42
+ :dir_prefix => "dir",
43
+ :dir_suffix => ""
44
+ }.merge options
45
+ p = target_dir.is_a?(Pathname) ? target_dir : Pathname.new(target_dir)
46
+ p.mkdir unless p.exist?
47
+
48
+ opts[:files_per_dir].times do |i|
49
+ fname = "#{opts[:prefix]}-#{i}#{opts[:suffix]}"
50
+ FileUtils.touch(p + fname).to_s
51
+ end
52
+ return if (opts[:depth] -= 1) <= 0
53
+ opts[:subdirs_per_dir].times do |i|
54
+ dir = "#{opts[:dir_prefix]}-#{i}#{opts[:dir_suffix]}"
55
+ mk_tree(p + dir, opts)
56
+ end
57
+ end
58
+
59
+ def expected_files(depth, files_per_dir, subdirs_per_dir)
60
+ return 0 if depth == 0
61
+ files_per_dir + (subdirs_per_dir * expected_files(depth - 1, files_per_dir, subdirs_per_dir))
62
+ end
63
+
64
+ def relative_path(parent, pathname)
65
+ pathname.relative_path_from(parent).to_s
66
+ end
67
+
68
+ def collect_files(iter)
69
+ files = []
70
+ while nxt = iter.next_file
71
+ files << relative_path(iter.path, nxt)
72
+ end
73
+ files
74
+ end
75
+
76
+ def fs_case_sensitive?
77
+ @fs_case_sensitive ||= begin
78
+ `touch CASETEST`
79
+ !File.exist?('casetest')
80
+ ensure
81
+ `rm CASETEST`
82
+ end
83
+ end
metadata CHANGED
@@ -1,13 +1,13 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: findler
3
3
  version: !ruby/object:Gem::Version
4
- hash: 25
4
+ hash: 23
5
5
  prerelease:
6
6
  segments:
7
7
  - 0
8
8
  - 0
9
- - 3
10
- version: 0.0.3
9
+ - 4
10
+ version: 0.0.4
11
11
  platform: ruby
12
12
  authors:
13
13
  - Matthew McEachen
@@ -15,10 +15,10 @@ autorequire:
15
15
  bindir: bin
16
16
  cert_chain: []
17
17
 
18
- date: 2012-01-23 00:00:00 Z
18
+ date: 2012-03-28 00:00:00 -07:00
19
+ default_executable:
19
20
  dependencies:
20
21
  - !ruby/object:Gem::Dependency
21
- prerelease: false
22
22
  type: :development
23
23
  requirement: &id001 !ruby/object:Gem::Requirement
24
24
  none: false
@@ -29,10 +29,10 @@ dependencies:
29
29
  segments:
30
30
  - 0
31
31
  version: "0"
32
+ prerelease: false
32
33
  name: rake
33
34
  version_requirements: *id001
34
35
  - !ruby/object:Gem::Dependency
35
- prerelease: false
36
36
  type: :development
37
37
  requirement: &id002 !ruby/object:Gem::Requirement
38
38
  none: false
@@ -43,28 +43,40 @@ dependencies:
43
43
  segments:
44
44
  - 0
45
45
  version: "0"
46
+ prerelease: false
46
47
  name: yard
47
48
  version_requirements: *id002
48
49
  - !ruby/object:Gem::Dependency
49
- prerelease: false
50
50
  type: :development
51
51
  requirement: &id003 !ruby/object:Gem::Requirement
52
52
  none: false
53
53
  requirements:
54
- - - ~>
54
+ - - ">="
55
55
  - !ruby/object:Gem::Version
56
- hash: 19
56
+ hash: 3
57
57
  segments:
58
- - 2
59
- - 7
60
58
  - 0
61
- version: 2.7.0
62
- name: rspec
59
+ version: "0"
60
+ prerelease: false
61
+ name: minitest
63
62
  version_requirements: *id003
64
63
  - !ruby/object:Gem::Dependency
64
+ type: :development
65
+ requirement: &id004 !ruby/object:Gem::Requirement
66
+ none: false
67
+ requirements:
68
+ - - ">="
69
+ - !ruby/object:Gem::Version
70
+ hash: 3
71
+ segments:
72
+ - 0
73
+ version: "0"
65
74
  prerelease: false
75
+ name: minitest-reporters
76
+ version_requirements: *id004
77
+ - !ruby/object:Gem::Dependency
66
78
  type: :runtime
67
- requirement: &id004 !ruby/object:Gem::Requirement
79
+ requirement: &id005 !ruby/object:Gem::Requirement
68
80
  none: false
69
81
  requirements:
70
82
  - - ">="
@@ -73,8 +85,9 @@ dependencies:
73
85
  segments:
74
86
  - 0
75
87
  version: "0"
88
+ prerelease: false
76
89
  name: bloomer
77
- version_requirements: *id004
90
+ version_requirements: *id005
78
91
  description: |-
79
92
  Findler is designed for very large filesystem hierarchies,
80
93
  where simple block processing, or returning an array of matches, just isn't feasible.
@@ -89,16 +102,22 @@ extra_rdoc_files: []
89
102
 
90
103
  files:
91
104
  - .gitignore
105
+ - .travis.yml
92
106
  - Gemfile
93
107
  - MIT-LICENSE
94
108
  - README.md
95
109
  - Rakefile
110
+ - examples/find_exif_files
96
111
  - findler.gemspec
97
112
  - lib/findler.rb
113
+ - lib/findler/error.rb
114
+ - lib/findler/filters.rb
98
115
  - lib/findler/iterator.rb
99
116
  - lib/findler/version.rb
100
- - spec/findler_spec.rb
101
- - spec/spec_helper.rb
117
+ - test/filters_test.rb
118
+ - test/findler_test.rb
119
+ - test/test_helper.rb
120
+ has_rdoc: true
102
121
  homepage: https://github.com/mceachen/findler/
103
122
  licenses: []
104
123
 
@@ -128,11 +147,11 @@ required_rubygems_version: !ruby/object:Gem::Requirement
128
147
  requirements: []
129
148
 
130
149
  rubyforge_project: findler
131
- rubygems_version: 1.8.15
150
+ rubygems_version: 1.6.2
132
151
  signing_key:
133
152
  specification_version: 3
134
153
  summary: Findler is a stateful filesystem iterator
135
154
  test_files:
136
- - spec/findler_spec.rb
137
- - spec/spec_helper.rb
138
- has_rdoc:
155
+ - test/filters_test.rb
156
+ - test/findler_test.rb
157
+ - test/test_helper.rb
data/spec/findler_spec.rb DELETED
@@ -1,117 +0,0 @@
1
- require 'spec_helper'
2
-
3
- describe Findler do
4
-
5
- before :each do
6
- @opts = {
7
- :depth => 3,
8
- :files_per_dir => 3,
9
- :subdirs_per_dir => 3,
10
- :prefix => "tmp",
11
- :suffix => "",
12
- :dir_prefix => "dir",
13
- :dir_suffix => ""
14
- }
15
- end
16
-
17
- it "should find all files by default" do
18
- with_tree([".jpg", ".txt"]) do |dir|
19
- f = Findler.new(dir)
20
- iter = f.iterator
21
- collect_files(iter).should =~ `find * -type f`.split
22
- end
23
- end
24
-
25
- it "should find only .jpg files when constrained" do
26
- with_tree([".jpg", ".txt", ".JPG"]) do |dir|
27
- f = Findler.new(dir)
28
- f.append_extension ".jpg"
29
- iter = f.iterator
30
- collect_files(iter).should =~ `find * -type f -name \\*.jpg`.split
31
- end
32
- end
33
-
34
- it "should find .jpg or .JPG files when constrained" do
35
- with_tree([".jpg", ".txt", ".JPG"]) do |dir|
36
- f = Findler.new(dir)
37
- f.append_extension ".jpg"
38
- f.case_insensitive!
39
- iter = f.iterator
40
- collect_files(iter).should =~ `find * -type f -iname \\*.jpg`.split
41
- end
42
- end
43
-
44
- it "should find files added after iteration started" do
45
- with_tree([".txt"]) do |dir|
46
- f = Findler.new(dir)
47
- iter = f.iterator
48
- iter.next.should_not be_nil
49
-
50
- # cheating with mtime on the touch doesn't properly update the parent directory ctime,
51
- # so we have to deal with the second-granularity resolution of the filesystem.
52
- sleep(1.1)
53
-
54
- FileUtils.touch(dir + "new.txt")
55
- collect_files(iter).should include("new.txt")
56
- end
57
- end
58
-
59
- it "should find new files after a rescan" do
60
- with_tree([".txt", ".no"]) do |dir|
61
- f = Findler.new(dir)
62
- f.append_extension ".txt"
63
- iter = f.iterator
64
- collect_files(iter).should =~ `find * -type f -iname \\*.txt`.split
65
- FileUtils.touch(dir + "dir-0" + "dir-1" + "new-0.txt")
66
- FileUtils.touch(dir + "dir-1" + "dir-0" + "new-1.txt")
67
- FileUtils.touch(dir + "dir-2" + "dir-2" + "new-2.txt")
68
- collect_files(iter).should be_empty
69
- iter.rescan!
70
- collect_files(iter).should =~ ["dir-0/dir-1/new-0.txt", "dir-1/dir-0/new-1.txt", "dir-2/dir-2/new-2.txt"]
71
- end
72
- end
73
-
74
- it "should not return files removed after iteration started" do
75
- with_tree([".txt"]) do |dir|
76
- f = Findler.new(dir)
77
- iter = f.iterator
78
- iter.next.should_not be_nil
79
- sleep(1.1) # see above for hand-wringing-defense of this atrocity
80
-
81
- (dir + "tmp-1.txt").unlink
82
- collect_files(iter).should_not include("tmp-1.txt")
83
- end
84
- end
85
-
86
- it "should dump/load in the middle of iterating" do
87
- with_tree([".jpg", ".txt", ".JPG"]) do |dir|
88
- all_files = `find * -type f -iname \\*.jpg`.split
89
- all_files.size.times do |i|
90
- f = Findler.new(dir)
91
- f.append_extension ".jpg"
92
- f.case_insensitive!
93
- iter_a = f.iterator
94
- files_a = i.times.collect { relative_path(iter_a, iter_a.next) }
95
- iter_b = Marshal.load(Marshal.dump(iter_a))
96
- files_b = collect_files(iter_b)
97
-
98
- iter_c = Marshal.load(Marshal.dump(iter_b))
99
- collect_files(iter_c)
100
- iter_c.next.should be_nil
101
-
102
- (files_a + files_b).should =~ all_files
103
- end
104
- end
105
- end
106
-
107
- it "should create an iterator even for a non-existent directory" do
108
- tmpdir = nil
109
- cwd = Dir.pwd
110
- Dir.mktmpdir do |dir|
111
- tmpdir = Pathname.new dir
112
- end
113
- tmpdir.should_not exist
114
- f = Findler.new(tmpdir)
115
- collect_files(f.iterator).should be_empty
116
- end
117
- end
data/spec/spec_helper.rb DELETED
@@ -1,66 +0,0 @@
1
- require 'rspec'
2
- require 'tmpdir'
3
- require 'fileutils'
4
- require 'findler'
5
-
6
- RSpec.configure do |config|
7
- config.color_enabled = true
8
- config.formatter = 'documentation'
9
- end
10
-
11
- def with_tmp_dir(&block)
12
- cwd = Dir.pwd
13
- Dir.mktmpdir do |dir|
14
- Dir.chdir(dir)
15
- yield(Pathname.new dir)
16
- end
17
- Dir.chdir(cwd)
18
- end
19
-
20
- def with_tree(sufficies, &block)
21
- with_tmp_dir do |dir|
22
- sufficies.each { |suffix| mk_tree dir, @opts.merge(:suffix => suffix) }
23
- yield(dir)
24
- end
25
- end
26
-
27
- def mk_tree(target_dir, options)
28
- opts = {
29
- :depth => 3,
30
- :files_per_dir => 3,
31
- :subdirs_per_dir => 3,
32
- :prefix => "tmp",
33
- :suffix => "",
34
- :dir_prefix => "dir",
35
- :dir_suffix => ""
36
- }.merge options
37
- p = target_dir.is_a?(Pathname) ? target_dir : Pathname.new(target_dir)
38
- p.mkdir unless p.exist?
39
-
40
- opts[:files_per_dir].times do |i|
41
- fname = "#{opts[:prefix]}-#{i}#{opts[:suffix]}"
42
- FileUtils.touch(p + fname).to_s
43
- end
44
- return if (opts[:depth] -= 1) <= 0
45
- opts[:subdirs_per_dir].times do |i|
46
- dir = "#{opts[:dir_prefix]}-#{i}#{opts[:dir_suffix]}"
47
- mk_tree(p + dir, opts)
48
- end
49
- end
50
-
51
- def expected_files(depth, files_per_dir, subdirs_per_dir)
52
- return 0 if depth == 0
53
- files_per_dir + (subdirs_per_dir * expected_files(depth - 1, files_per_dir, subdirs_per_dir))
54
- end
55
-
56
- def relative_path(parent, pathname)
57
- pathname.relative_path_from(parent.path).to_s
58
- end
59
-
60
- def collect_files(iter)
61
- files = []
62
- while nxt = iter.next
63
- files << relative_path(iter, nxt)
64
- end
65
- files
66
- end