findler 0.0.3 → 0.0.4

data/.travis.yml ADDED
@@ -0,0 +1,6 @@
1
+ language: ruby
2
+ script: bundle exec rake
3
+ rvm:
4
+ - 1.8.7
5
+ - 1.9.3
6
+ - rbx-18mode
data/Gemfile CHANGED
@@ -1,6 +1,3 @@
1
1
  source "http://rubygems.org"
2
2
  gemspec
3
3
 
4
- gem "rake"
5
- gem "yard"
6
- gem "rspec", '~> 2.7.0'
data/README.md CHANGED
@@ -1,5 +1,7 @@
1
1
  # Findler: Filesystem Iteration with Persistable State
2
2
 
3
+ [![Build Status](https://secure.travis-ci.org/mceachen/findler.png?branch=master)](http://travis-ci.org/mceachen/findler)
4
+
3
5
  Findler is a Ruby library for iterating over a filtered set of files from a given
4
6
  path, written to be suitable with concurrent workers and very large
5
7
  filesystem hierarchies.
@@ -8,9 +10,9 @@ filesystem hierarchies.
8
10
 
9
11
  ```ruby
10
12
  f = Findler.new "/Users/mrm"
11
- f.append_extension ".jpg", ".jpeg"
13
+ f.add_extensions ".jpg", ".jpeg"
12
14
  iterator = f.iterator
13
- iterator.next
15
+ iterator.next_file
14
16
  # => "/Users/mrm/Photos/img_1000.jpg"
15
17
  ```
16
18
 
@@ -31,7 +33,7 @@ To resume iteration:
31
33
 
32
34
  ```ruby
33
35
  iterator = File.open('iterator.state', 'rb') { |io| Marshal.load(io) }
34
- iterator.next
36
+ iterator.next_file
35
37
  # => "/Users/mrm/Photos/img_1001.jpg"
36
38
  ```
37
39
 
@@ -39,13 +41,89 @@ To re-check a directory hierarchy for files that you haven't visited yet:
39
41
 
40
42
  ```ruby
41
43
  iterator.rescan!
42
- iterator.next
44
+ iterator.next_file
43
45
  # => "/Users/mrm/Photos/img_1002.jpg"
44
46
  ```
45
47
 
48
+ Synchronizing the serialized iterator state across your worker processes
49
+ is, of course, up to you: each worker should hold an iteration mutex of
50
+ some sort while it performs the ```load```, ```next_file```, and ```dump```
51
+ steps, so that no two workers resume from the same saved state.
52
+
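A minimal sketch of that locking, assuming all workers run on one machine and can share an advisory `File#flock` lock. The helper name `with_locked_state` and the `iterator.state` / `iterator.lock` paths are hypothetical, not part of Findler. The lock is held across the whole load, advance, and dump cycle:

```ruby
# Hypothetical helper (not part of Findler): load a Marshal-dumped object,
# yield it, then dump it back, all while holding an exclusive flock on a
# separate lock file so no other process can interleave these steps.
def with_locked_state(state_path, lock_path)
  File.open(lock_path, File::RDWR | File::CREAT, 0644) do |lock|
    lock.flock(File::LOCK_EX) # blocks until other holders release the lock
    state = File.open(state_path, 'rb') { |io| Marshal.load(io) }
    result = yield(state)     # e.g. state.next_file for a Findler iterator
    File.open(state_path, 'wb') { |io| Marshal.dump(state, io) }
    result
  end                         # closing the lock file releases the flock
end
```

With Findler, a worker would call `with_locked_state('iterator.state', 'iterator.lock') { |iter| iter.next_file }`. `flock` is advisory and local, so this sketch does not cover workers on different hosts sharing a network filesystem.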
53
+ ## Filtering and ordering
54
+
55
+ Filters provide custom exclusion and ordering criteria, so you don't
56
+ have to implement that logic in the code that consumes your iterator.
57
+
58
+ Filters can't be procs or lambdas because those aren't safely serializable.
59
+
60
+ ```add_filter``` takes a symbol naming a class method
61
+ on ```Findler::Filters```:
62
+
63
+ ```ruby
64
+ f = Findler.new(".")
65
+ f.add_filter :order_by_name
66
+ ```
67
+
68
+ Filters run in the order they were added, each receiving the previous filter's output, so the last
69
+ filter added is the last to order the children and acts as the "primary" sort criterion. Note also
70
+ that the ordering is only done in the context of a given directory.
71
+
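To see why the last filter added wins, here is a self-contained sketch that chains class-method filters the way the iterator does, feeding each filter the previous one's output. `Entry` is a hypothetical stand-in for `Pathname`, and `DemoFilters` only re-creates the idea behind the gem's `order_by_name` and `files_first` (a sort that breaks ties by incoming position); it is an illustration, not the gem's code:

```ruby
# Hypothetical stand-in for Pathname: just a name and a file/directory flag.
Entry = Struct.new(:name, :is_file) do
  def file?
    is_file
  end
end

module DemoFilters
  # Stable sort: ties keep their incoming order, so the ordering produced
  # by earlier filters survives as the secondary criterion.
  def self.preserve_sort_by(array)
    array.each_with_index.sort_by { |ea, i| [yield(ea), i] }.map { |ea, _| ea }
  end

  def self.order_by_name(children)
    preserve_sort_by(children) { |ea| ea.name }
  end

  def self.files_first(children)
    preserve_sort_by(children) { |ea| ea.file? ? -1 : 1 }
  end
end

# Apply filters in the order they were added, each receiving the previous output.
def apply_filters(children, filter_symbols)
  filter_symbols.inject(children) { |c, sym| DemoFilters.send(sym, c) }
end

children = [Entry.new('b', false), Entry.new('z.txt', true),
            Entry.new('a', false), Entry.new('c.txt', true)]
p apply_filters(children, [:order_by_name, :files_first]).map(&:name)
# prints ["c.txt", "z.txt", "a", "b"]  (files_first is primary)
p apply_filters(children, [:files_first, :order_by_name]).map(&:name)
# prints ["a", "b", "c.txt", "z.txt"]  (order_by_name is primary)
```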
72
+ ### Implementing your own filter
73
+
74
+ Filter methods receive an array of ```Pathname``` instances. Those pathnames will:
75
+
76
+ 1. have the same parent,
77
+ 2. not have been enumerated by ```next_file()``` already, and
78
+ 3. satisfy the settings given to the parent Findler instance, like ```include_hidden!```
79
+    and added patterns.
80
+
84
+ The array returned by the class method (after any later filters have run on it) is the set of children,
85
+ both files and directories, that the iterator walks: files are returned from ```next_file()```, and directories are descended into.
86
+
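As a minimal illustration of that contract, the filter below keeps every directory (so the iterator can still descend into it) and drops zero-byte files, returning only a subset of the `Pathname`s it was given. It lives on a hypothetical `MyFilters` module so the sketch runs without the gem; with Findler you would reopen `Findler::Filters` instead, as the EXIF example in this README does. The test suite in this release defines a similar `non_empty_files` filter.

```ruby
require 'pathname'

# Hypothetical host module; with Findler, reopen Findler::Filters instead.
module MyFilters
  # Keep directories so they are still walked, and drop empty files.
  # Only ever returns elements of `children`, never new paths.
  def self.non_empty_files(children)
    children.select { |ea| ea.directory? || ea.size > 0 }
  end
end
```

Registered on `Findler::Filters`, it would be enabled with `f.add_filter :non_empty_files`.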
87
+ ### Example
88
+
89
+ To find files that have valid EXIF headers, using the *most* excellent
90
+ [exiftoolr](https://github.com/mceachen/exiftoolr) gem, you'd do this:
91
+
92
+ ```ruby
93
+ require 'findler'
94
+ require 'exiftoolr'
95
+
96
+ # Monkey-patch Filters to add our custom filter:
97
+ class Findler::Filters
98
+ def self.exif_only(children)
99
+ child_files = children.select { |ea| ea.file? }
100
+ child_dirs = children.select { |ea| ea.directory? }
101
+ e = Exiftoolr.new(child_files)
102
+ e.files_with_results + child_dirs
103
+ end
104
+ end
105
+
106
+ f = Findler.new "/Users/mrm"
107
+ f.add_extensions ".jpg", ".jpeg", ".cr2", ".nef"
108
+ f.case_insensitive!
109
+ f.add_filter(:exif_only)
110
+ ```
111
+
112
+ ### Filter implementation notes
113
+
114
+ * The array of ```Pathname``` instances can be assumed to be absolute.
115
+ * Only child files that match the added extensions and patterns will be passed to the filter class method.
116
+ * A single call to ```next_file()``` may invoke the filter method several times, once for each new directory it walks into before finding a matching file.
117
+ * If you want to be notified when the iterator walks into a new directory, for example to run a bulk operation on that directory's contents,
118
+   a filter gives you that hook. Just remember to return the children array at the end of your method.
119
+
120
+ ### Why can't a filter be a proc?
121
+
122
+ Because procs and lambdas aren't Marshal-able, and I didn't want to use something scary like ruby2ruby and eval.
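A quick way to confirm this: `Marshal.dump` raises `TypeError` for a lambda, while a `Symbol` naming a filter round-trips cleanly, which is why `add_filter` stores symbols. The `marshalable?` helper below is hypothetical, written only for this check:

```ruby
# Returns true if Marshal can serialize obj, false if it raises TypeError.
def marshalable?(obj)
  Marshal.dump(obj)
  true
rescue TypeError
  false
end

marshalable?(lambda { |children| children })  # false: Proc objects can't be dumped
marshalable?(:order_by_name)                  # true: Symbols dump by name
Marshal.load(Marshal.dump(:order_by_name))    # returns :order_by_name
```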
46
123
 
47
124
  ## Changelog
48
125
 
49
- * 0.0.1 First `find`
126
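+ * 0.0.4 Added custom filters for ```next_file()```. Renamed ```Iterator#next``` to ```Iterator#next_file``` and ```append_extension``` to ```add_extensions```, and added the singular ```add_extension``` and ```add_pattern``` alongside the plural ```add_extensions``` and ```add_patterns```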
127
+ * 0.0.3 Fixed gemfile packaging
50
128
  * 0.0.2 Added scalable Bloom filter so ```Iterator#rescan!``` is possible
51
- * 0.0.3 Fixed gemfile packaging
129
+ * 0.0.1 First `find`
data/Rakefile CHANGED
@@ -4,7 +4,13 @@ YARD::Rake::YardocTask.new do |t|
4
4
  t.files = ['lib/**/*.rb', 'README.md']
5
5
  end
6
6
 
7
- require "rspec/core/rake_task"
8
- RSpec::Core::RakeTask.new(:spec)
7
+ require 'rake/testtask'
9
8
 
10
- task :default => :spec
9
+ Rake::TestTask.new do |t|
10
+ t.libs.push "lib"
11
+ t.libs.push "test"
12
+ t.pattern = 'test/**/*_test.rb'
13
+ t.verbose = true
14
+ end
15
+
16
+ task :default => :test
data/examples/find_exif_files ADDED
@@ -0,0 +1,27 @@
1
+ #!/usr/bin/env ruby
2
+
3
+ require 'rubygems'
4
+ require 'exiftoolr'
5
+ require 'findler'
6
+
7
+ class Findler::Filters
8
+ def self.with_exif(children)
9
+ child_files = children.select { |ea| ea.file? }
10
+ child_dirs = children.select { |ea| ea.directory? }
11
+ e = Exiftoolr.new(child_files)
12
+ good = e.files_with_results
13
+ bad = child_files - good
14
+ puts "Files missing EXIF:\n #{(bad).join("\n ")}" unless bad.empty?
15
+ good + child_dirs
16
+ end
17
+ end
18
+
19
+ f = Findler.new(ENV['HOME'])
20
+ f.add_extensions ".jpg", ".jpeg", ".cr2", ".nef"
21
+ f.add_filter :with_exif
22
+ f.case_insensitive!
23
+ iter = f.iterator
24
+
25
+ while nxt = iter.next_file
26
+ puts "next file: #{nxt}"
27
+ end
data/findler.gemspec CHANGED
@@ -16,11 +16,12 @@ Gem::Specification.new do |s|
16
16
  s.rubyforge_project = "findler"
17
17
 
18
18
  s.files = `git ls-files`.split("\n")
19
- s.test_files = `git ls-files -- {test,spec,features}/*`.split("\n")
19
+ s.test_files = `git ls-files -- {test,features}/*`.split("\n")
20
20
  s.executables = `git ls-files -- bin/*`.split("\n").map { |f| File.basename(f) }
21
21
  s.require_paths = ["lib"]
22
22
  s.add_development_dependency "rake"
23
23
  s.add_development_dependency "yard"
24
- s.add_development_dependency "rspec", "~> 2.7.0"
24
+ s.add_development_dependency "minitest"
25
+ s.add_development_dependency "minitest-reporters"
25
26
  s.add_dependency "bloomer"
26
27
  end
data/lib/findler.rb CHANGED
@@ -1,25 +1,46 @@
1
1
  class Findler
2
- IGNORE_CASE = 1
3
- INCLUDE_HIDDEN = 2
4
2
 
5
3
  autoload :Iterator, "findler/iterator"
4
+ autoload :Error, "findler/error"
5
+ require "findler/filters"
6
6
 
7
- def initialize path
7
+ IGNORE_CASE = 1
8
+ INCLUDE_HIDDEN = 2
9
+
10
+ def initialize(path)
8
11
  @path = path
9
12
  @flags = 0
10
13
  end
11
14
 
12
- # These are File.fnmatch patterns. If any pattern matches, it will be returned by Iterator#next.
15
+ # These are File.fnmatch patterns.
16
+ # If any pattern matches, it will be returned by Iterator#next_file.
13
17
  # (see File.fnmatch?)
14
- def add_pattern *patterns
15
- patterns.each { |ea| (@patterns ||= []) << ea }
18
+ def patterns
19
+ @patterns ||= []
20
+ end
21
+
22
+ def add_patterns(*patterns)
23
+ patterns.each { |ea| add_pattern(ea) }
16
24
  end
17
25
 
18
- def append_extension *extensions
19
- extensions.each { |ea| add_pattern "*#{normalize_extension(ea)}" }
26
+ def add_pattern(pattern)
27
+ self.patterns << pattern
28
+ end
29
+
30
+ def add_extension(extension)
31
+ add_pattern "*#{normalize_extension(extension)}"
32
+ end
33
+
34
+ def add_extensions(*extensions)
35
+ extensions.each { |ea| add_extension(ea) }
36
+ end
37
+
38
+ # Should patterns be interpreted in a case-sensitive manner? The default is case sensitive,
39
+ # but if your local filesystem is not case sensitive, this flag is a no-op.
40
+ def case_sensitive!
41
+ @flags &= ~IGNORE_CASE
20
42
  end
21
43
 
22
- # Should patterns be interpreted in a case-insensitive manor? (default is case sensitive)
23
44
  def case_insensitive!
24
45
  @flags |= IGNORE_CASE
25
46
  end
@@ -30,18 +51,60 @@ class Findler
30
51
  @flags |= INCLUDE_HIDDEN
31
52
  end
32
53
 
54
+ def exclude_hidden!
55
+ @flags &= ~INCLUDE_HIDDEN
56
+ end
57
+
58
+ def filter_class
59
+ (@filter_class ||= Filters)
60
+ end
61
+
62
+ def filter_class=(new_filter_class)
63
+ raise Error unless new_filter_class.is_a? Class
64
+ filters.each{|ea|new_filter_class.method(ea)} # verify the filters class has those methods defined
65
+ @filter_class = new_filter_class
66
+ end
67
+
68
+ # Accepts symbols whose names are class methods on Findler::Filters.
69
+ #
70
+ # Filter methods receive an array of Pathname instances, and are in charge of ordering
71
+ # and filtering the array. The returned array of pathnames will be used by the iterator.
72
+ #
73
+ # Those pathnames will:
74
+ # a) have the same parent
75
+ # b) not have been enumerated by next_file() already
76
+ # c) satisfy the hidden flag and patterns preferences
77
+ #
78
+ # Note that the last filter added will be last to order the children, so it will be the
79
+ # "primary" sort criterion.
80
+ def add_filter(filter_symbol)
81
+ filter_class.method(filter_symbol)
82
+ filters << filter_symbol
83
+ end
84
+
85
+ def filters
86
+ (@filters ||= [])
87
+ end
88
+
89
+ def add_filters(*filter_symbols)
90
+ filter_symbols.each { |ea| add_filter(ea) }
91
+ end
92
+
33
93
  def iterator
34
- Iterator.new(:path => @path, :patterns => @patterns, :flags => @flags)
94
+ Iterator.new(:path => @path,
95
+ :patterns => @patterns,
96
+ :flags => @flags,
97
+ :filters => @filters, :filters_class => filter_class)
35
98
  end
36
99
 
37
100
  private
38
101
 
39
- def normalize_extension extension
102
+ def normalize_extension(extension)
40
103
  if extension.nil? || extension.empty? || extension.start_with?(".")
41
104
  extension
42
105
  else
43
106
  ".#{extension}"
44
107
  end
45
108
  end
46
-
47
109
  end
110
+
data/lib/findler/error.rb ADDED
@@ -0,0 +1,2 @@
1
+ class Findler::Error < StandardError
2
+ end
data/lib/findler/filters.rb ADDED
@@ -0,0 +1,38 @@
1
+ class Findler::Filters
2
+ # files first, then directories
3
+ def self.files_first(paths)
4
+ preserve_sort_by(paths) { |ea| ea.file? ? -1 : 1 }
5
+ end
6
+
7
+ # directories first, then files
8
+ def self.directories_first(paths)
9
+ preserve_sort_by(paths) { |ea| ea.directory? ? -1 : 1 }
10
+ end
11
+
12
+ # order by the mtime of each file. Oldest files first.
13
+ def self.order_by_mtime_asc(paths)
14
+ preserve_sort_by(paths) { |ea| ea.mtime }
15
+ end
16
+
17
+ # reverse order by the mtime of each file. Newest files first.
18
+ def self.order_by_mtime_desc(paths)
19
+ preserve_sort_by(paths) { |ea| -ea.mtime.to_f }
20
+ end
21
+
22
+ # order by the name of each file.
23
+ def self.order_by_name(paths)
24
+ preserve_sort_by(paths) { |ea| ea.basename.to_s }
25
+ end
26
+
27
+ # reverse the order of the sort
28
+ def self.reverse(paths)
29
+ paths.reverse
30
+ end
31
+
32
+ def self.preserve_sort_by(array, &block)
33
+ ea_to_index = Hash[array.zip((0..array.size-1).to_a)]
34
+ array.sort_by do |ea|
35
+ [yield(ea), ea_to_index[ea]]
36
+ end
37
+ end
38
+ end
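`Time` defines binary `-` but no unary minus, so a sort key like `-ea.mtime` raises `NoMethodError`; negating a numeric form such as `-ea.mtime.to_f` gives a newest-first key instead. A self-contained sketch of a newest-first sort that otherwise preserves incoming order (the method name mirrors `order_by_mtime_desc`, but this is an illustration, not the gem's code):

```ruby
require 'pathname'

# Newest first. The index tiebreaker keeps entries with equal mtimes in
# their incoming order, so earlier filters still act as secondary criteria.
def order_by_mtime_desc(paths)
  paths.each_with_index.sort_by { |ea, i| [-ea.mtime.to_f, i] }.map { |ea, _| ea }
end
```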
data/lib/findler/iterator.rb CHANGED
@@ -1,56 +1,50 @@
1
1
  require 'bloomer'
2
+ require 'pathname'
2
3
 
3
4
  class Findler
5
+
4
6
  class Iterator
5
7
 
6
- attr_reader :path, :parent, :patterns, :flags, :visited_dirs, :visited_files
8
+ attr_reader :path, :parent, :patterns, :flags, :filters_class, :filters, :visited_dirs, :visited_files
7
9
 
8
10
  def initialize(attrs, parent = nil)
9
11
  @path = attrs[:path]
10
12
  @path = Pathname.new(@path) unless @path.is_a? Pathname
13
+ @path = @path.expand_path unless @path.absolute?
11
14
  @parent = parent
12
15
 
13
- set_ivar(:visited_dirs, attrs) { Bloomer::Scalable.new(256, 1.0/1_000_000) }
14
- set_ivar(:visited_files, attrs) { Bloomer::Scalable.new(256, 1.0/1_000_000) }
15
- set_ivar(:patterns, attrs) { nil }
16
- set_ivar(:flags, attrs) { 0 }
16
+ set_inheritable_ivar(:visited_dirs, attrs) { self.class.new_presence_collection }
17
+ set_inheritable_ivar(:visited_files, attrs) { self.class.new_presence_collection }
18
+ set_inheritable_ivar(:patterns, attrs) { nil }
19
+ set_inheritable_ivar(:flags, attrs) { 0 }
20
+ set_inheritable_ivar(:filters, attrs) { [] }
21
+ set_inheritable_ivar(:filters_class, attrs) { Filters }
22
+ set_inheritable_ivar(:sort_with, attrs) { nil }
17
23
 
18
24
  @sub_iter = self.class.new(attrs[:sub_iter], self) if attrs[:sub_iter]
19
25
  end
20
26
 
21
27
  # Visit this directory and all sub directories, and check for unseen files. Only call on the root iterator.
22
28
  def rescan!
23
- raise "Only invoke on root" unless @parent.nil?
24
- @visited_dirs = Bloomer::Scalable.new(256, 1.0/1_000_000)
29
+ raise Error, "Only invoke on root" unless @parent.nil?
30
+ @visited_dirs = self.class.new_presence_collection
25
31
  @children = nil
26
32
  @sub_iter = nil
27
33
  end
28
34
 
29
- #def to_hash
30
- # {:path => @path, :visited_dirs:patterns => @patterns, :flags => @flags, :sub_iter => @sub_iter && @sub_iter.to_hash}
31
- #end
32
- #
33
- #def _dump(depth)
34
- # Marshal.dump(to_hash)
35
- #end
36
- #
37
- #def self._load(data)
38
- # new(Marshal.load(data))
39
- #end
40
-
41
- def case_insensitive?
42
- (Findler::IGNORE_CASE | flags) != 0
35
+ def ignore_case?
36
+ (Findler::IGNORE_CASE & flags) > 0
43
37
  end
44
38
 
45
- def skip_hidden?
46
- (Findler::INCLUDE_HIDDEN | flags) == 0
39
+ def include_hidden?
40
+ (Findler::INCLUDE_HIDDEN & flags) > 0
47
41
  end
48
42
 
49
43
  def fnmatch_flags
50
- @_fnflags ||= (@parent && @parent.fnmatch_flags) || begin
44
+ @fnmatch_flags ||= (@parent && @parent.fnmatch_flags) || begin
51
45
  f = 0
52
- f |= File::FNM_CASEFOLD if case_insensitive?
53
- f |= File::FNM_DOTMATCH if !skip_hidden?
46
+ f |= File::FNM_CASEFOLD if ignore_case?
47
+ f |= File::FNM_DOTMATCH if include_hidden?
54
48
  f
55
49
  end
56
50
  end
@@ -59,11 +53,11 @@ class Findler
59
53
  @path
60
54
  end
61
55
 
62
- def next
56
+ def next_file
63
57
  return nil unless @path.exist?
64
-
58
+
65
59
  if @sub_iter
66
- nxt = @sub_iter.next
60
+ nxt = @sub_iter.next_file
67
61
  return nxt unless nxt.nil?
68
62
  @visited_dirs.add @sub_iter.path.to_s
69
63
  @sub_iter = nil
@@ -71,10 +65,13 @@ class Findler
71
65
 
72
66
  # If someone touches the directory while we iterate, redo the @children.
73
67
  @children = nil if @path.ctime != @ctime || @path.mtime != @mtime
74
- @children ||= begin
68
+
69
+ if @children.nil?
75
70
  @mtime = @path.mtime
76
71
  @ctime = @path.ctime
77
- @path.children.delete_if { |ea| skip?(ea) }
72
+ children = @path.children.delete_if { |ea| skip?(ea) }
73
+ filtered_children = @filters.inject(children){ |c, f| filter(c, f) }
74
+ @children = filtered_children
78
75
  end
79
76
 
80
77
  nxt = @children.shift
@@ -82,7 +79,7 @@ class Findler
82
79
 
83
80
  if nxt.directory?
84
81
  @sub_iter = Iterator.new({:path => nxt}, self)
85
- self.next
82
+ self.next_file
86
83
  else
87
84
  @visited_files.add nxt.to_s
88
85
  nxt
@@ -91,12 +88,30 @@ class Findler
91
88
 
92
89
  private
93
90
 
94
- def set_ivar(field, attrs, &block)
95
- sym = "@#{field}".to_sym
96
- v = attrs[field]
97
- v ||= begin
98
- (p = instance_variable_get(:@parent)) && p.instance_variable_get(sym)
91
+ def self.new_presence_collection
92
+ Bloomer::Scalable.new(256, 1.0/1_000_000)
93
+ end
94
+
95
+ def filter(children, filter_symbol)
96
+ filtered_children = filters_class.send(filter_symbol, children)
97
+ unless filtered_children.respond_to? :collect
98
+ raise Error, "#{path}: filter #{filter_symbol} must return an Enumerable"
99
+ end
100
+ children_as_pathnames = filtered_children.collect { |ea| ea.is_a?(Pathname) ? ea : Pathname.new(ea) }
101
+ illegal_children = children_as_pathnames - children
102
+ unless illegal_children.empty?
103
+ raise Error, "#{path}: filter #{filter_symbol} returned unexpected paths: #{illegal_children.join(",")}"
99
104
  end
105
+ children_as_pathnames
106
+ end
107
+
108
+ # Sets the instance variable to the value in attrs[field].
109
+ # If attrs is missing a value, pull the value from the parent.
110
+ # If the parent doesn't have a value, use the block to generate a default.
111
+ def set_inheritable_ivar(field, attrs, &block)
112
+ v = attrs[field]
113
+ sym = "@#{field}".to_sym
114
+ v ||= parent && parent.instance_variable_get(sym)
100
115
  v ||= yield
101
116
  instance_variable_set(sym, v)
102
117
  end
@@ -107,13 +122,13 @@ class Findler
107
122
 
108
123
  def skip? pathname
109
124
  s = pathname.to_s
110
- return true if hidden?(pathname) && skip_hidden?
111
- return @visited_dirs.include?(s) if pathname.directory?
112
- return true if @visited_files.include?(s)
125
+ return true if !include_hidden? && hidden?(pathname)
126
+ return visited_dirs.include?(s) if pathname.directory?
127
+ return true if visited_files.include?(s)
113
128
  unless patterns.nil?
114
129
  return true if patterns.none? { |p| pathname.fnmatch(p, fnmatch_flags) }
115
130
  end
116
131
  return false
117
132
  end
118
133
  end
119
- end
134
+ end
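The iterator translates Findler's flag bits into `File.fnmatch` options. The sketch below re-creates that mapping outside the gem (constant values copied from `Findler`; the method name mirrors `Iterator#fnmatch_flags`) and shows why the switch to bitwise AND matters: the 0.0.3 check `(IGNORE_CASE | flags) != 0` was true for every value of `flags`.

```ruby
IGNORE_CASE    = 1 # values copied from Findler
INCLUDE_HIDDEN = 2

# Translate Findler-style flag bits into File.fnmatch flags.
def fnmatch_flags(flags)
  f = 0
  f |= File::FNM_CASEFOLD if (flags & IGNORE_CASE) > 0
  f |= File::FNM_DOTMATCH if (flags & INCLUDE_HIDDEN) > 0
  f
end

File.fnmatch('*.jpg', 'IMG_1.JPG', fnmatch_flags(0))           # false
File.fnmatch('*.jpg', 'IMG_1.JPG', fnmatch_flags(IGNORE_CASE)) # true
File.fnmatch('*', '.profile', fnmatch_flags(0))                # false
File.fnmatch('*', '.profile', fnmatch_flags(INCLUDE_HIDDEN))   # true
```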
data/lib/findler/version.rb CHANGED
@@ -1,3 +1,3 @@
1
1
  class Findler
2
- VERSION = "0.0.3"
2
+ VERSION = "0.0.4"
3
3
  end
data/test/filters_test.rb ADDED
@@ -0,0 +1,15 @@
1
+ require 'test_helper'
2
+
3
+ describe Findler::Filters do
4
+ it "should preserve sort order with no result" do
5
+ a = %w(h a p p y)
6
+ b = Findler::Filters.preserve_sort_by(a) { 0 }
7
+ b.must_equal a
8
+ end
9
+
10
+ it "should preserve sort order" do
11
+ a = [3.5, 3.1, 1.1, 2.0, 1.5, 1.3]
12
+ b = Findler::Filters.preserve_sort_by(a) { |ea| ea.to_i }
13
+ b.must_equal [1.1, 1.5, 1.3, 2.0, 3.5, 3.1]
14
+ end
15
+ end
data/test/findler_test.rb ADDED
@@ -0,0 +1,224 @@
1
+ require 'test_helper'
2
+
3
+ class Findler::Filters
4
+ def self.non_empty_files(children)
5
+ (children.select { |ea| ea.directory? || ea.size > 0 })
6
+ end
7
+
8
+ def self.no_return(children)
9
+ end
10
+
11
+ def self.invalid_return(children)
12
+ "/invalid/file"
13
+ end
14
+
15
+ def self.files_to_s(children)
16
+ (children).collect { |ea| ea.to_s }
17
+ end
18
+ end
19
+
20
+ describe Findler do
21
+
22
+ def touch_secrets
23
+ `mkdir .hide ; touch .outer-hide dir-0/.hide .hide/normal.txt .hide/.secret`
24
+ end
25
+
26
+ it "should detect hidden files properly" do
27
+ i = Findler::Iterator.new(:path => "/tmp")
28
+ i.send(:hidden?, Pathname.new("/a/b")).must_equal false
29
+ i.send(:hidden?, Pathname.new("/a/.b")).must_equal true
30
+ i.send(:hidden?, Pathname.new("/.a/.b")).must_equal true
31
+ i.send(:hidden?, Pathname.new("/.a/b")).must_equal false
32
+ end
33
+
34
+ it "should skip hidden files by default" do
35
+ i = Findler.new("/tmp").iterator
36
+ i.send(:skip?, Pathname.new("/tmp/not-hidden")).must_equal false
37
+ i.send(:skip?, Pathname.new("/tmp/.hidden")).must_equal true
38
+ end
39
+
40
+ it "should not skip hidden files when set" do
41
+ f = Findler.new("/tmp")
42
+ f.include_hidden!
43
+ i = f.iterator
44
+ i.send(:skip?, Pathname.new("/tmp/not-hidden")).must_equal false
45
+ i.send(:skip?, Pathname.new("/tmp/.hidden")).must_equal false
46
+ end
47
+
48
+ it "should find all non-hidden files by default" do
49
+ with_tree(%W(.jpg .txt)) do |dir|
50
+ touch_secrets
51
+ f = Findler.new(dir)
52
+ collect_files(f.iterator).sort.must_equal `find * -type f -not -name '.*'`.split.sort
53
+ f.exclude_hidden! # should be a no-op
54
+ collect_files(f.iterator).sort.must_equal `find * -type f -not -name '.*'`.split.sort
55
+ f.include_hidden!
56
+ collect_files(f.iterator).sort.must_equal `find . -type f | sed -e 's/^\\.\\///'`.split.sort
57
+ end
58
+ end
59
+
60
+ it "should find only .jpg files when constrained" do
61
+ with_tree(%W(.jpg .txt .JPG)) do |dir|
62
+ f = Findler.new(dir)
63
+ f.add_extension ".jpg"
64
+ if fs_case_sensitive?
65
+ f.case_sensitive!
66
+ collect_files(f.iterator).sort.must_equal `find * -type f -name \\*.jpg`.split.sort
67
+ end
68
+ f.case_insensitive!
69
+ collect_files(f.iterator).sort.must_equal `find * -type f -iname \\*.jpg`.split.sort
70
+ end
71
+ end
72
+
73
+ it "should find .jpg or .JPG files when constrained" do
74
+ with_tree(%W(.jpg .txt .JPG)) do |dir|
75
+ f = Findler.new(dir)
76
+ f.add_extension ".jpg"
77
+ f.case_insensitive!
78
+ iter = f.iterator
79
+ collect_files(iter).sort.must_equal `find * -type f -iname \\*.jpg`.split.sort
80
+ end
81
+ end
82
+
83
+ it "should find files added after iteration started" do
84
+ with_tree(%W(.txt)) do |dir|
85
+ f = Findler.new(dir)
86
+ iter = f.iterator
87
+ iter.next_file.wont_be_nil
88
+
89
+ # cheating with mtime on the touch doesn't properly update the parent directory ctime,
90
+ # so we have to deal with the second-granularity resolution of the filesystem.
91
+ sleep(1.1)
92
+
93
+ FileUtils.touch(dir + "new.txt")
94
+ collect_files(iter).must_include("new.txt")
95
+ end
96
+ end
97
+
98
+ it "should find new files after a rescan" do
99
+ with_tree([".txt", ".no"]) do |dir|
100
+ f = Findler.new(dir)
101
+ f.add_extension ".txt"
102
+ iter = f.iterator
103
+ collect_files(iter).sort.must_equal `find * -type f -iname \\*.txt`.split.sort
104
+ FileUtils.touch(dir + "dir-0" + "dir-1" + "new-0.txt")
105
+ FileUtils.touch(dir + "dir-1" + "dir-0" + "new-1.txt")
106
+ FileUtils.touch(dir + "dir-2" + "dir-2" + "new-2.txt")
107
+ collect_files(iter).must_be_empty
108
+ iter.rescan!
109
+ collect_files(iter).sort.must_equal ["dir-0/dir-1/new-0.txt", "dir-1/dir-0/new-1.txt", "dir-2/dir-2/new-2.txt"]
110
+ end
111
+ end
112
+
113
+ it "should not return files removed after iteration started" do
114
+ with_tree([".txt"]) do |dir|
115
+ f = Findler.new(dir)
116
+ iter = f.iterator
117
+ iter.next_file.wont_be_nil
118
+ sleep(1.1) # see above for hand-wringing-defense of this atrocity
119
+
120
+ (dir + "tmp-1.txt").unlink
121
+ collect_files(iter).wont_include("tmp-1.txt")
122
+ end
123
+ end
124
+
125
+ it "should dump/load in the middle of iterating" do
126
+ with_tree(%W(.jpg .txt .JPG)) do |dir|
127
+ all_files = `find * -type f -iname \\*.jpg`.split
128
+ all_files.size.times do |i|
129
+ f = Findler.new(dir)
130
+ f.add_extension ".jpg"
131
+ f.case_insensitive!
132
+ iter_a = f.iterator
133
+ files_a = i.times.collect { relative_path(iter_a.path, iter_a.next_file) }
134
+ iter_b = Marshal.load(Marshal.dump(iter_a))
135
+ files_b = collect_files(iter_b)
136
+
137
+ iter_c = Marshal.load(Marshal.dump(iter_b))
138
+ collect_files(iter_c)
139
+ iter_c.next_file.must_be_nil
140
+
141
+ (files_a + files_b).sort.must_equal all_files.sort
142
+ end
143
+ end
144
+ end
145
+
146
+ it "should create an iterator even for a non-existent directory" do
147
+ tmpdir = nil
148
+ Dir.mktmpdir do |dir|
149
+ tmpdir = Pathname.new dir
150
+ end
151
+ tmpdir.exist?.must_equal false
152
+ f = Findler.new(tmpdir)
153
+ collect_files(f.iterator).must_be_empty
154
+ end
155
+
156
+ it "should raise an error if a filter returns nil" do
157
+ Dir.mktmpdir do |dir|
158
+ f = Findler.new(dir)
159
+ f.add_filter :no_return
160
+ i = f.iterator
161
+ lambda { i.next_file }.must_raise(Findler::Error)
162
+ end
163
+ end
164
+
165
+ it "should raise an error if a filter returns non-children" do
166
+ with_tree(%W(.txt)) do |dir|
167
+ f = Findler.new(dir)
168
+ f.add_filter :invalid_return
169
+ i = f.iterator
170
+ lambda { i.next_file }.must_raise(Findler::Error)
171
+ end
172
+ end
173
+
174
+ it "should accept filters that return path strings" do
175
+ with_tree(%W(.txt)) do |dir|
176
+ f = Findler.new(dir)
177
+ f.add_filter :files_to_s
178
+ iter = f.iterator
179
+ files = collect_files(iter)
180
+ files.sort.must_equal `find * -type f`.split.sort
181
+ end
182
+ end
183
+
184
+ it "should support filters that exclude files" do
185
+ with_tree(%W(.a .b)) do |dir|
186
+ Dir["**/*.a"].each { |ea| File.open(ea, 'w') { |f| f.write("hello") } }
187
+ f = Findler.new(dir)
188
+ f.add_filter :non_empty_files
189
+ iter = f.iterator
190
+ files = collect_files(iter)
191
+ files.sort.must_equal `find * -type f -name \\*.a`.split.sort
192
+ end
193
+ end
194
+
195
+ it "should support files_first ordering" do
196
+ with_tree(%W(.a), {
197
+ :depth => 2,
198
+ :files_per_dir => 2,
199
+ :subdirs_per_dir => 1,
200
+ }) do |dir|
201
+ f = Findler.new(dir)
202
+ f.add_filters :order_by_name, :files_first
203
+ expected = %W(tmp-0.a tmp-1.a dir-0/tmp-0.a dir-0/tmp-1.a)
204
+ collect_files(f.iterator).must_equal expected
205
+ f.add_filter :reverse
206
+ collect_files(f.iterator).must_equal expected.reverse
207
+ end
208
+ end
209
+
210
+ it "should support directories_first ordering" do
211
+ with_tree(%W(.a), {
212
+ :depth => 2,
213
+ :files_per_dir => 2,
214
+ :subdirs_per_dir => 1,
215
+ }) do |dir|
216
+ f = Findler.new(dir)
217
+ f.add_filters :order_by_name, :directories_first
218
+ expected = %W(dir-0/tmp-0.a dir-0/tmp-1.a tmp-0.a tmp-1.a)
219
+ collect_files(f.iterator).must_equal expected
220
+ f.add_filter :reverse
221
+ collect_files(f.iterator).must_equal expected.reverse
222
+ end
223
+ end
224
+ end
data/test/test_helper.rb ADDED
@@ -0,0 +1,83 @@
1
+ require 'minitest/spec'
2
+ require 'minitest/reporters'
3
+ require 'minitest/autorun'
4
+ require 'tmpdir'
5
+ require 'fileutils'
6
+ require 'findler'
7
+
8
+ MiniTest::Unit.runner = MiniTest::SuiteRunner.new
9
+ if ENV["RM_INFO"] || ENV["TEAMCITY_VERSION"]
10
+ MiniTest::Unit.runner.reporters << MiniTest::Reporters::RubyMineReporter.new
11
+ elsif ENV['TM_PID']
12
+ MiniTest::Unit.runner.reporters << MiniTest::Reporters::RubyMateReporter.new
13
+ else
14
+ MiniTest::Unit.runner.reporters << MiniTest::Reporters::ProgressReporter.new
15
+ end
16
+
17
+ def with_tmp_dir(&block)
18
+ cwd = Dir.pwd
19
+ Dir.mktmpdir do |dir|
20
+ Dir.chdir(dir)
21
+ yield(Pathname.new dir)
22
+ Dir.chdir(cwd) # jruby needs us to cd out of the tmpdir so it can remove it
23
+ end
24
+ ensure
25
+ Dir.chdir(cwd)
26
+ end
27
+
28
+ def with_tree(suffixes, options = {}, &block)
29
+ with_tmp_dir do |dir|
30
+ suffixes.each { |suffix| mk_tree dir, options.merge(:suffix => suffix) }
31
+ yield(dir)
32
+ end
33
+ end
34
+
35
+ def mk_tree(target_dir, options)
36
+ opts = {
37
+ :depth => 3,
38
+ :files_per_dir => 3,
39
+ :subdirs_per_dir => 3,
40
+ :prefix => "tmp",
41
+ :suffix => "",
42
+ :dir_prefix => "dir",
43
+ :dir_suffix => ""
44
+ }.merge options
45
+ p = target_dir.is_a?(Pathname) ? target_dir : Pathname.new(target_dir)
46
+ p.mkdir unless p.exist?
47
+
48
+ opts[:files_per_dir].times do |i|
49
+ fname = "#{opts[:prefix]}-#{i}#{opts[:suffix]}"
50
+ FileUtils.touch(p + fname)
51
+ end
52
+ return if (opts[:depth] -= 1) <= 0
53
+ opts[:subdirs_per_dir].times do |i|
54
+ dir = "#{opts[:dir_prefix]}-#{i}#{opts[:dir_suffix]}"
55
+ mk_tree(p + dir, opts)
56
+ end
57
+ end
58
+
59
+ def expected_files(depth, files_per_dir, subdirs_per_dir)
60
+ return 0 if depth == 0
61
+ files_per_dir + (subdirs_per_dir * expected_files(depth - 1, files_per_dir, subdirs_per_dir))
62
+ end
63
+
64
+ def relative_path(parent, pathname)
65
+ pathname.relative_path_from(parent).to_s
66
+ end
67
+
68
+ def collect_files(iter)
69
+ files = []
70
+ while nxt = iter.next_file
71
+ files << relative_path(iter.path, nxt)
72
+ end
73
+ files
74
+ end
75
+
76
+ def fs_case_sensitive?
77
+ @fs_case_sensitive ||= begin
78
+ `touch CASETEST`
79
+ !File.exist?('casetest')
80
+ ensure
81
+ `rm CASETEST`
82
+ end
83
+ end
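For reference, the recursion in `expected_files` has a closed form. Writing d for the depth, f for `files_per_dir`, and s for `subdirs_per_dir` (the same tree `mk_tree` builds, one level of files per remaining depth):

```latex
E(0) = 0, \qquad E(d) = f + s\,E(d-1)
\;\Longrightarrow\;
E(d) = f \sum_{k=0}^{d-1} s^{k} = f\,\frac{s^{d}-1}{s-1} \quad (s \neq 1), \qquad E(d) = f\,d \quad (s = 1)
```

With the `mk_tree` defaults d = f = s = 3 this gives E(3) = 3(1 + 3 + 9) = 39 files per suffix, so `with_tree(%W(.jpg .txt))` creates 78 files; the ordering tests' tree (d = 2, f = 2, s = 1) gives E(2) = 4, matching their four expected paths.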
metadata CHANGED
@@ -1,13 +1,13 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: findler
3
3
  version: !ruby/object:Gem::Version
4
- hash: 25
4
+ hash: 23
5
5
  prerelease:
6
6
  segments:
7
7
  - 0
8
8
  - 0
9
- - 3
10
- version: 0.0.3
9
+ - 4
10
+ version: 0.0.4
11
11
  platform: ruby
12
12
  authors:
13
13
  - Matthew McEachen
@@ -15,10 +15,10 @@ autorequire:
15
15
  bindir: bin
16
16
  cert_chain: []
17
17
 
18
- date: 2012-01-23 00:00:00 Z
18
+ date: 2012-03-28 00:00:00 -07:00
19
+ default_executable:
19
20
  dependencies:
20
21
  - !ruby/object:Gem::Dependency
21
- prerelease: false
22
22
  type: :development
23
23
  requirement: &id001 !ruby/object:Gem::Requirement
24
24
  none: false
@@ -29,10 +29,10 @@ dependencies:
29
29
  segments:
30
30
  - 0
31
31
  version: "0"
32
+ prerelease: false
32
33
  name: rake
33
34
  version_requirements: *id001
34
35
  - !ruby/object:Gem::Dependency
35
- prerelease: false
36
36
  type: :development
37
37
  requirement: &id002 !ruby/object:Gem::Requirement
38
38
  none: false
@@ -43,28 +43,40 @@ dependencies:
43
43
  segments:
44
44
  - 0
45
45
  version: "0"
46
+ prerelease: false
46
47
  name: yard
47
48
  version_requirements: *id002
48
49
  - !ruby/object:Gem::Dependency
49
- prerelease: false
50
50
  type: :development
51
51
  requirement: &id003 !ruby/object:Gem::Requirement
52
52
  none: false
53
53
  requirements:
54
- - - ~>
54
+ - - ">="
55
55
  - !ruby/object:Gem::Version
56
- hash: 19
56
+ hash: 3
57
57
  segments:
58
- - 2
59
- - 7
60
58
  - 0
61
- version: 2.7.0
62
- name: rspec
59
+ version: "0"
60
+ prerelease: false
61
+ name: minitest
63
62
  version_requirements: *id003
64
63
  - !ruby/object:Gem::Dependency
64
+ type: :development
65
+ requirement: &id004 !ruby/object:Gem::Requirement
66
+ none: false
67
+ requirements:
68
+ - - ">="
69
+ - !ruby/object:Gem::Version
70
+ hash: 3
71
+ segments:
72
+ - 0
73
+ version: "0"
65
74
  prerelease: false
75
+ name: minitest-reporters
76
+ version_requirements: *id004
77
+ - !ruby/object:Gem::Dependency
66
78
  type: :runtime
67
- requirement: &id004 !ruby/object:Gem::Requirement
79
+ requirement: &id005 !ruby/object:Gem::Requirement
68
80
  none: false
69
81
  requirements:
70
82
  - - ">="
@@ -73,8 +85,9 @@ dependencies:
73
85
  segments:
74
86
  - 0
75
87
  version: "0"
88
+ prerelease: false
76
89
  name: bloomer
77
- version_requirements: *id004
90
+ version_requirements: *id005
78
91
  description: |-
79
92
  Findler is designed for very large filesystem hierarchies,
80
93
  where simple block processing, or returning an array of matches, just isn't feasible.
@@ -89,16 +102,22 @@ extra_rdoc_files: []
89
102
 
90
103
  files:
91
104
  - .gitignore
105
+ - .travis.yml
92
106
  - Gemfile
93
107
  - MIT-LICENSE
94
108
  - README.md
95
109
  - Rakefile
110
+ - examples/find_exif_files
96
111
  - findler.gemspec
97
112
  - lib/findler.rb
113
+ - lib/findler/error.rb
114
+ - lib/findler/filters.rb
98
115
  - lib/findler/iterator.rb
99
116
  - lib/findler/version.rb
100
- - spec/findler_spec.rb
101
- - spec/spec_helper.rb
117
+ - test/filters_test.rb
118
+ - test/findler_test.rb
119
+ - test/test_helper.rb
120
+ has_rdoc: true
102
121
  homepage: https://github.com/mceachen/findler/
103
122
  licenses: []
104
123
 
@@ -128,11 +147,11 @@ required_rubygems_version: !ruby/object:Gem::Requirement
128
147
  requirements: []
129
148
 
130
149
  rubyforge_project: findler
131
- rubygems_version: 1.8.15
150
+ rubygems_version: 1.6.2
132
151
  signing_key:
133
152
  specification_version: 3
134
153
  summary: Findler is a stateful filesystem iterator
135
154
  test_files:
136
- - spec/findler_spec.rb
137
- - spec/spec_helper.rb
138
- has_rdoc:
155
+ - test/filters_test.rb
156
+ - test/findler_test.rb
157
+ - test/test_helper.rb
data/spec/findler_spec.rb DELETED
@@ -1,117 +0,0 @@
- require 'spec_helper'
-
- describe Findler do
-
- before :each do
- @opts = {
- :depth => 3,
- :files_per_dir => 3,
- :subdirs_per_dir => 3,
- :prefix => "tmp",
- :suffix => "",
- :dir_prefix => "dir",
- :dir_suffix => ""
- }
- end
-
- it "should find all files by default" do
- with_tree([".jpg", ".txt"]) do |dir|
- f = Findler.new(dir)
- iter = f.iterator
- collect_files(iter).should =~ `find * -type f`.split
- end
- end
-
- it "should find only .jpg files when constrained" do
- with_tree([".jpg", ".txt", ".JPG"]) do |dir|
- f = Findler.new(dir)
- f.append_extension ".jpg"
- iter = f.iterator
- collect_files(iter).should =~ `find * -type f -name \\*.jpg`.split
- end
- end
-
- it "should find .jpg or .JPG files when constrained" do
- with_tree([".jpg", ".txt", ".JPG"]) do |dir|
- f = Findler.new(dir)
- f.append_extension ".jpg"
- f.case_insensitive!
- iter = f.iterator
- collect_files(iter).should =~ `find * -type f -iname \\*.jpg`.split
- end
- end
-
- it "should find files added after iteration started" do
- with_tree([".txt"]) do |dir|
- f = Findler.new(dir)
- iter = f.iterator
- iter.next.should_not be_nil
-
- # cheating with mtime on the touch doesn't properly update the parent directory ctime,
- # so we have to deal with the second-granularity resolution of the filesystem.
- sleep(1.1)
-
- FileUtils.touch(dir + "new.txt")
- collect_files(iter).should include("new.txt")
- end
- end
-
- it "should find new files after a rescan" do
- with_tree([".txt", ".no"]) do |dir|
- f = Findler.new(dir)
- f.append_extension ".txt"
- iter = f.iterator
- collect_files(iter).should =~ `find * -type f -iname \\*.txt`.split
- FileUtils.touch(dir + "dir-0" + "dir-1" + "new-0.txt")
- FileUtils.touch(dir + "dir-1" + "dir-0" + "new-1.txt")
- FileUtils.touch(dir + "dir-2" + "dir-2" + "new-2.txt")
- collect_files(iter).should be_empty
- iter.rescan!
- collect_files(iter).should =~ ["dir-0/dir-1/new-0.txt", "dir-1/dir-0/new-1.txt", "dir-2/dir-2/new-2.txt"]
- end
- end
-
- it "should not return files removed after iteration started" do
- with_tree([".txt"]) do |dir|
- f = Findler.new(dir)
- iter = f.iterator
- iter.next.should_not be_nil
- sleep(1.1) # see above for hand-wringing-defense of this atrocity
-
- (dir + "tmp-1.txt").unlink
- collect_files(iter).should_not include("tmp-1.txt")
- end
- end
-
- it "should dump/load in the middle of iterating" do
- with_tree([".jpg", ".txt", ".JPG"]) do |dir|
- all_files = `find * -type f -iname \\*.jpg`.split
- all_files.size.times do |i|
- f = Findler.new(dir)
- f.append_extension ".jpg"
- f.case_insensitive!
- iter_a = f.iterator
- files_a = i.times.collect { relative_path(iter_a, iter_a.next) }
- iter_b = Marshal.load(Marshal.dump(iter_a))
- files_b = collect_files(iter_b)
-
- iter_c = Marshal.load(Marshal.dump(iter_b))
- collect_files(iter_c)
- iter_c.next.should be_nil
-
- (files_a + files_b).should =~ all_files
- end
- end
- end
-
- it "should create an iterator even for a non-existent directory" do
- tmpdir = nil
- cwd = Dir.pwd
- Dir.mktmpdir do |dir|
- tmpdir = Pathname.new dir
- end
- tmpdir.should_not exist
- f = Findler.new(tmpdir)
- collect_files(f.iterator).should be_empty
- end
- end
data/spec/spec_helper.rb DELETED
@@ -1,66 +0,0 @@
- require 'rspec'
- require 'tmpdir'
- require 'fileutils'
- require 'findler'
-
- RSpec.configure do |config|
- config.color_enabled = true
- config.formatter = 'documentation'
- end
-
- def with_tmp_dir(&block)
- cwd = Dir.pwd
- Dir.mktmpdir do |dir|
- Dir.chdir(dir)
- yield(Pathname.new dir)
- end
- Dir.chdir(cwd)
- end
-
- def with_tree(sufficies, &block)
- with_tmp_dir do |dir|
- sufficies.each { |suffix| mk_tree dir, @opts.merge(:suffix => suffix) }
- yield(dir)
- end
- end
-
- def mk_tree(target_dir, options)
- opts = {
- :depth => 3,
- :files_per_dir => 3,
- :subdirs_per_dir => 3,
- :prefix => "tmp",
- :suffix => "",
- :dir_prefix => "dir",
- :dir_suffix => ""
- }.merge options
- p = target_dir.is_a?(Pathname) ? target_dir : Pathname.new(target_dir)
- p.mkdir unless p.exist?
-
- opts[:files_per_dir].times do |i|
- fname = "#{opts[:prefix]}-#{i}#{opts[:suffix]}"
- FileUtils.touch(p + fname).to_s
- end
- return if (opts[:depth] -= 1) <= 0
- opts[:subdirs_per_dir].times do |i|
- dir = "#{opts[:dir_prefix]}-#{i}#{opts[:dir_suffix]}"
- mk_tree(p + dir, opts)
- end
- end
-
- def expected_files(depth, files_per_dir, subdirs_per_dir)
- return 0 if depth == 0
- files_per_dir + (subdirs_per_dir * expected_files(depth - 1, files_per_dir, subdirs_per_dir))
- end
-
- def relative_path(parent, pathname)
- pathname.relative_path_from(parent.path).to_s
- end
-
- def collect_files(iter)
- files = []
- while nxt = iter.next
- files << relative_path(iter, nxt)
- end
- files
- end
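The deleted `spec_helper.rb` builds its fixture directory with `mk_tree` and predicts the fixture's size with `expected_files`. That helper encodes the recurrence E(d) = f + s · E(d − 1), with E(0) = 0, where f is `files_per_dir` and s is `subdirs_per_dir`. The following standalone sketch is plain Ruby with no Findler dependency. It pairs that recurrence with `build_tree`, a hypothetical, simplified stand-in for `mk_tree` that keeps the same depth-decrement rule. With the spec defaults (depth 3, and 3 files and 3 subdirectories per directory), both come to 3 + 9 + 27 = 39 files.

```ruby
require 'tmpdir'
require 'fileutils'

# Same recurrence as the deleted expected_files helper: each level holds
# files_per_dir files plus subdirs_per_dir subtrees one level shallower.
def expected_files(depth, files_per_dir, subdirs_per_dir)
  return 0 if depth == 0
  files_per_dir + (subdirs_per_dir * expected_files(depth - 1, files_per_dir, subdirs_per_dir))
end

# Hypothetical simplified stand-in for mk_tree: touch files_per_dir files,
# then stop once the decremented depth reaches zero (mirroring
# `return if (opts[:depth] -= 1) <= 0`), otherwise recurse into subdirs.
def build_tree(dir, depth, files_per_dir, subdirs_per_dir)
  FileUtils.mkdir_p(dir)
  files_per_dir.times { |i| FileUtils.touch(File.join(dir, "tmp-#{i}.txt")) }
  return if depth - 1 <= 0
  subdirs_per_dir.times do |i|
    build_tree(File.join(dir, "dir-#{i}"), depth - 1, files_per_dir, subdirs_per_dir)
  end
end

Dir.mktmpdir do |root|
  build_tree(root, 3, 3, 3)
  # Count regular files only; directories are excluded by File.file?.
  actual = Dir.glob(File.join(root, "**", "*")).count { |path| File.file?(path) }
  puts "expected=#{expected_files(3, 3, 3)} actual=#{actual}"
  # prints: expected=39 actual=39
end
```

Because `mk_tree` decrements depth before deciding whether to recurse, a depth of d produces exactly d directory levels that contain files. That is why the recurrence bottoms out at E(0) = 0 rather than at E(1).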