file_scanner 1.0.3
- checksums.yaml +7 -0
- data/.gitignore +9 -0
- data/.travis.yml +8 -0
- data/Gemfile +4 -0
- data/README.md +112 -0
- data/Rakefile +10 -0
- data/bin/console +14 -0
- data/bin/setup +8 -0
- data/file_scanner.gemspec +20 -0
- data/lib/file_scanner/filters.rb +31 -0
- data/lib/file_scanner/loader.rb +21 -0
- data/lib/file_scanner/version.rb +3 -0
- data/lib/file_scanner/worker.rb +42 -0
- data/lib/file_scanner.rb +4 -0
- metadata +100 -0
checksums.yaml
ADDED
@@ -0,0 +1,7 @@
+---
+SHA1:
+  metadata.gz: 6fb33515de67d99ac734334fb573a11424861444
+  data.tar.gz: 7e6742c59f1f0939114ee6ba81d49be590a74447
+SHA512:
+  metadata.gz: 639bfe4e4078486bd18fb49762154e8465766187b5be9038e61494c10fa9ff7273f68d1c48abffd89884d81918d0590ebebf43c1c1921d8e59ce5369e6cf3801
+  data.tar.gz: c7256af0baef400aaed7b302728cb5f1ccd2df9cb203a02a02a4491bb7d14ca4a4ecd936cbb1f89ccaa77f832b5cf6364c7c4658d4249212e2dc814691d03883
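For readers who want to check digests like these themselves, here is a minimal sketch using Ruby's standard `digest` library. It assumes the `.gem` archive has already been unpacked into its `metadata.gz` and `data.tar.gz` members; the helper name `digest_matches?` is hypothetical and not part of RubyGems, and the demonstration runs on a throwaway file rather than the real archive.

```ruby
require "digest"
require "tempfile"

# Hypothetical helper: compares a file's SHA512 hex digest with an expected value.
def digest_matches?(path, expected_hex)
  Digest::SHA512.file(path).hexdigest == expected_hex
end

# Demonstration on a temporary file, since the real archive is not available here.
Tempfile.create("data") do |f|
  f.write("example payload")
  f.flush
  expected = Digest::SHA512.hexdigest("example payload")
  puts digest_matches?(f.path, expected)   # digest computed from the same bytes
  puts digest_matches?(f.path, "0" * 128)  # deliberately wrong digest
end
```

To check the real package, the expected value would be the `data.tar.gz` entry under `SHA512` above.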
data/.gitignore
ADDED
data/.travis.yml
ADDED
data/Gemfile
ADDED
data/README.md
ADDED
@@ -0,0 +1,112 @@
+## Table of Contents
+
+* [Scope](#scope)
+* [Motivation](#motivation)
+* [Installation](#installation)
+* [Usage](#usage)
+  * [Loader](#loader)
+  * [Filters](#filters)
+  * [Policies](#policies)
+  * [Worker](#worker)
+
+## Scope
+This gem collects a set of file paths matching a wildcard rule, filters them with default or custom filters (access time, size range) and applies a set of custom policies to them.
+
+## Motivation
+This gem is helpful for purging obsolete files or promoting relevant ones, by calling external services (CDN APIs) and/or performing local file system actions (copy, move, delete, etc.).
+
+## Installation
+Add this line to your application's Gemfile:
+```ruby
+gem "file_scanner"
+```
+
+And then execute:
+```shell
+bundle
+```
+
+Or install it yourself as:
+```shell
+gem install file_scanner
+```
+
+## Usage
+
+### Loader
+The first step is to create a `Loader` instance by specifying the path to scan, with an optional list of extensions:
+```ruby
+require "file_scanner"
+
+loader = FileScanner::Loader.new(path: ENV["HOME"], extensions: %w[html txt])
+```
+
+### Filters
+The second step is to provide the list of filters: only the files for which every filter's `call` method is truthy are selected.
+
+#### Default
+You can rely on the existing filters, which select files by:
+* checking if the file was last accessed more than *30 days* ago
+* checking if the file size is *at least 100 bytes*
+
+You can configure the default behaviour by passing different arguments:
+```ruby
+accessed_a_week_ago = FileScanner::Filters::LastAccess.new(Time.now-7*24*3600)
+one_to_two_mega = FileScanner::Filters::SizeRange.new(min: 1024**2, max: 2*1024**2)
+
+filters = []
+filters << accessed_a_week_ago
+filters << one_to_two_mega
+```
+
+#### Custom
+It is convenient to create custom filters by relying on `Proc` instances, which satisfy the `callable` protocol:
+```ruby
+filters << ->(file) { File.directory?(file) }
+```
+
+### Policies
+The third step is to create custom policy objects (no defaults exist) to be applied to the list of filtered paths.
+Again, it suffices that the policy responds to the `call` method and accepts an array of paths as an argument:
+```ruby
+require "fileutils"
+
+remove_from_disk = ->(paths) do
+  FileUtils.rm_rf(paths)
+end
+
+policies = []
+policies << remove_from_disk
+```
+
+### Worker
+Now that you have all of the collaborators in place, you can create the `Worker` instance:
+```ruby
+worker = FileScanner::Worker.new(loader: loader, filters: filters, policies: policies)
+worker.call # apply all the specified policies to the filtered file paths
+```
+
+#### Slice of files
+In case you are going to scan a large number of files, it is better to work in batches.
+This is exactly why the `Worker` class accepts a `slice` attribute, which distributes the work and avoids saturating the resources used by the specified policies:
+```ruby
+worker = FileScanner::Worker.new(loader: loader, filters: filters, policies: policies, slice: 1000)
+worker.call # call policies by slice of 1000 files
+```
+
+#### Policies by block
+In case you prefer to specify the policies as a block yielding the files slice, you can omit the `policies` argument altogether:
+```ruby
+worker = FileScanner::Worker.new(loader: loader, filters: filters)
+worker.call do |slice|
+  FileUtils.chmod_R(0700, slice)
+end
+```
+
+#### Use a logger
+If you want to trace what the worker is doing (including errors), you can pass a logger to the worker:
+```ruby
+my_logger = Logger.new("my_file.log")
+worker = FileScanner::Worker.new(loader: loader, filters: filters, logger: my_logger)
+worker.call # will log worker actions to my_file.log
+```
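The callable protocol the README relies on can be illustrated without the gem itself. The following is a minimal sketch that assumes only the standard library: `LargerThan`, `Collect` and `demo_protocol` are hypothetical names (not part of `file_scanner`) showing a filter that takes one path and a policy that takes an array of paths.

```ruby
require "tmpdir"

# Hypothetical filter: any object whose `call(file)` returns a truthy value.
class LargerThan
  def initialize(bytes)
    @bytes = bytes
  end

  def call(file)
    File.size(file) > @bytes
  end
end

# Hypothetical policy: any object whose `call(paths)` accepts an array of paths.
class Collect
  attr_reader :seen

  def initialize
    @seen = []
  end

  def call(paths)
    @seen.concat(paths.map { |path| File.basename(path) })
  end
end

def demo_protocol
  Dir.mktmpdir do |dir|
    File.write(File.join(dir, "small.txt"), "x")
    File.write(File.join(dir, "big.txt"), "x" * 50)

    # Objects and lambdas can be mixed, since both respond to `call`.
    filters = [LargerThan.new(10), ->(file) { File.file?(file) }]
    policy = Collect.new

    # Keep only the paths accepted by every filter, then hand them to the policy.
    selected = Dir.glob(File.join(dir, "*")).select do |file|
      filters.all? { |filter| filter.call(file) }
    end
    policy.call(selected)
    policy.seen
  end
end

p demo_protocol
```

Only `big.txt` passes both filters, so it is the only name the policy records.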
data/Rakefile
ADDED
data/bin/console
ADDED
@@ -0,0 +1,14 @@
+#!/usr/bin/env ruby
+
+require "bundler/setup"
+require "file_scanner"
+
+# You can add fixtures and/or initialization code here to make experimenting
+# with your gem easier. You can also use a different console, if you like.
+
+# (If you use this, don't forget to add pry to your Gemfile!)
+# require "pry"
+# Pry.start
+
+require "irb"
+IRB.start(__FILE__)
data/bin/setup
ADDED
data/file_scanner.gemspec
ADDED
@@ -0,0 +1,20 @@
+# coding: utf-8
+lib = File.expand_path("../lib", __FILE__)
+$LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
+require "file_scanner/version"
+
+Gem::Specification.new do |s|
+  s.name = "file_scanner"
+  s.version = FileScanner::VERSION
+  s.authors = ["costajob"]
+  s.email = ["costajob@gmail.com"]
+  s.summary = "A scanner routine that collect file paths basing on specified filters and apply to them a set of custom policies"
+  s.files = `git ls-files -z`.split("\x0").reject { |f| f.match(%r{^(spec|test|s|features)/}) }
+  s.require_paths = ["lib"]
+  s.license = "MIT"
+  s.required_ruby_version = ">= 2.1.2"
+
+  s.add_development_dependency "bundler", "~> 1.15"
+  s.add_development_dependency "rake", "~> 10.0"
+  s.add_development_dependency "minitest", "~> 5.0"
+end
data/lib/file_scanner/filters.rb
ADDED
@@ -0,0 +1,31 @@
+module FileScanner
+  module Filters
+    def self.defaults
+      constants.map do |name|
+        self.const_get(name).new
+      end
+    end
+
+    class LastAccess
+      DAY = 3600*24
+
+      def initialize(atime = Time.now-30*DAY)
+        @atime = atime
+      end
+
+      def call(file)
+        @atime >= File.atime(file)
+      end
+    end
+
+    class SizeRange
+      def initialize(min: 100, max: Float::INFINITY)
+        @range = min..max
+      end
+
+      def call(file)
+        @range === File.size(file)
+      end
+    end
+  end
+end
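To make the comparison semantics of the two default filters concrete, here is a small stand-alone sketch. It re-implements the two checks as lambdas inside a hypothetical `filter_semantics` method (it does not load the gem) and exercises them on a temporary file whose access time is set explicitly with `File.utime`.

```ruby
require "tempfile"

SECONDS_PER_DAY = 3600 * 24

def filter_semantics
  # Same comparison as LastAccess#call: true when atime is at or before the threshold.
  older_than = ->(threshold, file) { threshold >= File.atime(file) }
  # Same comparison as SizeRange#call: true when the size falls inside the range.
  in_range = ->(range, file) { range === File.size(file) }

  Tempfile.create("sample") do |f|
    f.write("x" * 150)
    f.flush
    # Pretend the file was last accessed (and modified) 40 days ago.
    forty_days_ago = Time.now - 40 * SECONDS_PER_DAY
    File.utime(forty_days_ago, forty_days_ago, f.path)

    {
      stale_after_30_days: older_than.call(Time.now - 30 * SECONDS_PER_DAY, f.path),
      stale_after_60_days: older_than.call(Time.now - 60 * SECONDS_PER_DAY, f.path),
      default_size_range: in_range.call(100..Float::INFINITY, f.path),
      under_100_bytes: in_range.call(0..99, f.path)
    }
  end
end

p filter_semantics
```

With the defaults (`min: 100`, `max: Float::INFINITY`), the 150-byte file is accepted by the size check, while a range capped at 99 bytes rejects it.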
data/lib/file_scanner/loader.rb
ADDED
@@ -0,0 +1,21 @@
+module FileScanner
+  class Loader
+    def initialize(path:, extensions: [])
+      @path = path
+      @extensions = extensions
+    end
+
+    def call
+      Dir.glob(files_path)
+    end
+
+    private def files_path
+      File.join(@path, "**", extensions_path)
+    end
+
+    private def extensions_path
+      return "*" if @extensions.empty?
+      "*.{#{@extensions.join(",")}}"
+    end
+  end
+end
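The pattern that `Loader#call` hands to `Dir.glob` is easier to see with concrete values. This sketch rebuilds the same string with a hypothetical helper, `glob_pattern` (the gem's own methods are private), and runs it against a temporary directory; `demo_glob` is likewise only an illustration.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical helper mirroring Loader#files_path and Loader#extensions_path.
def glob_pattern(path, extensions = [])
  tail = extensions.empty? ? "*" : "*.{#{extensions.join(",")}}"
  File.join(path, "**", tail)
end

def demo_glob
  Dir.mktmpdir do |dir|
    FileUtils.mkdir_p(File.join(dir, "nested"))
    File.write(File.join(dir, "index.html"), "")
    File.write(File.join(dir, "nested", "notes.txt"), "")
    File.write(File.join(dir, "script.rb"), "")

    # "**" recurses into subdirectories; the braces list the accepted extensions.
    matches = Dir.glob(glob_pattern(dir, %w[html txt]))
    matches.map { |match| match.sub("#{dir}/", "") }.sort
  end
end

puts glob_pattern("/home/me", %w[html txt])
p demo_glob
```

Without extensions the pattern ends in a bare `*`, so `Dir.glob` returns directories as well as files, and dotfiles are skipped unless `File::FNM_DOTMATCH` is passed.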
data/lib/file_scanner/version.rb
ADDED
data/lib/file_scanner/worker.rb
ADDED
@@ -0,0 +1,42 @@
+require "logger"
+
+module FileScanner
+  class Worker
+    attr_reader :filters, :policies
+
+    def initialize(loader:, filters: Filters::defaults, policies: [], logger: Logger.new(nil), slice: nil)
+      @loader = loader
+      @filters = filters
+      @policies = policies
+      @slice = slice.to_i
+      @logger = logger
+    end
+
+    def call
+      slices.each do |slice|
+        yield(slice) if block_given? && policies.empty?
+        policies.each do |policy|
+          @logger.info { "applying \e[1m#{policy}\e[0m to \e[1m#{slice.size}\e[0m files" }
+          policy.call(slice)
+        end
+      end
+    rescue StandardError => e
+      @logger.error { e.message }
+      raise e
+    end
+
+    private def files
+      @files ||= Array(@loader.call).select do |f|
+        @filters.all? do |filter|
+          @logger.info { "applying \e[1m#{filter.class}\e[0m to \e[1m#{File.basename(f)}\e[0m" }
+          filter.call(f)
+        end
+      end
+    end
+
+    private def slices
+      return [files] if @slice.zero?
+      files.each_slice(@slice)
+    end
+  end
+end
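The batching logic in `Worker#slices` comes down to `Enumerable#each_slice` plus a fallback for the default `slice: nil`, which `to_i` turns into `0`. A minimal sketch with a hypothetical free-standing helper, `slices_for`, using plain strings instead of real paths:

```ruby
# Hypothetical stand-alone version of Worker#slices.
def slices_for(files, slice = nil)
  size = slice.to_i            # nil.to_i is 0, meaning "no batching"
  return [files] if size.zero?
  files.each_slice(size)       # enumerator over arrays of at most `size` items
end

files = %w[a.txt b.txt c.txt d.txt e.txt]

p slices_for(files).to_a       # one batch holding every file
p slices_for(files, 2).to_a    # batches of two, the last one shorter
p slices_for([]).to_a          # a single empty batch
```

The last call shows a consequence worth knowing: when no file passes the filters and `slice` is unset, the worker still invokes each policy once, with an empty array.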
data/lib/file_scanner.rb
ADDED
metadata
ADDED
@@ -0,0 +1,100 @@
+--- !ruby/object:Gem::Specification
+name: file_scanner
+version: !ruby/object:Gem::Version
+  version: 1.0.3
+platform: ruby
+authors:
+- costajob
+autorequire:
+bindir: bin
+cert_chain: []
+date: 2017-08-08 00:00:00.000000000 Z
+dependencies:
+- !ruby/object:Gem::Dependency
+  name: bundler
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '1.15'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '1.15'
+- !ruby/object:Gem::Dependency
+  name: rake
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '10.0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '10.0'
+- !ruby/object:Gem::Dependency
+  name: minitest
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '5.0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '5.0'
+description:
+email:
+- costajob@gmail.com
+executables: []
+extensions: []
+extra_rdoc_files: []
+files:
+- ".gitignore"
+- ".travis.yml"
+- Gemfile
+- README.md
+- Rakefile
+- bin/console
+- bin/setup
+- file_scanner.gemspec
+- lib/file_scanner.rb
+- lib/file_scanner/filters.rb
+- lib/file_scanner/loader.rb
+- lib/file_scanner/version.rb
+- lib/file_scanner/worker.rb
+homepage:
+licenses:
+- MIT
+metadata: {}
+post_install_message:
+rdoc_options: []
+require_paths:
+- lib
+required_ruby_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: 2.1.2
+required_rubygems_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: '0'
+requirements: []
+rubyforge_project:
+rubygems_version: 2.6.8
+signing_key:
+specification_version: 4
+summary: A scanner routine that collect file paths basing on specified filters and
+  apply to them a set of custom policies
+test_files: []