libis-rosetta_checker 1.0.0

data/.rspec ADDED
@@ -0,0 +1,3 @@
1
+ --format documentation
2
+ --color
3
+ --require spec_helper
data/.travis.yml ADDED
@@ -0,0 +1,5 @@
1
+ sudo: false
2
+ language: ruby
3
+ rvm:
4
+ - 2.4.0
5
+ before_install: gem install bundler -v 1.16.0
@@ -0,0 +1,74 @@
1
+ # Contributor Covenant Code of Conduct
2
+
3
+ ## Our Pledge
4
+
5
+ In the interest of fostering an open and welcoming environment, we as
6
+ contributors and maintainers pledge to making participation in our project and
7
+ our community a harassment-free experience for everyone, regardless of age, body
8
+ size, disability, ethnicity, gender identity and expression, level of experience,
9
+ nationality, personal appearance, race, religion, or sexual identity and
10
+ orientation.
11
+
12
+ ## Our Standards
13
+
14
+ Examples of behavior that contributes to creating a positive environment
15
+ include:
16
+
17
+ * Using welcoming and inclusive language
18
+ * Being respectful of differing viewpoints and experiences
19
+ * Gracefully accepting constructive criticism
20
+ * Focusing on what is best for the community
21
+ * Showing empathy towards other community members
22
+
23
+ Examples of unacceptable behavior by participants include:
24
+
25
+ * The use of sexualized language or imagery and unwelcome sexual attention or
26
+ advances
27
+ * Trolling, insulting/derogatory comments, and personal or political attacks
28
+ * Public or private harassment
29
+ * Publishing others' private information, such as a physical or electronic
30
+ address, without explicit permission
31
+ * Other conduct which could reasonably be considered inappropriate in a
32
+ professional setting
33
+
34
+ ## Our Responsibilities
35
+
36
+ Project maintainers are responsible for clarifying the standards of acceptable
37
+ behavior and are expected to take appropriate and fair corrective action in
38
+ response to any instances of unacceptable behavior.
39
+
40
+ Project maintainers have the right and responsibility to remove, edit, or
41
+ reject comments, commits, code, wiki edits, issues, and other contributions
42
+ that are not aligned to this Code of Conduct, or to ban temporarily or
43
+ permanently any contributor for other behaviors that they deem inappropriate,
44
+ threatening, offensive, or harmful.
45
+
46
+ ## Scope
47
+
48
+ This Code of Conduct applies both within project spaces and in public spaces
49
+ when an individual is representing the project or its community. Examples of
50
+ representing a project or community include using an official project e-mail
51
+ address, posting via an official social media account, or acting as an appointed
52
+ representative at an online or offline event. Representation of a project may be
53
+ further defined and clarified by project maintainers.
54
+
55
+ ## Enforcement
56
+
57
+ Instances of abusive, harassing, or otherwise unacceptable behavior may be
58
+ reported by contacting the project team at kris.dekeyser@libis.be. All
59
+ complaints will be reviewed and investigated and will result in a response that
60
+ is deemed necessary and appropriate to the circumstances. The project team is
61
+ obligated to maintain confidentiality with regard to the reporter of an incident.
62
+ Further details of specific enforcement policies may be posted separately.
63
+
64
+ Project maintainers who do not follow or enforce the Code of Conduct in good
65
+ faith may face temporary or permanent repercussions as determined by other
66
+ members of the project's leadership.
67
+
68
+ ## Attribution
69
+
70
+ This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,
71
+ available at [http://contributor-covenant.org/version/1/4][version]
72
+
73
+ [homepage]: http://contributor-covenant.org
74
+ [version]: http://contributor-covenant.org/version/1/4/
data/Gemfile ADDED
@@ -0,0 +1,6 @@
1
+ source "https://rubygems.org"
2
+
3
+ git_source(:github) {|repo_name| "https://github.com/#{repo_name}" }
4
+
5
+ # Specify your gem's dependencies in libis-rosetta_checker.gemspec
6
+ gemspec
data/LICENSE.txt ADDED
@@ -0,0 +1,21 @@
1
+ The MIT License (MIT)
2
+
3
+ Copyright (c) 2017 LIBIS (Kris Dekeyser)
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in
13
+ all copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
21
+ THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,43 @@
1
+ # Libis::RosettaChecker
2
+
3
+ Welcome to your new gem! In this directory, you'll find the files you need to be able to package up your Ruby library into a gem. Put your Ruby code in the file `lib/libis/rosetta_checker`. To experiment with that code, run `bin/console` for an interactive prompt.
4
+
5
+ TODO: Delete this and the text above, and describe your gem
6
+
7
+ ## Installation
8
+
9
+ Add this line to your application's Gemfile:
10
+
11
+ ```ruby
12
+ gem 'libis-rosetta_checker'
13
+ ```
14
+
15
+ And then execute:
16
+
17
+ $ bundle
18
+
19
+ Or install it yourself as:
20
+
21
+ $ gem install libis-rosetta_checker
22
+
23
+ ## Usage
24
+
25
+ TODO: Write usage instructions here
26
+
27
+ ## Development
28
+
29
+ After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake spec` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
30
+
31
+ To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org).
32
+
33
+ ## Contributing
34
+
35
+ Bug reports and pull requests are welcome on GitHub at https://github.com/[USERNAME]/libis-rosetta_checker. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the [Contributor Covenant](http://contributor-covenant.org) code of conduct.
36
+
37
+ ## License
38
+
39
+ The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).
40
+
41
+ ## Code of Conduct
42
+
43
+ Everyone interacting in the Libis::RosettaChecker project’s codebases, issue trackers, chat rooms and mailing lists is expected to follow the [code of conduct](https://github.com/[USERNAME]/libis-rosetta_checker/blob/master/CODE_OF_CONDUCT.md).
data/Rakefile ADDED
@@ -0,0 +1,6 @@
1
+ require "bundler/gem_tasks"
2
+ require "rspec/core/rake_task"
3
+
4
+ RSpec::Core::RakeTask.new(:spec)
5
+
6
+ task :default => :spec
data/bin/console ADDED
@@ -0,0 +1,14 @@
1
+ #!/usr/bin/env ruby
2
+
3
+ require "bundler/setup"
4
+ require "libis/rosetta_checker"
5
+
6
+ # You can add fixtures and/or initialization code here to make experimenting
7
+ # with your gem easier. You can also use a different console, if you like.
8
+
9
+ # (If you use this, don't forget to add pry to your Gemfile!)
10
+ # require "pry"
11
+ # Pry.start
12
+
13
+ require "irb"
14
+ IRB.start(__FILE__)
data/bin/setup ADDED
@@ -0,0 +1,8 @@
1
+ #!/usr/bin/env bash
2
+ set -euo pipefail
3
+ IFS=$'\n\t'
4
+ set -vx
5
+
6
+ bundle install
7
+
8
+ # Do any other automated setup that you need to do here
@@ -0,0 +1,6 @@
1
+ #!/usr/bin/env ruby
2
+
3
+ require "bundler/setup"
4
+ require "libis/rosetta_checker"
5
+
6
+ Libis::RosettaChecker.run
@@ -0,0 +1,277 @@
1
+ require 'optionparser'
2
+ require 'digest'
3
+ require 'bzip2/ffi'
4
+ require 'zip'
5
+ require 'oci8'
6
+ require 'logging'
7
+ require 'pathname'
8
+ require 'csv'
9
+
10
+ require_relative 'sub_command'
11
+ require_relative 'options/files_to_ingest_cleanup'
12
+
13
+ module Libis
14
+ module RosettaChecker
15
+ class FilesToIngestCleanup < SubCommand
16
+
17
+ def self.short_desc
18
+ 'Report on files that are/are not ingested'.freeze
19
+ end
20
+
21
+ def self.command
22
+ 'files2ingest'.freeze
23
+ end
24
+
25
+ def self.options_class
26
+ FilesToIngestCleanupOptions
27
+ end
28
+
29
+ def self.run
30
+ super do |cfg|
31
+ self.new(cfg).run(ARGV)
32
+ end
33
+ end
34
+
35
+ attr_accessor :cfg, :logger, :connection, :cursor, :report
36
+
37
+ def initialize(cfg)
38
+ @cfg = cfg
39
+
40
+ setup_logging
41
+
42
+ setup_db
43
+ end
44
+
45
+ def finalize
46
+ cursor.close if cursor
47
+ connection.logoff if connection
48
+ end
49
+
50
+ def run(argv)
51
+ raise ArgumentError, 'Need to specify at least a directory/file to parse' if argv.empty?
52
+ while (dir = argv.shift)
53
+ process_dir dir
54
+ next if argv.empty?
55
+ self.class.parse_options(argv)
56
+ setup_logging
57
+ end
58
+ end
59
+
60
+ protected
61
+
62
+ SQL_DATA = %w'ie_id rep_id fl_id original_name owner label group_id entity_type user_c'
63
+ CSV_HEADER = %w'parent_type parent file size md5 found name_match' + SQL_DATA
64
+
65
+ LOG_PATTERN = "[%d #%p] %-5l : %m\n".freeze
66
+
67
+ MSG_CALC_FC = ' - Calculating filesize and checksum'.freeze
68
+ MSG_CHCK_DB = ' - Checking file in DB'.freeze
69
+ MSG_DEFLATE = ' - Deflating'.freeze
70
+
71
+ FIND_SQL = <<-SQL
72
+ SELECT
73
+ CONCAT(sp.VALUE, ps.INDEX_LOCATION) as path,
74
+ ps.FILE_SIZE as filesize,
75
+ ps.CHECK_SUM_TYPE as checksum_type,
76
+ ps.CHECK_SUM as checksum,
77
+ sr.FILEORIGINALNAME as original_name,
78
+ ps.STORED_ENTITY_ID as fl_id,
79
+ cr.PID as rep_id,
80
+ ci.PID as ie_id,
81
+ ci.OWNER as owner,
82
+ ci.LABEL as label,
83
+ cf.GROUPID as group_id,
84
+ ci.ENTITYTYPE as entity_type,
85
+ ci.PARTITIONC as user_c
86
+ FROM
87
+ V2KU_PER00.PERMANENT_INDEX ps
88
+ LEFT JOIN V2KU_SHR00.STORAGE_PARAMETER sp ON sp.STORAGE_ID = ps.STORAGE_ID
89
+ LEFT JOIN V2KU_REP00.HDESTREAMREF sr ON sr.PID = ps.STORED_ENTITY_ID
90
+ LEFT JOIN V2KU_REP00.HDECONTROL cf ON cf.PID = ps.STORED_ENTITY_ID
91
+ LEFT JOIN V2KU_REP00.HDECONTROL cr ON cr.PID = cf.PARENTID
92
+ LEFT JOIN V2KU_REP00.HDECONTROL ci ON ci.PID = cr.PARENTID
93
+ WHERE
94
+ sp."KEY" = 'DIR_ROOT'
95
+ AND ps.FILE_SIZE = :filesize
96
+ AND ps.CHECK_SUM = :checksum
97
+ AND ps.CHECK_SUM_TYPE = 'MD5'
98
+ AND cf.OBJECTTYPE = 'FILE'
99
+ AND cr.OBJECTTYPE = 'REPRESENTATION'
100
+ AND ci.OBJECTTYPE = 'INTELLECTUAL_ENTITY'
101
+ SQL
102
+
103
+ def setup_logging
104
+ Logging.logger.root.level = :info
105
+ @logger = Logging.logger[self.class.command]
106
+ @logger.appenders = [Logging.appenders.stdout]
107
+ Logging.appenders.stdout.level = (@cfg.quiet ? :warn : :info)
108
+ @cfg.log_file = nil if @cfg.log_file&.chomp&.strip&.empty?
109
+ @logger.add_appenders Logging.appenders.file(
110
+ @cfg.log_file,
111
+ truncate: false,
112
+ layout: Logging.layouts.pattern(pattern: LOG_PATTERN)
113
+ ) if @cfg.log_file
114
+ end
115
+
116
+ def setup_db
117
+ @connection = OCI8.new(@cfg.dbuser, @cfg.dbpass, @cfg.dburl)
118
+ @cursor = @connection.parse(FIND_SQL)
119
+ @cursor.prefetch_rows = 10
120
+ end
121
+
122
+ def process_dir(dir)
123
+ if File.directory?(dir)
124
+ unless Dir.exist?(dir)
125
+ logger.error "Directory '#{dir}' does not exist"
126
+ return nil
127
+ end
128
+ unless File.readable?(dir)
129
+ logger.error "Directory '#{dir}' cannot be read"
130
+ return nil
131
+ end
132
+ logger.info "Processing dir '#{dir}'"
133
+ Dir.entries(dir).each do |entry|
134
+ next if %w'. ..'.include? entry
135
+ path = File.join dir, entry
136
+ begin
137
+ process_dir path if @cfg.recursive
138
+ next
139
+ end if File.directory?(path)
140
+ process_file path
141
+ end
142
+ elsif dir[0] == '@'
143
+ file = to_file(dir[1..-1])
144
+ return nil unless file
145
+ logger.info "Processing input file '#{file}'"
146
+ File.foreach(file) do |line|
147
+ process_file(line.chomp, File.dirname(file))
148
+ end
149
+ elsif File.file?(dir)
150
+ process_file(dir)
151
+ else
152
+ raise ArgumentError, "Argument '#{dir}' should refer to an existing and readable file or directory"
153
+ end
154
+ end
155
+
156
+ def to_file(file, *search_dirs)
157
+ if File.exist?(file)
158
+ return file if File.readable?(file)
159
+ logger.error "File '#{file}' cannot be read"
160
+ return nil
161
+ end
162
+ search_dirs.each do |dir|
163
+ f = File.join(dir, file)
164
+ if File.exist? f
165
+ return f if File.readable?(f)
166
+ logger.error "File '#{f}' cannot be read"
167
+ return nil
168
+ end
169
+ end
170
+ logger.error "File '#{file}' does not exist"
171
+ nil
172
+ end
173
+
174
+ def process_file(_file, *search_dir)
175
+ file = to_file(_file, *search_dir)
176
+ return unless file
177
+
178
+ if File.directory?(file)
179
+ process_dir(file)
180
+ return
181
+ end
182
+
183
+ info = {
184
+ parent_type: 'D',
185
+ parent: File.dirname(file),
186
+ file: File.basename(file)
187
+ }
188
+
189
+ logger.info "- #{file}"
190
+ if File.extname(file) == '.bz2'
191
+ logger.info MSG_DEFLATE
192
+ info[:size] = 0
193
+ reader = Bzip2::FFI::Reader.open file
194
+ md5 = Digest::MD5.new
195
+ logger.info MSG_CALC_FC
196
+ while (data = reader.read 2048000) do
197
+ info[:size] += data.length
198
+ md5 << data
199
+ end
200
+ reader.close
201
+ info[:parent_type] = 'F'
202
+ info[:parent] = file
203
+ info[:file] = File.basename file, '.bz2'
204
+ info[:md5] = md5.hexdigest
205
+ check_file info
206
+ elsif File.extname(file) == '.zip'
207
+ logger.info ' - Unpacking'.freeze
208
+ info[:parent_type] = 'Z'
209
+ info[:parent] = file
210
+ Zip::File.open(file) do |zip|
211
+ zip.each do |entry|
212
+ next if entry.directory?
213
+ info[:file] = entry.name
214
+ logger.info "- #{file}/#{entry.name}"
215
+ logger.info MSG_CALC_FC
216
+ info[:size] = 0
217
+ md5 = Digest::MD5.new
218
+ reader = entry.get_input_stream
219
+ while (data = reader.read 2048000) do
220
+ info[:size] += data.length
221
+ md5 << data
222
+ end
223
+ reader.close
224
+ info[:md5] = md5.hexdigest
225
+ check_file info
226
+ end
227
+ end
228
+ else
229
+ begin
230
+ logger.info MSG_CALC_FC
231
+ info[:size] = File.size file
232
+ info[:md5] = Digest::MD5.file(file).hexdigest
233
+ check_file info
234
+ rescue StandardError => e
235
+ logger.error "Could not process file '#{file}': #{e.message}"
236
+ end
237
+ end
238
+ end
239
+
240
+ def check_file(info)
241
+ logger.info MSG_CHCK_DB
242
+
243
+ cursor.bind_param(':filesize', info[:size].to_i)
244
+ cursor.bind_param(':checksum', info[:md5].to_s)
245
+
246
+ cursor.exec
247
+
248
+ while (found = cursor.fetch_hash)
249
+ SQL_DATA.each {|x| info[x.to_sym] = found[x.upcase]}
250
+ logger.info " found match: #{info[:ie_id]}/#{info[:rep_id]}/#{info[:fl_id]}"
251
+ if info[:original_name] =~ Regexp.new(info[:file].split(/[ #._-]/).map { |part| Regexp.escape(part) }.join('.*'))
252
+ logger.info " name matches: #{info[:original_name]}"
253
+ info[:name_match] = true
254
+ else
255
+ info[:name_match] = false
256
+ end
257
+
258
+ end
259
+
260
+ info[:found] = cursor.row_count
261
+
262
+ to_report(info)
263
+
264
+ end
265
+
266
+ def to_report(info = nil)
267
+ return unless @cfg.report
268
+ unless @report
269
+ @report = CSV.open(@cfg.report_file, 'wb')
270
+ @report << CSV_HEADER
271
+ end
272
+ @report << CSV_HEADER.map {|x| info[x.to_sym]} if info
273
+ end
274
+
275
+ end
276
+ end
277
+ end
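The two per-file checks in the class above, a streamed size/MD5 fingerprint and a loose comparison against the stored original name, can be tried without an Oracle connection. The following is a minimal sketch under stated assumptions: `fingerprint`, `loose_name_pattern`, and `name_match?` are hypothetical helper names that do not appear in the gem. Only the chunked read loop and the split/join pattern follow the diff. The sketch escapes each name part with `Regexp.escape`, so a file name containing regex metacharacters is matched literally.

```ruby
require 'digest'
require 'stringio'

# Hypothetical helpers for illustration; not part of the gem.
CHUNK_SIZE = 2_048_000 # same chunk size as the read loops in the diff

# Read an IO-like object chunk by chunk, accumulating the byte count and
# the MD5 digest so the whole payload never has to sit in memory.
def fingerprint(io, chunk_size = CHUNK_SIZE)
  md5 = Digest::MD5.new
  size = 0
  while (data = io.read(chunk_size))
    size += data.bytesize
    md5 << data
  end
  { size: size, md5: md5.hexdigest }
end

# Build a loose pattern from a file name: split on the separators used in
# the diff, escape every part, and allow any text between the parts.
def loose_name_pattern(file_name)
  parts = file_name.split(/[ #._-]/).map { |part| Regexp.escape(part) }
  Regexp.new(parts.join('.*'))
end

# True when the stored original name loosely matches the local file name.
# A nil original name (possible with the LEFT JOINs) never matches.
def name_match?(original_name, file_name)
  !original_name.nil? && original_name.match?(loose_name_pattern(file_name))
end

info = fingerprint(StringIO.new('hello world'), 4)
puts info[:size]
puts name_match?('my-report.v2.final.pdf', 'my report_v2.pdf')
```

Any object that responds to `read(length)` and returns `nil` at end of input can be passed to `fingerprint`, which is why the same loop shape serves both the bzip2 reader and the zip entry stream in the class.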