cdmdexer 0.17.7 → 0.21.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 8d3dccb39ef4048b79af386f1d3696f9d7497efb3c53e388480e1505e027d99b
- data.tar.gz: 262051a8e4e246be6092a5cbc54245cbed91449c859f5c408bde2f7b8d9ae068
+ metadata.gz: ea1e0a2b54ce8d063a2d84d18987d662ef583abf044142356a46d5bcd85790f5
+ data.tar.gz: a9012a9ae4fcee9bdd37314a0acd5c19238c60691ff60a95668329c89e5c650b
  SHA512:
- metadata.gz: 41aeca2b754fc5681e5bfe120e4d690e4f56f5b767bd924f50a0363ca2fdfad2238e2c58507d61e79805c37e0218267903e662c968768c5d20fce807d98cecc9
- data.tar.gz: 2a99d19a2639e15b5811692101d9630a67296b9ee1a32bda4e1cf70be5dcf4a0aba8815602b9172b935b8bf1482531026f5ebc8ee5289439a4baee76a7b6873c
+ metadata.gz: bf5450e86279e1e3a16fc8aa862f69ac865047bb4a9dc1a067a547152437c9d849acc2cf78f1107b664d4b930c8140913bad5cb0d21a35ada2e720465427a187
+ data.tar.gz: 5ae4ef6681742ec8bd883a388d571c08057e58f23044b1c09278b305159add9aec4ddde656024c813e0afb7c4427bab06d81286874c4c7fe104a2a937134d40e
data/README.md CHANGED
@@ -116,19 +116,51 @@ You might also want to simply override some of the default mappings or add your
  ```ruby
  mappings = CDMDEXER::Transformer.default_mappings.merge(your_custom_field_mappings)
  ```
- ## A Custom Post-indexing Callback
 
- If you would like to perform some action (e.g. send an email) following the completion of the CDMDEXER indexing process, you may declare your own callback hook (anything with "Callback" in the class name declared within the CDMDEXER module space will be used). To do so in Rails, create a Rails initializer file `config/initializers/cdmdexer.rb`:
+ ### Callbacks
+
+ CDMDEXER comes with a set of lifecycle hooks that are called at various points during the ETL process. Downstream applications may want to take advantage of these in order to perform logging or notification tasks. Create a Rails initializer at `config/initializers/cdmdexer.rb` in order to take advantage of these hooks.
+
+ **IMPORTANT NOTE:** Errors (except for http timeouts) are **not raised** but are rather sent to the `CdmError` notification hook below. This prevents sidekiq from piling-up with errors that will never resolve via retries but still allows you to capture the error and be notified of error events.
+
+ E.g.:
 
  ```ruby
  module CDMDEXER
- class Callback
- def self.call!
- Rails.logger.info("My Custom CDMDEXER Callback")
+ class CompletedCallback
+ def self.call!(config)
+ # e.g. commit records - ::SolrClient.new.commit
+ Rails.logger.info "Processing last batch for: #{config['set_spec']}"
+ end
+ end
+
+ class OaiNotification
+ def self.call!(location)
+ Rails.logger.info "CDMDEXER: Requesting: #{location}"
+ end
+ end
+
+ class CdmNotification
+ def self.call!(collection, id, endpoint)
+ Rails.logger.info "CDMDEXER: Requesting: #{collection}:#{id}"
+ end
+ end
+
+ class LoaderNotification
+ def self.call!(ingestables, deletables)
+ Rails.logger.info "CDMDEXER: Loading #{ingestables.length} records and deleting #{deletables.length}"
+ end
+ end
+
+ class CdmError
+ def self.call!(error)
+ Rails.logger.info "CDMDEXER: #{error}"
+ # e.g. push error to a slack channel or send an email alert
  end
  end
  end
  ```
+
  ## Development
 
  After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake test` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
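The README's hooks are ordinary classes with a class-level `call!`, defined inside the `CDMDEXER` module. As a minimal sketch, assuming you want to collect errors and completed set specs in memory (for example in a test suite) rather than log them, the following defines `CdmError` and `CompletedCallback` with the signatures the README shows. The `errors` and `completed_sets` readers are hypothetical additions, and because the gem is not loaded here, the hooks are invoked by hand to stand in for CDMDEXER calling them.

```ruby
# Hypothetical in-memory hooks; in a Rails app these would go in
# config/initializers/cdmdexer.rb. CDMDEXER itself is not loaded here.
module CDMDEXER
  class CdmError
    @errors = []

    class << self
      attr_reader :errors

      # Same one-argument signature as the README's CdmError hook.
      def call!(error)
        @errors << error.to_s
      end
    end
  end

  class CompletedCallback
    @completed_sets = []

    class << self
      attr_reader :completed_sets

      # As of 0.21.0 the ETL config hash is passed in, so set_spec is available.
      def call!(config)
        @completed_sets << config['set_spec']
      end
    end
  end
end

# Invoke the hooks by hand to simulate CDMDEXER calling them.
CDMDEXER::CdmError.call!('Record Transformation Error (Record coll1:1): boom')
CDMDEXER::CompletedCallback.call!('set_spec' => 'coll1')

p CDMDEXER::CdmError.errors
p CDMDEXER::CompletedCallback.completed_sets
```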
@@ -1,5 +1,6 @@
- # coding: utf-8
- lib = File.expand_path('../lib', __FILE__)
+ # frozen_string_literal: true
+
+ lib = File.expand_path('lib', __dir__)
  $LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
  require 'cdmdexer/version'
 
@@ -9,7 +10,7 @@ Gem::Specification.new do |spec|
  spec.authors = ['chadfennell']
  spec.email = ['fenne035@umn.edu']
 
- spec.summary = %q{Load CONTENTdm data into a Solr Index. CDMDEXER expects to run inside a Rails application.}
+ spec.summary = 'Load CONTENTdm data into a Solr Index. CDMDEXER expects to run inside a Rails application.'
  spec.license = 'MIT'
 
  spec.files = `git ls-files -z`.split("\x0").reject { |f| f.match(%r{^(test|spec|features)/}) }
@@ -17,11 +18,11 @@ Gem::Specification.new do |spec|
  spec.executables = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
  spec.require_paths = ['lib']
 
- spec.add_dependency 'hash_at_path', '~> 0.1'
  spec.add_dependency 'contentdm_api', '~> 0.5.0'
+ spec.add_dependency 'hash_at_path', '~> 0.1.6'
+ spec.add_dependency 'rsolr', '~> 2.0'
  spec.add_dependency 'sidekiq', '>= 3.5'
  spec.add_dependency 'titleize', '~> 1.4'
- spec.add_dependency 'rsolr', '~> 2.0'
  # CDMDEXER expects to run in a rails app, but just to avoid adding
  # another external dependency for XML procssing, we rely on activesupport's
  # Has.to_jsonl feature for testing and to allow this gem to function
@@ -29,7 +30,7 @@ Gem::Specification.new do |spec|
  spec.add_dependency 'rails', '>= 5.2'
 
  spec.add_development_dependency 'bundler', '~> 1.12'
- spec.add_development_dependency 'rake', '~> 12.0'
  spec.add_development_dependency 'minitest', '~> 5.0'
+ spec.add_development_dependency 'rake', '~> 12.0'
  spec.add_development_dependency 'yard', '~> 0.9.0'
  end
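The gemspec tightens `hash_at_path` from `~> 0.1` to `~> 0.1.6`; the other dependencies keep their constraints and only change order. Under RubyGems' pessimistic operator, `~> 0.1` means `>= 0.1, < 1.0`, whereas `~> 0.1.6` means `>= 0.1.6, < 0.2`. A quick check with RubyGems' own `Gem::Requirement#satisfied_by?`; the candidate version numbers are arbitrary examples, not published `hash_at_path` releases.

```ruby
require 'rubygems'

old_req = Gem::Requirement.new('~> 0.1')   # hash_at_path constraint in 0.17.7
new_req = Gem::Requirement.new('~> 0.1.6') # hash_at_path constraint in 0.21.0

# Print which arbitrary candidate versions each constraint accepts.
%w[0.1.0 0.1.5 0.1.6 0.1.9 0.2.0 0.9.0 1.0.0].each do |v|
  version = Gem::Version.new(v)
  puts format('%-6s old=%-5s new=%s', v,
              old_req.satisfied_by?(version),
              new_req.satisfied_by?(version))
end
```

In practice this means 0.21.0 will no longer resolve against `hash_at_path` 0.2.x or later, nor against anything older than 0.1.6.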
@@ -23,4 +23,6 @@ require 'cdmdexer/filtered_set_specs'
  require 'cdmdexer/etl_by_set_specs'
  require 'cdmdexer/regex_filter_callback'
  require 'cdmdexer/field_mapping'
- require 'cdmdexer/cdm_item'
+ require 'cdmdexer/cdm_item'
+ require 'cdmdexer/default_cdm_error'
+ require 'cdmdexer/transformation_error_message'
@@ -0,0 +1,8 @@
+ module CDMDEXER
+ # An example callback
+ class DefaultCdmError
+ def self.call!(error)
+ puts "CDMDEXER Error: #{error}"
+ end
+ end
+ end
@@ -38,11 +38,20 @@ module CDMDEXER
  @resumption_token = config.fetch('resumption_token', nil)
  @batch_size = config.fetch('batch_size', 5).to_i
  @is_recursive = config.fetch('is_recursive', true)
+ after_date = config.fetch('after_date', false)
 
  @oai_request = oai_request_klass.new(
  endpoint_url: oai_endpoint,
  resumption_token: resumption_token,
- set_spec: config.fetch('set_spec', nil)
+ set_spec: config.fetch('set_spec', nil),
+ # Optionally only select records that have been updated after a
+ # certain date. You may need to manually update a parent record
+ # after updating a child in order to signify to the indexer that
+ # some record in the parent's children has been updated. This indexer
+ # expects to only see parent records in the OAI responses.
+ # The default here is to skip indexing based on date.
+ # Rails example for getting a date: `after_date: 2.weeks.ago`
+ after_date: after_date
  )
 
  run_batch!
@@ -77,7 +86,7 @@ module CDMDEXER
  if next_resumption_token && is_recursive
  etl_worker_klass.perform_async(next_config)
  else
- completed_callback_klass.call!(solr_config)
+ completed_callback_klass.call!(config)
  end
  end
 
@@ -35,7 +35,7 @@ module CDMDEXER
  def transform_field
  formatter_klass.new(value: field_value, formatters: formatters).format!
  rescue StandardError => e
- raise "Mapping Error:#{field_mapping.config} Error:#{e.message}"
+ raise "Mapping: #{field_mapping.config} Error:#{e.message}"
  end
  end
  end
@@ -120,7 +120,7 @@ module CDMDEXER
 
  class AddSetSpecFormatter
  def self.format(value)
- value.merge('setSpec' => value['id'].split('/').first)
+ value.merge('setSpec' => value['id'].split(':').first)
  end
  end
 
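`AddSetSpecFormatter` now derives `setSpec` from the record id by splitting on `:` instead of `/`. A minimal standalone sketch of the new expression, assuming ids of the form `collection:pointer` (the same shape the README's `CdmNotification` example logs as `#{collection}:#{id}`); the id value and the `add_set_spec` helper name are made up for illustration.

```ruby
# Same expression as the new AddSetSpecFormatter.format, as a plain method.
def add_set_spec(value)
  value.merge('setSpec' => value['id'].split(':').first)
end

record = { 'id' => 'coll1:42' } # hypothetical CONTENTdm id

p add_set_spec(record)['setSpec'] # prints "coll1"

# The old '/' split finds no separator in such an id and keeps it whole.
p record['id'].split('/').first # prints "coll1:42"
```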
@@ -10,6 +10,8 @@ module CDMDEXER
  hook(pattern: name.to_s, default: DefaultLoaderNotification)
  elsif name.to_s == 'CdmNotification'
  hook(pattern: name.to_s, default: DefaultCdmNotification)
+ elsif name.to_s == 'CdmError'
+ hook(pattern: name.to_s, default: DefaultCdmError)
  end
  end
 
@@ -1,5 +1,8 @@
+ # frozen_string_literal: true
+
  require 'hash_at_path'
  require 'json'
+ require 'time'
 
  module CDMDEXER
  # Light wrapper around OAI requests
@@ -10,16 +13,19 @@ module CDMDEXER
  attr_reader :endpoint_url,
  :resumption_token,
  :client,
- :set_spec
+ :set_spec,
+ :after_date
 
  def initialize(endpoint_url: '',
  resumption_token: nil,
  set_spec: nil,
- client: Net::HTTP)
+ client: Net::HTTP,
+ after_date: false)
  @endpoint_url = endpoint_url
  @resumption_token = resumption_token
  @client = client
  @set_spec = set_spec ? "&set=#{set_spec}" : ''
+ @after_date = after_date
  end
 
  def records
@@ -44,12 +50,23 @@ module CDMDEXER
  end
 
  def deletable_ids
- records.select { |record| record['status'] == 'deleted' }
- .map { |record| record[:id] }
+ records.select do |record|
+ if record['status'] == 'deleted'
+ after_date ? Time.parse(record['datestamp']) >= after_date : true
+ end
+ end.map { |record| record[:id] }
  end
 
  def updatables
- records.reject { |record| record['status'] == 'deleted' }
+ records.reject do |record|
+ if record['status'] == 'deleted'
+ true
+ elsif after_date && Time.parse(record['datestamp']) < after_date
+ true
+ else
+ false
+ end
+ end
  end
 
  private
@@ -62,7 +79,7 @@ module CDMDEXER
  # Ensure results are a single level array
  # (single row sets, records, etc)
  def force_array(result)
- [result].flatten
+ [result].flatten.compact
  end
 
  def to_key(set)
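With `after_date` set, `deletable_ids` keeps only deleted records whose `datestamp` is on or after the cutoff, and `updatables` drops any record stamped before it; with the default `false`, both behave as before. The sketch below reproduces that filtering outside the gem, assuming plain string-keyed record hashes with made-up ids and datestamps; `DateFilter` is a hypothetical name, not a CDMDEXER class. It also shows the new `force_array` behavior of dropping `nil`s.

```ruby
require 'time'

# Hypothetical standalone copy of OaiRequest's after_date filtering.
class DateFilter
  def initialize(records, after_date: false)
    @records = records
    @after_date = after_date
  end

  # Deleted records, limited to those stamped on/after after_date when set.
  def deletable_ids
    @records.select do |record|
      next false unless record['status'] == 'deleted'

      @after_date ? Time.parse(record['datestamp']) >= @after_date : true
    end.map { |record| record['id'] }
  end

  # Non-deleted records, dropping those stamped before after_date when set.
  def updatables
    @records.reject do |record|
      record['status'] == 'deleted' ||
        (@after_date && Time.parse(record['datestamp']) < @after_date)
    end
  end
end

records = [
  { 'id' => 'coll1:1', 'datestamp' => '2020-01-01' },
  { 'id' => 'coll1:2', 'datestamp' => '2020-10-20' },
  { 'id' => 'coll1:3', 'datestamp' => '2020-01-05', 'status' => 'deleted' },
  { 'id' => 'coll1:4', 'datestamp' => '2020-10-21', 'status' => 'deleted' }
]
cutoff = Time.parse('2020-10-01')

all = DateFilter.new(records)
recent = DateFilter.new(records, after_date: cutoff)
p all.deletable_ids
p recent.deletable_ids
p recent.updatables.map { |r| r['id'] }

# force_array now also removes nils, e.g. when a response has no records.
force_array = ->(result) { [result].flatten.compact }
p force_array.call(nil)
```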
@@ -1,22 +1,31 @@
  module CDMDEXER
+ # "Record Transformation Error: #{message}"
  class RecordTransformer
- attr_reader :record, :field_mappings, :field_transformer
+ attr_reader :record, :field_mappings, :field_transformer, :error_klass
  def initialize(record: {},
  field_mappings: [],
- field_transformer: FieldTransformer)
+ field_transformer: FieldTransformer,
+ error_klass: TransformationErrorMessage)
  @record = record
  @field_mappings = field_mappings
  @field_transformer = field_transformer
+ @error_klass = error_klass
  end
 
  def transform!
  field_mappings.inject({}) do |dest_record, field_mapping|
  dest_record.merge(transform_field(record, field_mapping))
  end
+ rescue StandardError => error
+ error_klass.new(message: message(error)).notify
  end
 
  private
 
+ def message(error)
+ "Record Transformation Error (Record #{record['id']}): #{error}"
+ end
+
  def transform_field(record, field_mapping)
  field_transformer.new(field_mapping: field_mapping,
  record: record).reduce
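`transform!` now rescues any `StandardError` raised while mapping fields and hands a message to `error_klass` (by default `TransformationErrorMessage`). Because the value of the rescue clause becomes the method's return value, a record that fails without a connection error comes back as `nil` rather than a hash. A simplified sketch of that flow, using hypothetical stand-ins (`FlakyFieldTransformer`, `CollectingErrorMessage`, `SketchRecordTransformer`) instead of the gem's classes and a made-up record:

```ruby
# Hypothetical stand-in for CDMDEXER::FieldTransformer: copies one field,
# but fails on the 'date' mapping to simulate a bad value.
class FlakyFieldTransformer
  def initialize(field_mapping:, record:)
    @field_mapping = field_mapping
    @record = record
  end

  def reduce
    raise ArgumentError, "bad value for #{@field_mapping}" if @field_mapping == 'date'

    { @field_mapping => @record[@field_mapping] }
  end
end

# Hypothetical stand-in for TransformationErrorMessage: records the message
# and, like the real class for non-connection errors, returns nil.
class CollectingErrorMessage
  def self.messages
    @messages ||= []
  end

  def initialize(message:)
    @message = message
  end

  def notify
    self.class.messages << @message
    nil
  end
end

# Simplified copy of RecordTransformer#transform! with its new rescue clause.
class SketchRecordTransformer
  def initialize(record:, field_mappings:, field_transformer:, error_klass:)
    @record = record
    @field_mappings = field_mappings
    @field_transformer = field_transformer
    @error_klass = error_klass
  end

  def transform!
    @field_mappings.inject({}) do |dest_record, field_mapping|
      dest_record.merge(
        @field_transformer.new(field_mapping: field_mapping, record: @record).reduce
      )
    end
  rescue StandardError => error
    message = "Record Transformation Error (Record #{@record['id']}): #{error}"
    @error_klass.new(message: message).notify
  end
end

record = { 'id' => 'coll1:9', 'title' => 'Lake Street', 'date' => '19??' }
build = lambda do |mappings|
  SketchRecordTransformer.new(record: record, field_mappings: mappings,
                              field_transformer: FlakyFieldTransformer,
                              error_klass: CollectingErrorMessage)
end

p build.call(['title']).transform!
p build.call(%w[title date]).transform!
p CollectingErrorMessage.messages
```

Callers of `transform!` therefore need to tolerate `nil` results for records that failed and were reported through the `CdmError` hook.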
@@ -0,0 +1,23 @@
+ module CDMDEXER
+ # Raise anything but timeout errors or other http connection errors
+ # Notify downstream in case users want to log the non-timeout errors
+ class TransformationErrorMessage
+ attr_reader :message, :notification_klass
+ def initialize(message: :MISSING_ERROR_MESSAGE,
+ notification_klass: CDMDEXER::CdmError)
+ @notification_klass = notification_klass
+ @message = message
+ end
+
+ def notify
+ notification_klass.call! message
+ raise message if http_error?
+ end
+
+ private
+
+ def http_error?
+ !(message =~ /ConnectionError/).nil?
+ end
+ end
+ end
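`TransformationErrorMessage#notify` always passes the message to the notification class (by default `CDMDEXER::CdmError`) and re-raises only when the message matches `/ConnectionError/`; everything else is reported but not raised, which per the README keeps errors that retries cannot fix out of Sidekiq's retry queue. Because the check is a regex on the message text, whether a given HTTP failure is re-raised depends on that word appearing in the message. A standalone sketch of the same pattern, with a hypothetical `RecordingNotifier` in place of the `CdmError` hook and made-up messages:

```ruby
# Hypothetical notifier standing in for CDMDEXER::CdmError.
class RecordingNotifier
  def self.messages
    @messages ||= []
  end

  def self.call!(message)
    messages << message
  end
end

# Same notify-then-maybe-raise logic as TransformationErrorMessage.
class ErrorMessage
  attr_reader :message, :notification_klass

  def initialize(message: :MISSING_ERROR_MESSAGE, notification_klass: RecordingNotifier)
    @message = message
    @notification_klass = notification_klass
  end

  # Always notify; only re-raise when the message looks like a connection error.
  def notify
    notification_klass.call!(message)
    raise message if http_error?
  end

  private

  def http_error?
    !(message =~ /ConnectionError/).nil?
  end
end

# A transformation error: reported, not raised.
ErrorMessage.new(message: 'Record Transformation Error (Record coll1:7): bad date').notify

# A connection error: reported, then raised as a RuntimeError.
raised = false
begin
  ErrorMessage.new(message: 'Net::HTTP ConnectionError: timed out').notify
rescue RuntimeError
  raised = true
end

p RecordingNotifier.messages.length
p raised
```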
@@ -1,3 +1,3 @@
  module CDMDEXER
- VERSION = "0.17.7"
+ VERSION = "0.21.0"
  end
metadata CHANGED
@@ -1,85 +1,85 @@
  --- !ruby/object:Gem::Specification
  name: cdmdexer
  version: !ruby/object:Gem::Version
- version: 0.17.7
+ version: 0.21.0
  platform: ruby
  authors:
  - chadfennell
  autorequire:
  bindir: exe
  cert_chain: []
- date: 2019-03-13 00:00:00.000000000 Z
+ date: 2020-10-26 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
- name: hash_at_path
+ name: contentdm_api
  requirement: !ruby/object:Gem::Requirement
  requirements:
  - - "~>"
  - !ruby/object:Gem::Version
- version: '0.1'
+ version: 0.5.0
  type: :runtime
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
  requirements:
  - - "~>"
  - !ruby/object:Gem::Version
- version: '0.1'
+ version: 0.5.0
  - !ruby/object:Gem::Dependency
- name: contentdm_api
+ name: hash_at_path
  requirement: !ruby/object:Gem::Requirement
  requirements:
  - - "~>"
  - !ruby/object:Gem::Version
- version: 0.5.0
+ version: 0.1.6
  type: :runtime
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
  requirements:
  - - "~>"
  - !ruby/object:Gem::Version
- version: 0.5.0
+ version: 0.1.6
  - !ruby/object:Gem::Dependency
- name: sidekiq
+ name: rsolr
  requirement: !ruby/object:Gem::Requirement
  requirements:
- - - ">="
+ - - "~>"
  - !ruby/object:Gem::Version
- version: '3.5'
+ version: '2.0'
  type: :runtime
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
  requirements:
- - - ">="
+ - - "~>"
  - !ruby/object:Gem::Version
- version: '3.5'
+ version: '2.0'
  - !ruby/object:Gem::Dependency
- name: titleize
+ name: sidekiq
  requirement: !ruby/object:Gem::Requirement
  requirements:
- - - "~>"
+ - - ">="
  - !ruby/object:Gem::Version
- version: '1.4'
+ version: '3.5'
  type: :runtime
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
  requirements:
- - - "~>"
+ - - ">="
  - !ruby/object:Gem::Version
- version: '1.4'
+ version: '3.5'
  - !ruby/object:Gem::Dependency
- name: rsolr
+ name: titleize
  requirement: !ruby/object:Gem::Requirement
  requirements:
  - - "~>"
  - !ruby/object:Gem::Version
- version: '2.0'
+ version: '1.4'
  type: :runtime
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
  requirements:
  - - "~>"
  - !ruby/object:Gem::Version
- version: '2.0'
+ version: '1.4'
  - !ruby/object:Gem::Dependency
  name: rails
  requirement: !ruby/object:Gem::Requirement
@@ -109,33 +109,33 @@ dependencies:
  - !ruby/object:Gem::Version
  version: '1.12'
  - !ruby/object:Gem::Dependency
- name: rake
+ name: minitest
  requirement: !ruby/object:Gem::Requirement
  requirements:
  - - "~>"
  - !ruby/object:Gem::Version
- version: '12.0'
+ version: '5.0'
  type: :development
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
  requirements:
  - - "~>"
  - !ruby/object:Gem::Version
- version: '12.0'
+ version: '5.0'
  - !ruby/object:Gem::Dependency
- name: minitest
+ name: rake
  requirement: !ruby/object:Gem::Requirement
  requirements:
  - - "~>"
  - !ruby/object:Gem::Version
- version: '5.0'
+ version: '12.0'
  type: :development
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
  requirements:
  - - "~>"
  - !ruby/object:Gem::Version
- version: '5.0'
+ version: '12.0'
  - !ruby/object:Gem::Dependency
  name: yard
  requirement: !ruby/object:Gem::Requirement
@@ -170,6 +170,7 @@ files:
  - cdmdexer.gemspec
  - lib/cdmdexer.rb
  - lib/cdmdexer/cdm_item.rb
+ - lib/cdmdexer/default_cdm_error.rb
  - lib/cdmdexer/default_cdm_notification.rb
  - lib/cdmdexer/default_completed_callback.rb
  - lib/cdmdexer/default_loader_notification.rb
@@ -193,6 +194,7 @@ files:
  - lib/cdmdexer/tasks/delete.rake
  - lib/cdmdexer/tasks/etl.rake
  - lib/cdmdexer/transform_worker.rb
+ - lib/cdmdexer/transformation_error_message.rb
  - lib/cdmdexer/transformer.rb
  - lib/cdmdexer/version.rb
  - travis.yml
@@ -215,7 +217,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
  - !ruby/object:Gem::Version
  version: '0'
  requirements: []
- rubygems_version: 3.0.3
+ rubygems_version: 3.0.6
  signing_key:
  specification_version: 4
  summary: Load CONTENTdm data into a Solr Index. CDMDEXER expects to run inside a Rails