cdmdexer 0.18.0 → 0.21.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: d9864c85d2d86c48decf721e05582718c7edca604b5b60a58221b66abc5632c9
4
- data.tar.gz: d1cae834155be4e7595a8126130b38866744cec68178d8850f25dfb5b4e3d1d7
3
+ metadata.gz: '09844cd37ea7523a05b092aeaaa92383b7871a75ae1892d598aeb16a6ef050b5'
4
+ data.tar.gz: 11384ed2379d8ea5115f1d5d04346261a0309f96342d910f0942b1f4c224123e
5
5
  SHA512:
6
- metadata.gz: 4b869bdeaee31345509c7f1e73d992ba0426f4dbf742f5c89fccb97731aa1d4e73306f0edc0e0b83f5a18eaba55f3c9bb24b2b0c6b2f2fe59bf58c629619e375
7
- data.tar.gz: 8552430d8a82618e9494982a0392a06376db7ac406c3921f3960eac87d8705e3dbcc4836ca313b5b89f353ec67b6e7e9ffecab3ee28ef726d000b1fdc5c0f5af
6
+ metadata.gz: e5213725ac3cf7459e21ccd537c15549beb4b52ff13f64f99af3e96b172b64ab49367777b0685dc9d86ffb86dd0ac91fb3b1ebb7e1ebf77f417b4f4f1c10a467
7
+ data.tar.gz: e544f8e2fbc0528a31550c2946ce2ba2504996ec8894971393acafee40dadf5f3d32f1b5b202ca715fa4bd19e553ddfde7722fab4ad7266dd26502c714894442
data/.env.example ADDED
@@ -0,0 +1,2 @@
1
+ GEONAMES_USER=foo
2
+ GEONAMES_TOKEN=bar
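A minimal sketch of how an application might read the two variables from `.env.example` at runtime. The helper name `geonames_credentials` is hypothetical and not part of the gem; it assumes the variables have already been loaded into the environment (for example by dotenv, which this release adds as a development dependency).

```ruby
# Hypothetical helper: not part of cdmdexer.
# Reads the GeoNames credentials and fails fast (KeyError) when one is missing.
# Accepts any object that responds to #fetch, so a plain Hash works in tests.
def geonames_credentials(env = ENV)
  {
    user: env.fetch('GEONAMES_USER'),
    token: env.fetch('GEONAMES_TOKEN')
  }
end
```

Using `fetch` rather than `[]` means a missing variable surfaces immediately instead of as a `nil` credential later on.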
data/.github/workflows/ruby.yml ADDED
@@ -0,0 +1,39 @@
+ # This workflow uses actions that are not certified by GitHub.
+ # They are provided by a third-party and are governed by
+ # separate terms of service, privacy policy, and support
+ # documentation.
+ # This workflow will download a prebuilt Ruby version, install dependencies and run tests with Rake
+ # For more information see: https://github.com/marketplace/actions/setup-ruby-jruby-and-truffleruby
+
+ name: Ruby
+
+ on: [push]
+
+ jobs:
+   test:
+
+     runs-on: ubuntu-latest
+     strategy:
+       matrix:
+         ruby-version: ['2.6']
+
+     steps:
+     - uses: actions/checkout@v2
+     - name: Set up Ruby
+     # To automatically get bug fixes and new Ruby versions for ruby/setup-ruby,
+     # change this to (see https://github.com/ruby/setup-ruby#versioning):
+     # uses: ruby/setup-ruby@v1
+       uses: ruby/setup-ruby@473e4d8fe5dd94ee328fdfca9f8c9c7afc9dae5e
+       with:
+         ruby-version: ${{ matrix.ruby-version }}
+     - name: Pin bundler
+       run: gem install bundler:1.17.3
+     - name: Lock bundler
+       run: bundle _1.17.3_ lock
+     - name: Run bundler
+       run: bundle install
+     - name: Run tests
+       env:
+         GEONAMES_USER: ${{ secrets.GEONAMES_USER }}
+         GEONAMES_TOKEN: ${{ secrets.GEONAMES_TOKEN }}
+       run: bundle exec rake test
data/.gitignore CHANGED
@@ -8,3 +8,5 @@
8
8
  /spec/reports/
9
9
  /tmp/
10
10
  *.gem
11
+
12
+ .env
data/README.md CHANGED
@@ -30,9 +30,14 @@ require 'cdmdexer/rake_task'
30
30
 
31
31
  In order to make use of the GeoNames service, you must purchase a [GeoNames Premium Webservices Account](http://www.geonames.org/commercial-webservices.html). If you do not have a `geonam` field in your CONTENTdm schema, you may ignore this instruction. Add your credentials to your shell environment once you have secured a GeoNames user:
32
32
 
33
+
33
34
  ```
34
- # e.g. within your .bash_profile or .zprofile file
35
- export export GEONAMES_USER="yourusernamehere"
35
+ cp .env.example .env
36
+ nano .env
37
+
38
+ # Add these vars to the .env file
39
+ GEONAMES_USER=foo
40
+ GEONAMES_TOKEN=bar
36
41
  ```
37
42
 
38
43
  ## Usage
@@ -116,19 +121,51 @@ You might also want to simply override some of the default mappings or add your
  ```ruby
  mappings = CDMDEXER::Transformer.default_mappings.merge(your_custom_field_mappings)
  ```
- ## A Custom Post-indexing Callback

- If you would like to perform some action (e.g. send an email) following the completion of the CDMDEXER indexing process, you may declare your own callback hook (anything with "Callback" in the class name declared within the CDMDEXER module space will be used). To do so in Rails, create a Rails initializer file `config/initializers/cdmdexer.rb`:
+ ### Callbacks
+
+ CDMDEXER comes with a set of lifecycle hooks that are called at various points during the ETL process. Downstream applications may want to take advantage of these in order to perform logging or notification tasks. Create a Rails initializer at `config/initializers/cdmdexer.rb` in order to take advantage of these hooks.
+
+ **IMPORTANT NOTE:** Errors (except for http timeouts) are **not raised** but are rather sent to the `CdmError` notification hook below. This prevents sidekiq from piling-up with errors that will never resolve via retries but still allows you to capture the error and be notified of error events.
+
+ E.g.:

  ```ruby
  module CDMDEXER
-   class Callback
-     def self.call!
-       Rails.logger.info("My Custom CDMDEXER Callback")
+   class CompletedCallback
+     def self.call!(config)
+       # e.g. commit records - ::SolrClient.new.commit
+       Rails.logger.info "Processing last batch for: #{config['set_spec']}"
+     end
+   end
+
+   class OaiNotification
+     def self.call!(location)
+       Rails.logger.info "CDMDEXER: Requesting: #{location}"
+     end
+   end
+
+   class CdmNotification
+     def self.call!(collection, id, endpoint)
+       Rails.logger.info "CDMDEXER: Requesting: #{collection}:#{id}"
+     end
+   end
+
+   class LoaderNotification
+     def self.call!(ingestables, deletables)
+       Rails.logger.info "CDMDEXER: Loading #{ingestables.length} records and deleting #{deletables.length}"
+     end
+   end
+
+   class CdmError
+     def self.call!(error)
+       Rails.logger.info "CDMDEXER: #{error}"
+       # e.g. push error to a slack channel or send an email alert
      end
    end
  end
  ```
+
  ## Development

  After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake test` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
@@ -146,4 +183,4 @@ Bug reports and pull requests are welcome on GitHub at https://github.com/UMNLib
146
183
  ## TODO
147
184
 
148
185
  * Make StripFormatter the default formatter so it doesn't need to be declared for every field
149
- * Re-brand project: CONTENTdm Indexer. CDMDEXER doesn't necessarily require Blacklight. Moreover only handles indexing.
186
+ * Re-brand project: CONTENTdm Indexer. CDMDEXER doesn't necessarily require Blacklight. Moreover only handles indexing.
data/cdmdexer.gemspec CHANGED
@@ -1,5 +1,6 @@
1
- # coding: utf-8
2
- lib = File.expand_path('../lib', __FILE__)
1
+ # frozen_string_literal: true
2
+
3
+ lib = File.expand_path('lib', __dir__)
3
4
  $LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
4
5
  require 'cdmdexer/version'
5
6
 
@@ -9,7 +10,7 @@ Gem::Specification.new do |spec|
9
10
  spec.authors = ['chadfennell']
10
11
  spec.email = ['fenne035@umn.edu']
11
12
 
12
- spec.summary = %q{Load CONTENTdm data into a Solr Index. CDMDEXER expects to run inside a Rails application.}
13
+ spec.summary = 'Load CONTENTdm data into a Solr Index. CDMDEXER expects to run inside a Rails application.'
13
14
  spec.license = 'MIT'
14
15
 
15
16
  spec.files = `git ls-files -z`.split("\x0").reject { |f| f.match(%r{^(test|spec|features)/}) }
@@ -17,19 +18,20 @@ Gem::Specification.new do |spec|
17
18
  spec.executables = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
18
19
  spec.require_paths = ['lib']
19
20
 
20
- spec.add_dependency 'hash_at_path', '~> 0.1'
21
21
  spec.add_dependency 'contentdm_api', '~> 0.5.0'
22
+ spec.add_dependency 'hash_at_path', '~> 0.1.6'
23
+ spec.add_dependency 'rsolr', '~> 2.0'
22
24
  spec.add_dependency 'sidekiq', '>= 3.5'
23
25
  spec.add_dependency 'titleize', '~> 1.4'
24
- spec.add_dependency 'rsolr', '~> 2.0'
25
26
  # CDMDEXER expects to run in a rails app, but just to avoid adding
26
27
  # another external dependency for XML procssing, we rely on activesupport's
27
28
  # Has.to_jsonl feature for testing and to allow this gem to function
28
29
  # independently from a rails app
29
- spec.add_dependency 'rails', '>= 5.2'
30
+ spec.add_dependency 'rails', '~> 6.0.0'
30
31
 
32
+ spec.add_development_dependency 'dotenv-rails', '~> 2.7.6'
31
33
  spec.add_development_dependency 'bundler', '~> 1.12'
32
- spec.add_development_dependency 'rake', '~> 12.0'
33
34
  spec.add_development_dependency 'minitest', '~> 5.0'
35
+ spec.add_development_dependency 'rake', '~> 12.0'
34
36
  spec.add_development_dependency 'yard', '~> 0.9.0'
35
37
  end
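The gemspec changes above tighten two constraints: `rails` moves from `>= 5.2` to `~> 6.0.0`, and `hash_at_path` from `~> 0.1` to `~> 0.1.6`. As a quick illustration of what the pessimistic operator admits, the sketch below checks a few version numbers with RubyGems' own `Gem::Requirement`. The constant names, the `allowed?` helper and the version numbers tested are chosen for illustration only.

```ruby
require 'rubygems'

# '~>' (the pessimistic operator) lets only the last listed segment grow.
RAILS_NEW = Gem::Requirement.new('~> 6.0.0') # >= 6.0.0 and < 6.1
HASH_OLD  = Gem::Requirement.new('~> 0.1')   # >= 0.1   and < 1.0
HASH_NEW  = Gem::Requirement.new('~> 0.1.6') # >= 0.1.6 and < 0.2

# Illustrative helper: true when the version string satisfies the requirement.
def allowed?(requirement, version)
  requirement.satisfied_by?(Gem::Version.new(version))
end

puts allowed?(RAILS_NEW, '6.0.4') # a 6.0.x patch release
puts allowed?(RAILS_NEW, '6.1.0') # the next minor release
puts allowed?(HASH_OLD, '0.9.0')
puts allowed?(HASH_NEW, '0.9.0')
```

In practice this means an application on Rails 5.2 or on Rails 6.1 can no longer resolve this gem version, which is worth checking before upgrading.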
data/lib/cdmdexer/default_cdm_error.rb ADDED
@@ -0,0 +1,8 @@
+ module CDMDEXER
+   # An example callback
+   class DefaultCdmError
+     def self.call!(error)
+       puts "CDMDEXER Error: #{error}"
+     end
+   end
+ end
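`DefaultCdmError` is the fallback used when the host application has not defined its own `CDMDEXER::CdmError`; elsewhere in this diff it is registered through `hook(pattern: name.to_s, default: DefaultCdmError)`. The gem's `hook` implementation is not shown here, so the following is only a sketch of one way such a fallback could work, assuming a `const_missing`-based lookup. `HookSketch` and `DEFAULTS` are hypothetical names, not part of the gem.

```ruby
# Hypothetical stand-in for the gem's module; illustrates a fallback lookup only.
module HookSketch
  class DefaultCdmError
    def self.call!(error)
      "default handled: #{error}"
    end
  end

  # Maps a hook name to the class used when the application defines none.
  DEFAULTS = { 'CdmError' => DefaultCdmError }.freeze

  # Ruby calls const_missing when HookSketch::Something is referenced but
  # undefined. A known hook name is bound to its default; anything else
  # falls through to the normal NameError.
  def self.const_missing(name)
    default = DEFAULTS[name.to_s]
    default ? const_set(name, default) : super
  end
end

puts HookSketch::CdmError.call!('boom')
```

Under this assumption, an application that defines its own `CdmError` inside the module before the first reference would never reach `const_missing`, so its class would be used instead of the default.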
@@ -38,11 +38,20 @@ module CDMDEXER
        @resumption_token = config.fetch('resumption_token', nil)
        @batch_size = config.fetch('batch_size', 5).to_i
        @is_recursive = config.fetch('is_recursive', true)
+       after_date = config.fetch('after_date', false)

        @oai_request = oai_request_klass.new(
          endpoint_url: oai_endpoint,
          resumption_token: resumption_token,
-         set_spec: config.fetch('set_spec', nil)
+         set_spec: config.fetch('set_spec', nil),
+         # Optionally only select records that have been updated after a
+         # certain date. You may need to manually update a parent record
+         # after updating a child in order to signify to the indexer that
+         # some record in the parent's children has been updated. This indexer
+         # expects to only see parent records in the OAI responses.
+         # The default here is to skip indexing based on date.
+         # Rails example for getting a date: `after_date: 2.weeks.ago`
+         after_date: after_date
        )

        run_batch!
@@ -35,7 +35,7 @@ module CDMDEXER
35
35
  def transform_field
36
36
  formatter_klass.new(value: field_value, formatters: formatters).format!
37
37
  rescue StandardError => e
38
- raise "Mapping Error:#{field_mapping.config} Error:#{e.message}"
38
+ raise "Mapping: #{field_mapping.config} Error:#{e.message}"
39
39
  end
40
40
  end
41
41
  end
@@ -166,4 +166,4 @@ module CDMDEXER
166
166
  end
167
167
  end
168
168
 
169
- end
169
+ end
@@ -10,6 +10,8 @@ module CDMDEXER
10
10
  hook(pattern: name.to_s, default: DefaultLoaderNotification)
11
11
  elsif name.to_s == 'CdmNotification'
12
12
  hook(pattern: name.to_s, default: DefaultCdmNotification)
13
+ elsif name.to_s == 'CdmError'
14
+ hook(pattern: name.to_s, default: DefaultCdmError)
13
15
  end
14
16
  end
15
17
 
@@ -1,5 +1,8 @@
1
+ # frozen_string_literal: true
2
+
1
3
  require 'hash_at_path'
2
4
  require 'json'
5
+ require 'time'
3
6
 
4
7
  module CDMDEXER
5
8
  # Light wrapper around OAI requests
@@ -10,16 +13,19 @@ module CDMDEXER
10
13
  attr_reader :endpoint_url,
11
14
  :resumption_token,
12
15
  :client,
13
- :set_spec
16
+ :set_spec,
17
+ :after_date
14
18
 
15
19
  def initialize(endpoint_url: '',
16
20
  resumption_token: nil,
17
21
  set_spec: nil,
18
- client: Net::HTTP)
22
+ client: Net::HTTP,
23
+ after_date: false)
19
24
  @endpoint_url = endpoint_url
20
25
  @resumption_token = resumption_token
21
26
  @client = client
22
27
  @set_spec = set_spec ? "&set=#{set_spec}" : ''
28
+ @after_date = after_date
23
29
  end
24
30
 
25
31
  def records
@@ -44,12 +50,23 @@ module CDMDEXER
      end

      def deletable_ids
-       records.select { |record| record['status'] == 'deleted' }
-              .map { |record| record[:id] }
+       records.select do |record|
+         if record['status'] == 'deleted'
+           after_date ? Time.parse(record['datestamp']) >= after_date : true
+         end
+       end.map { |record| record[:id] }
      end

      def updatables
-       records.reject { |record| record['status'] == 'deleted' }
+       records.reject do |record|
+         if record['status'] == 'deleted'
+           true
+         elsif after_date && Time.parse(record['datestamp']) < after_date
+           true
+         else
+           false
+         end
+       end
      end

      private
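The date filter in the hunk above can be exercised in isolation. The sketch below restates the two predicates as standalone methods over hypothetical sample records. The key layout (a symbol `:id` next to string `'status'` and `'datestamp'` keys) simply mirrors how the diff reads the records; the real record shape is not shown in this diff, and the IDs and dates are made up.

```ruby
require 'time'

# Hypothetical sample data; real OAI records may carry more fields.
SAMPLE_RECORDS = [
  { id: 'coll:1', 'datestamp' => '2021-12-01', 'status' => 'deleted' },
  { id: 'coll:2', 'datestamp' => '2021-10-01', 'status' => 'deleted' },
  { id: 'coll:3', 'datestamp' => '2021-12-05' },
  { id: 'coll:4', 'datestamp' => '2021-09-15' }
].freeze

# IDs of deleted records, limited to those stamped on or after after_date
# when a date is given (after_date: false disables the date check).
def deletable_ids(records, after_date)
  records.select do |record|
    next false unless record['status'] == 'deleted'

    after_date ? Time.parse(record['datestamp']) >= after_date : true
  end.map { |record| record[:id] }
end

# Non-deleted records, dropping those stamped before after_date when given.
def updatables(records, after_date)
  records.reject do |record|
    record['status'] == 'deleted' ||
      (after_date && Time.parse(record['datestamp']) < after_date)
  end
end

cutoff = Time.parse('2021-11-01')
p deletable_ids(SAMPLE_RECORDS, cutoff)
p updatables(SAMPLE_RECORDS, cutoff).map { |record| record[:id] }
```

With `after_date` left at its default of `false`, both methods behave as they did in 0.18.0 and filter on deletion status alone.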
@@ -1,22 +1,31 @@
  module CDMDEXER
+   # "Record Transformation Error: #{message}"
    class RecordTransformer
-     attr_reader :record, :field_mappings, :field_transformer
+     attr_reader :record, :field_mappings, :field_transformer, :error_klass
      def initialize(record: {},
                     field_mappings: [],
-                    field_transformer: FieldTransformer)
+                    field_transformer: FieldTransformer,
+                    error_klass: TransformationErrorMessage)
        @record = record
        @field_mappings = field_mappings
        @field_transformer = field_transformer
+       @error_klass = error_klass
      end

      def transform!
        field_mappings.inject({}) do |dest_record, field_mapping|
          dest_record.merge(transform_field(record, field_mapping))
        end
+     rescue StandardError => error
+       error_klass.new(message: message(error)).notify
      end

      private

+     def message(error)
+       "Record Transformation Error (Record #{record['id']}): #{error}"
+     end
+
      def transform_field(record, field_mapping)
        field_transformer.new(field_mapping: field_mapping,
                              record: record).reduce
data/lib/cdmdexer/transformation_error_message.rb ADDED
@@ -0,0 +1,23 @@
+ module CDMDEXER
+   # Raise anything but timeout errors or other http connection errors
+   # Notify downstream in case users want to log the non-timeout errors
+   class TransformationErrorMessage
+     attr_reader :message, :notification_klass
+     def initialize(message: :MISSING_ERROR_MESSAGE,
+                    notification_klass: CDMDEXER::CdmError)
+       @notification_klass = notification_klass
+       @message = message
+     end
+
+     def notify
+       notification_klass.call! message
+       raise message if http_error?
+     end
+
+     private
+
+     def http_error?
+       !(message =~ /ConnectionError/).nil?
+     end
+   end
+ end
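As the code above reads, `notify` always forwards the message to the notification hook and re-raises only when the message mentions `ConnectionError`, which matches the README note that non-timeout errors are reported rather than raised. The sketch below reproduces that behaviour in a standalone form so it can be tried without the gem. `CollectingNotifier` and `ErrorMessageSketch` are hypothetical names used for illustration.

```ruby
# Hypothetical notifier that records messages instead of logging them.
class CollectingNotifier
  def self.messages
    @messages ||= []
  end

  def self.call!(message)
    messages << message
  end
end

# Standalone restatement of the notify-then-maybe-raise pattern.
class ErrorMessageSketch
  def initialize(message:, notification_klass: CollectingNotifier)
    @message = message
    @notification_klass = notification_klass
  end

  # Always notify; re-raise only for connection errors so that a job
  # runner can retry them, while other errors are reported and swallowed.
  def notify
    @notification_klass.call!(@message)
    raise @message if @message.match?(/ConnectionError/)
  end
end

# A mapping error is reported but does not raise.
ErrorMessageSketch.new(message: 'Mapping: title Error: bad value').notify
puts CollectingNotifier.messages.length
```

One consequence of this design is that `RecordTransformer#transform!` returns whatever `notify` returns when a non-connection error occurs, so callers may want to handle a non-Hash result.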
data/lib/cdmdexer/version.rb CHANGED
@@ -1,3 +1,3 @@
1
1
  module CDMDEXER
2
- VERSION = "0.18.0"
3
- end
2
+ VERSION = "0.21.1"
3
+ end
data/lib/cdmdexer.rb CHANGED
@@ -24,3 +24,5 @@ require 'cdmdexer/etl_by_set_specs'
24
24
  require 'cdmdexer/regex_filter_callback'
25
25
  require 'cdmdexer/field_mapping'
26
26
  require 'cdmdexer/cdm_item'
27
+ require 'cdmdexer/default_cdm_error'
28
+ require 'cdmdexer/transformation_error_message'
metadata CHANGED
@@ -1,43 +1,57 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: cdmdexer
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.18.0
4
+ version: 0.21.1
5
5
  platform: ruby
6
6
  authors:
7
7
  - chadfennell
8
8
  autorequire:
9
9
  bindir: exe
10
10
  cert_chain: []
11
- date: 2019-03-18 00:00:00.000000000 Z
11
+ date: 2021-12-20 00:00:00.000000000 Z
12
12
  dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ name: contentdm_api
15
+ requirement: !ruby/object:Gem::Requirement
16
+ requirements:
17
+ - - "~>"
18
+ - !ruby/object:Gem::Version
19
+ version: 0.5.0
20
+ type: :runtime
21
+ prerelease: false
22
+ version_requirements: !ruby/object:Gem::Requirement
23
+ requirements:
24
+ - - "~>"
25
+ - !ruby/object:Gem::Version
26
+ version: 0.5.0
13
27
  - !ruby/object:Gem::Dependency
14
28
  name: hash_at_path
15
29
  requirement: !ruby/object:Gem::Requirement
16
30
  requirements:
17
31
  - - "~>"
18
32
  - !ruby/object:Gem::Version
19
- version: '0.1'
33
+ version: 0.1.6
20
34
  type: :runtime
21
35
  prerelease: false
22
36
  version_requirements: !ruby/object:Gem::Requirement
23
37
  requirements:
24
38
  - - "~>"
25
39
  - !ruby/object:Gem::Version
26
- version: '0.1'
40
+ version: 0.1.6
27
41
  - !ruby/object:Gem::Dependency
28
- name: contentdm_api
42
+ name: rsolr
29
43
  requirement: !ruby/object:Gem::Requirement
30
44
  requirements:
31
45
  - - "~>"
32
46
  - !ruby/object:Gem::Version
33
- version: 0.5.0
47
+ version: '2.0'
34
48
  type: :runtime
35
49
  prerelease: false
36
50
  version_requirements: !ruby/object:Gem::Requirement
37
51
  requirements:
38
52
  - - "~>"
39
53
  - !ruby/object:Gem::Version
40
- version: 0.5.0
54
+ version: '2.0'
41
55
  - !ruby/object:Gem::Dependency
42
56
  name: sidekiq
43
57
  requirement: !ruby/object:Gem::Requirement
@@ -67,33 +81,33 @@ dependencies:
67
81
  - !ruby/object:Gem::Version
68
82
  version: '1.4'
69
83
  - !ruby/object:Gem::Dependency
70
- name: rsolr
84
+ name: rails
71
85
  requirement: !ruby/object:Gem::Requirement
72
86
  requirements:
73
87
  - - "~>"
74
88
  - !ruby/object:Gem::Version
75
- version: '2.0'
89
+ version: 6.0.0
76
90
  type: :runtime
77
91
  prerelease: false
78
92
  version_requirements: !ruby/object:Gem::Requirement
79
93
  requirements:
80
94
  - - "~>"
81
95
  - !ruby/object:Gem::Version
82
- version: '2.0'
96
+ version: 6.0.0
83
97
  - !ruby/object:Gem::Dependency
84
- name: rails
98
+ name: dotenv-rails
85
99
  requirement: !ruby/object:Gem::Requirement
86
100
  requirements:
87
- - - ">="
101
+ - - "~>"
88
102
  - !ruby/object:Gem::Version
89
- version: '5.2'
90
- type: :runtime
103
+ version: 2.7.6
104
+ type: :development
91
105
  prerelease: false
92
106
  version_requirements: !ruby/object:Gem::Requirement
93
107
  requirements:
94
- - - ">="
108
+ - - "~>"
95
109
  - !ruby/object:Gem::Version
96
- version: '5.2'
110
+ version: 2.7.6
97
111
  - !ruby/object:Gem::Dependency
98
112
  name: bundler
99
113
  requirement: !ruby/object:Gem::Requirement
@@ -109,33 +123,33 @@ dependencies:
109
123
  - !ruby/object:Gem::Version
110
124
  version: '1.12'
111
125
  - !ruby/object:Gem::Dependency
112
- name: rake
126
+ name: minitest
113
127
  requirement: !ruby/object:Gem::Requirement
114
128
  requirements:
115
129
  - - "~>"
116
130
  - !ruby/object:Gem::Version
117
- version: '12.0'
131
+ version: '5.0'
118
132
  type: :development
119
133
  prerelease: false
120
134
  version_requirements: !ruby/object:Gem::Requirement
121
135
  requirements:
122
136
  - - "~>"
123
137
  - !ruby/object:Gem::Version
124
- version: '12.0'
138
+ version: '5.0'
125
139
  - !ruby/object:Gem::Dependency
126
- name: minitest
140
+ name: rake
127
141
  requirement: !ruby/object:Gem::Requirement
128
142
  requirements:
129
143
  - - "~>"
130
144
  - !ruby/object:Gem::Version
131
- version: '5.0'
145
+ version: '12.0'
132
146
  type: :development
133
147
  prerelease: false
134
148
  version_requirements: !ruby/object:Gem::Requirement
135
149
  requirements:
136
150
  - - "~>"
137
151
  - !ruby/object:Gem::Version
138
- version: '5.0'
152
+ version: '12.0'
139
153
  - !ruby/object:Gem::Dependency
140
154
  name: yard
141
155
  requirement: !ruby/object:Gem::Requirement
@@ -157,6 +171,8 @@ executables: []
157
171
  extensions: []
158
172
  extra_rdoc_files: []
159
173
  files:
174
+ - ".env.example"
175
+ - ".github/workflows/ruby.yml"
160
176
  - ".gitignore"
161
177
  - ".rubocop.yml"
162
178
  - ".travis.yml"
@@ -170,6 +186,7 @@ files:
170
186
  - cdmdexer.gemspec
171
187
  - lib/cdmdexer.rb
172
188
  - lib/cdmdexer/cdm_item.rb
189
+ - lib/cdmdexer/default_cdm_error.rb
173
190
  - lib/cdmdexer/default_cdm_notification.rb
174
191
  - lib/cdmdexer/default_completed_callback.rb
175
192
  - lib/cdmdexer/default_loader_notification.rb
@@ -193,6 +210,7 @@ files:
193
210
  - lib/cdmdexer/tasks/delete.rake
194
211
  - lib/cdmdexer/tasks/etl.rake
195
212
  - lib/cdmdexer/transform_worker.rb
213
+ - lib/cdmdexer/transformation_error_message.rb
196
214
  - lib/cdmdexer/transformer.rb
197
215
  - lib/cdmdexer/version.rb
198
216
  - travis.yml