search_solr_tools 6.4.0 → 6.5.0
This diff shows the changes between publicly released versions of the package as they appear in their respective public registries. It is provided for informational purposes only.
- checksums.yaml +4 -4
- data/CHANGELOG.md +12 -0
- data/README.md +97 -28
- data/bin/search_solr_tools +18 -5
- data/lib/search_solr_tools/config/environments.rb +2 -2
- data/lib/search_solr_tools/config/environments.yaml +17 -2
- data/lib/search_solr_tools/errors/harvest_error.rb +6 -2
- data/lib/search_solr_tools/harvesters/auto_suggest.rb +2 -2
- data/lib/search_solr_tools/harvesters/base.rb +17 -15
- data/lib/search_solr_tools/harvesters/nsidc_auto_suggest.rb +1 -1
- data/lib/search_solr_tools/harvesters/nsidc_json.rb +5 -5
- data/lib/search_solr_tools/logging/sst_logger.rb +71 -0
- data/lib/search_solr_tools/version.rb +1 -1
- data/lib/search_solr_tools.rb +2 -0
- data/search_solr_tools.gemspec +3 -1
- metadata +34 -5
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: daf440444527ee68761274399910b02c51d7994a1b605c57f64b2666d7a243c7
+  data.tar.gz: 13a1a0ccd9d623aad334f68c6d314f7205a48a45546393d3d2ed4d71a95899cc
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 2b195f73dc88f1fcf83b354f7ea572082419172970ced5e15e7f7afdb89f47da85424b473e9aa87462483f3ba6c5a2e1e2eb33c1493d72a978ff6570d86d9645
+  data.tar.gz: 96b72cadd2046588046b56a9911ee14caefead8813c38d579b93d9df56b00afacfeae13033c3ed5b866076c81aa694ccdf77fe024c354191c52d65db04c7f186
data/CHANGELOG.md
CHANGED
@@ -1,3 +1,15 @@
+## v6.5.0 (2023-09-21)
+
+- Adding logging functionality to the code, including the ability
+  to specify log file destination and log level for both the file and
+  console output
+
+## v6.4.1 (2023-09-15)
+
+- Added GitHub Action workflows for continuous integration features
+- Updated bump rake task to use Bump gem
+- Removed release rake task, moved it to the CI workflow
+
 ## v6.4.0 (2023-08-14)
 
 - Fixed a bug with the sanitization, which was trying to modify the
data/README.md
CHANGED
@@ -101,51 +101,79 @@ the tests whenever the appropriate files are changed.
 
 Please be sure to run them in the `bundle exec` context if you're utilizing bundler.
 
+By default, tests are run with minimal logging - no log file and only fatal errors
+written to the console. This can be changed by setting the environment variables
+as described in [Logging](#logging) below.
+
 ### Creating Releases (NSIDC devs only)
 
 Requirements:
 
 * Ruby > 3.2.2
 * [Bundler](http://bundler.io/)
-* [Gem Release](https://github.com/svenfuchs/gem-release)
 * [Rake](https://github.com/ruby/rake)
-* A [RubyGems](https://rubygems.org) account that has
-  [ownership](http://guides.rubygems.org/publishing/) of the gem
 * RuboCop and the unit tests should all pass (`rake`)
 
-
-  `rake release:*` tasks. Update it manually to insert the correct version and
-  date, and commit the file, before creating the release package.
+To make a release, follow these steps:
 
-
-
+1. Confirm no errors are returned by `bundle exec rubocop` *
+2. Confirm all tests pass (`bundle exec rake spec:unit`) *
+3. Ensure that the `CHANGELOG.md` file is up to date with an `Unreleased`
+   header.
+4. Submit a Pull Request
+5. Once the PR has been reviewed and approved, merge the branch into `main`
+6. On your local machine, ensure you are on the `main` branch (and have
+   it up-to-date), and run `bundle exec rake bump:<part>` (see below)
+   * This will trigger the GitHub Actions CI workflow to push a release to
+     RubyGems.
 
-
-
-| `rake release:pre[false]` | Increase the current prerelease version number, push changes |
-| `rake release:pre[true]` | Increase the current prerelease version number, publish release\* |
-| `rake release:none` | Drop the prerelease version, publish release\*, then `pre[false]` (does a patch release) |
-| `rake release:minor` | Increase the minor version number, publish release\*, then `pre[false]` |
-| `rake release:major` | Increase the major version number, publish release\*, then `pre[false]` |
+The steps marked `*` above don't need to be done manually; every time a commit
+is pushed to the GitHub repository, these tests will be run automatically.
 
-
+The first 4 steps above are self-explanatory. More information on the last
+steps can be found below.
 
-
-* the changes are pushed
-* the tagged version is built and published to RubyGems
+#### Version Bumping
 
-
-order to publish a new version of the gem to Rubygems. To get the lastest API key:
+Running the `bundle exec rake bump:<part>` tasks will do the following actions:
 
-
+1. The gem version will be updated locally
+2. The `CHANGELOG.md` file will be updated with the updated gem version and date
+3. A tag `vx.x.x` will be created (with the new gem version)
+4. The files updated by the bump will be pushed to the GitHub repository, along
+   with the newly created tag.
+
+The sub-tasks associated with bump will allow the type of bump to be determined:
+
+| Command           | Description                                                                                        |
+|-------------------|----------------------------------------------------------------------------------------------------|
+| `rake bump:pre`   | Increase the current prerelease version number (v1.2.3 -> v1.2.3.pre1; v1.2.3.pre1 -> v1.2.3.pre2) |
+| `rake bump:patch` | Increase the current patch number (v1.2.0 -> v1.2.1; v1.2.4 -> v1.2.5)                             |
+| `rake bump:minor` | Increase the minor version number (v1.2.0 -> v1.3.0; v1.2.4 -> v1.3.0)                             |
+| `rake bump:major` | Increase the major version number (v1.2.0 -> v2.0.0; v1.2.4 -> v2.0.0)                             |
+
+Using any bump other than `pre` will remove the `pre` suffix from the version as well.
+
+#### Release to RubyGems
 
-
+When a tag in the format of `vx.y.z` (including a `pre` suffix) is pushed to GitHub,
+it will trigger the GitHub Actions release workflow. This workflow will:
 
-
-
-
-
-
+1. Build the gem
+2. Push the gem to RubyGems
+
+The CI workflow has the credentials set up to push to RubyGems, so no user intervention
+is needed, and the workflow itself does not have to be manually triggered.
+
+If needed, the release can also be done locally by running the command
+`bundle exec gem release`. In order for this to work, you will need to have a
+local copy of the current Rubygems API key for the _NSIDC developer user_ account.
+To get the latest API key:
+
+`curl -u <username> https://rubygems.org/api/v1/api_key.yaml > ~/.gem/credentials; chmod 0600 ~/.gem/credentials`
+
+It is recommended that this not be run locally, however; use the GitHub Actions CI
+workflow instead.
 
 ### SOLR
 
@@ -184,6 +212,47 @@ which can be modified, or additional environments can be added by just adding a
 new YAML stanza with the right keys; this new environment can then be used with
 the `--environment` flag when running `search_solr_tools harvest`.
 
+#### Logging
+
+By default, when running the harvest, harvest logs are written to the file
+`/var/log/search-solr-tools.log` (set to `warn` level), as well as to the console
+at `info` level. These settings are configured in the `environments.yaml` config
+file, in the `common` section.
+
+The keys in the `environments.yaml` file to consider are as follows:
+
+* `log_file` - The full name and path of the file to which log output will be
+  written. If set to the special value `none`, no log file will be written to at all.
+  Log output will be **appended** to the file, if it exists; otherwise, the file will
+  be created.
+* `log_file_level` - Indicates the level of logging which should be written to the log file.
+* `log_stdout_level` - Indicates the level of logging which should be written to the console.
+  This can be different from the level written to the log file.
+
+You can also override the configuration file settings at the command line with the
+following environment variables (useful for doing development work):
+
+* `SEARCH_SOLR_LOG_FILE` - Overrides the `log_file` setting
+* `SEARCH_SOLR_LOG_LEVEL` - Overrides the `log_file_level` setting
+* `SEARCH_SOLR_STDOUT_LEVEL` - Overrides the `log_stdout_level` setting
+
+When running the spec tests, `SEARCH_SOLR_LOG_FILE` is set to `none` and
+`SEARCH_SOLR_STDOUT_LEVEL` is set to `fatal`, unless you manually set those
+environment variables prior to running the tests. This is to keep the test output
+clean unless you need more detail for debugging.
+
+The following are the levels of logging that can be specified. These levels are
+cumulative; for example, `error` will also output `fatal` log entries, and `debug`
+will output **all** log entries.
+
+* `none` - No logging outputs will be written.
+* `fatal` - Only outputs errors which result in a crash.
+* `error` - Outputs any error that occurs while harvesting.
+* `warn` - Outputs warnings that do not cause issues with the harvesting, but might
+  indicate things that may need to be addressed (such as deprecations, etc.)
+* `info` - Outputs general information, such as harvesting status
+* `debug` - Outputs detailed information that can be used for debugging and code tracing.
+
 ## Organization Info
 
 ### How to contact NSIDC
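The cumulative behavior of the README's log levels can be sketched in a few lines of plain Ruby (an illustration only, not code from the gem; the `emit?` helper is hypothetical):

```ruby
# Sketch of cumulative log levels: a message is emitted when its severity
# is at or above the configured threshold. Level names match the README.
LEVELS = %w[debug info warn error fatal].freeze

# Returns true if a message at `message_level` passes a logger configured
# with `threshold`; 'none' suppresses everything.
def emit?(threshold, message_level)
  return false if threshold == 'none'

  LEVELS.index(message_level) >= LEVELS.index(threshold)
end

emit?('error', 'fatal') # => true  (error also outputs fatal entries)
emit?('error', 'warn')  # => false
emit?('debug', 'info')  # => true  (debug outputs all entries)
```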
data/bin/search_solr_tools
CHANGED
@@ -6,8 +6,14 @@ require 'thor'
 
 # rubocop:disable Metrics/AbcSize
 class SolrHarvestCLI < Thor
+  include SSTLogger
+
   map %w[--version -v] => :__print_version
 
+  def self.exit_on_failure?
+    false
+  end
+
   desc '--version, -v', 'print the version'
   def __print_version
     puts SearchSolrTools::VERSION
@@ -39,10 +45,11 @@ class SolrHarvestCLI < Thor
       rescue StandardError => e
         solr_status = false
        source_status = false
-
+        logger.error "Ping failed for #{target}: #{e}}"
       end
       solr_success &&= solr_status
       source_success &&= source_status
+
       puts "Target: #{target}, Solr ping OK? #{solr_status}, data center ping OK? #{source_status}"
     end
 
@@ -61,7 +68,7 @@ class SolrHarvestCLI < Thor
   option :die_on_failure, type: :boolean
   def harvest(die_on_failure = options[:die_on_failure] || false)
     options[:data_center].each do |target|
-
+      logger.info "Target: #{target}"
       begin
         harvest_class = get_harvester_class(target)
         harvester = harvest_class.new(options[:environment], die_on_failure:)
@@ -73,12 +80,12 @@ class SolrHarvestCLI < Thor
 
         harvester.harvest_and_delete
       rescue SearchSolrTools::Errors::HarvestError => e
-
+        logger.error "THERE WERE HARVEST STATUS ERRORS:\n#{e.message}"
         exit e.exit_code
       rescue StandardError => e
         # If it gets here, there is an error that we aren't expecting.
-
-
+        logger.error "harvest failed for #{target}: #{e.message}"
+        logger.error e.backtrace
         exit SearchSolrTools::Errors::HarvestError::ERRCODE_OTHER
       end
     end
@@ -93,16 +100,20 @@ class SolrHarvestCLI < Thor
   option :environment, required: true
   def delete_all
     env = SearchSolrTools::SolrEnvironments[options[:environment]]
+    logger.info('DELETE ALL started')
     `curl 'http://#{env[:host]}:#{env[:port]}/solr/update' -H 'Content-Type: text/xml; charset=utf-8' --data '<delete><query>*:*</query></delete>'`
     `curl 'http://#{env[:host]}:#{env[:port]}/solr/update' -H 'Content-Type: text/xml; charset=utf-8' --data '<commit/>'`
+    logger.info('DELETE ALL complete')
   end
 
   desc 'delete_all_auto_suggest', 'Delete all documents from the auto_suggest index'
   option :environment, required: true
   def delete_all_auto_suggest
     env = SearchSolrTools::SolrEnvironments[options[:environment]]
+    logger.info('DELETE ALL AUTO_SUGGEST started')
     `curl 'http://#{env[:host]}:#{env[:port]}/solr/update' -H 'Content-Type: text/xml; charset=utf-8' --data '<delete><query>*:*</query></delete>'`
     `curl 'http://#{env[:host]}:#{env[:port]}/solr/update' -H 'Content-Type: text/xml; charset=utf-8' --data '<commit/>'`
+    logger.info('DELETE ALL AUTO_SUGGEST complete')
   end
 
   desc 'delete_by_data_center', 'Force deletion of documents for a specific data center with timestamps before the passed timestamp in format iso8601 (2014-07-14T21:49:21Z)'
@@ -110,11 +121,13 @@ class SolrHarvestCLI < Thor
   option :environment, required: true
   option :data_center, required: true
   def delete_by_data_center
+    logger.info("DELETE ALL for data center '#{options[:data_center]}' started")
     harvester = get_harvester_class(options[:data_center]).new options[:environment]
     harvester.delete_old_documents(options[:timestamp],
                                    "data_centers:\"#{SearchSolrTools::Helpers::SolrFormat::DATA_CENTER_NAMES[options[:data_center].upcase.to_sym][:long_name]}\"",
                                    SearchSolrTools::SolrEnvironments[harvester.environment][:collection_name],
                                    true)
+    logger.info("DELETE ALL for data center '#{options[:data_center]}' complete")
   end
 
   no_tasks do
data/lib/search_solr_tools/config/environments.rb
CHANGED
@@ -5,10 +5,10 @@ require 'yaml'
 module SearchSolrTools
   # configuration to work with solr locally, or on integration/qa/staging/prod
   module SolrEnvironments
-    YAML_ENVS = YAML.load_file(File.expand_path('environments.yaml', __dir__))
+    YAML_ENVS = YAML.load_file(File.expand_path('environments.yaml', __dir__), aliases: true)
 
     def self.[](env = :development)
-      YAML_ENVS[
+      YAML_ENVS[env.to_sym]
     end
   end
 end
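The `aliases: true` change above matters because Psych 4 (bundled with Ruby 3.1+) refuses YAML anchors and aliases unless they are explicitly enabled. A standalone sketch, using a fragment shaped like this gem's `environments.yaml` (the stanza content here is illustrative, not the real file):

```ruby
require 'yaml'

# A minimal stanza in the same shape as environments.yaml: the &common
# anchor is merged into an environment via `<<: *common`.
doc = <<~YAML
  :common: &common
    :log_file: /var/log/search-solr-tools.log
    :log_file_level: warn
  :local:
    <<: *common
    :host: localhost
YAML

# Aliases must be permitted explicitly under Psych 4; Symbol must be
# permitted so the leading-colon keys deserialize as symbols.
config = YAML.safe_load(doc, aliases: true, permitted_classes: [Symbol])

config[:local][:host]     # => "localhost"
config[:local][:log_file] # => "/var/log/search-solr-tools.log" (merged from :common)
```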
data/lib/search_solr_tools/config/environments.yaml
CHANGED
@@ -1,4 +1,4 @@
-:common:
+:common: &common
   :auto_suggest_collection_name: auto_suggest
   :collection_name: nsidc_oai
   :collection_path: solr
@@ -9,33 +9,48 @@
   # should be. GLA01.018 will show up if we use DCS API v2.
   :nsidc_oai_identifiers_url: oai?verb=ListIdentifiers&metadataPrefix=dif&retired=false
 
+  # Log details. Can be overridden by environment-specific values
+  :log_file: /var/log/search-solr-tools.log
+  :log_file_level: warn
+  :log_stdout_level: info
+
 :local:
+  <<: *common
   :host: localhost
   :nsidc_dataset_metadata_url: http://integration.nsidc.org/api/dataset/metadata/
 
-:dev:
+:dev: &dev
+  <<: *common
   ## For the below, you'll need to instantiate your own search-solr instance, and point host to that.
   :host: dev.search-solr.USERNAME.dev.int.nsidc.org
   ## For the metadata content, either set up your own instance of dataset-catalog-services
   ## or change the URL below to point to integration
   :nsidc_dataset_metadata_url: http://dev.dcs.USERNAME.dev.int.nsidc.org:1580/api/dataset/metadata/
 
+:development:
+  <<: *dev
+
 :integration:
+  <<: *common
   :host: integration.search-solr.apps.int.nsidc.org
   :nsidc_dataset_metadata_url: http://integration.nsidc.org/api/dataset/metadata/
 
 :qa:
+  <<: *common
   :host: qa.search-solr.apps.int.nsidc.org
   :nsidc_dataset_metadata_url: http://qa.nsidc.org/api/dataset/metadata/
 
 :staging:
+  <<: *common
   :host: staging.search-solr.apps.int.nsidc.org
   :nsidc_dataset_metadata_url: http://staging.nsidc.org/api/dataset/metadata/
 
 :blue:
+  <<: *common
   :host: blue.search-solr.apps.int.nsidc.org
   :nsidc_dataset_metadata_url: http://nsidc.org/api/dataset/metadata/
 
 :production:
+  <<: *common
   :host: search-solr.apps.int.nsidc.org
   :nsidc_dataset_metadata_url: http://nsidc.org/api/dataset/metadata/
data/lib/search_solr_tools/errors/harvest_error.rb
CHANGED
@@ -1,8 +1,12 @@
 # frozen_string_literal: true
 
+require_relative '../logging/sst_logger'
+
 module SearchSolrTools
   module Errors
     class HarvestError < StandardError
+      include SSTLogger
+
       ERRCODE_SOLR_PING = 1
       ERRCODE_SOURCE_PING = 2
       ERRCODE_SOURCE_NO_RESULTS = 4
@@ -73,11 +77,11 @@ module SearchSolrTools
       # rubocop:disable Metrics/AbcSize
       def exit_code
         if @status_data.nil?
-
+          logger.error "OTHER ERROR REPORTED: #{@other_message}"
           return ERRCODE_OTHER
         end
 
-
+        logger.error "EXIT CODE STATUS:\n#{@status_data.status}"
 
         code = 0
         code += ERRCODE_SOLR_PING unless @status_data.ping_solr
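The `ERRCODE_*` constants above are distinct powers of two, so `exit_code` can report several failures in one process exit status by adding the flags together. A standalone sketch of that composition (the `Status` struct and `exit_code_for` helper are hypothetical stand-ins for the class's `@status_data` logic; only the constants and the `code += ERRCODE_SOLR_PING unless ...` pattern come from the diff):

```ruby
# Error flags copied from the diff; each occupies its own bit.
ERRCODE_SOLR_PING         = 1
ERRCODE_SOURCE_PING       = 2
ERRCODE_SOURCE_NO_RESULTS = 4

# Hypothetical stand-in for the harvest status object.
Status = Struct.new(:ping_solr, :ping_source, :no_results)

def exit_code_for(status)
  code = 0
  code += ERRCODE_SOLR_PING unless status.ping_solr
  code += ERRCODE_SOURCE_PING unless status.ping_source
  code += ERRCODE_SOURCE_NO_RESULTS if status.no_results
  code
end

exit_code_for(Status.new(false, false, false)) # => 3 (both pings failed)
exit_code_for(Status.new(true, true, true))    # => 4 (no results)
```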
data/lib/search_solr_tools/harvesters/auto_suggest.rb
CHANGED
@@ -51,10 +51,10 @@ module SearchSolrTools
       status = insert_solr_doc add_docs, Base::JSON_CONTENT_TYPE, @env_settings[:auto_suggest_collection_name]
 
       if status == Helpers::HarvestStatus::INGEST_OK
-
+        logger.info "Added #{add_docs.size} auto suggest documents in one commit"
         Helpers::HarvestStatus.new(Helpers::HarvestStatus::INGEST_OK => add_docs)
       else
-
+        logger.error "Failed adding #{add_docs.size} documents in single commit, retrying one by one"
         new_add_docs = []
         add_docs.each do |doc|
           new_add_docs << { 'add' => { 'doc' => doc } }
data/lib/search_solr_tools/harvesters/base.rb
CHANGED
@@ -15,6 +15,8 @@ module SearchSolrTools
   module Harvesters
     # base class for solr harvesters
     class Base
+      include SSTLogger
+
       attr_accessor :environment
 
       DELETE_DOCUMENTS_RATIO = 0.1
@@ -50,10 +52,10 @@ module SearchSolrTools
       begin
         RestClient.get(url) do |response, _request, _result|
           success = response.code == 200
-
+          logger.error "Error in ping request: #{response.body}" unless success
         end
       rescue StandardError => e
-
+        logger.error "Rest exception while pinging Solr: #{e}"
       end
       success
     end
@@ -62,7 +64,7 @@ module SearchSolrTools
     # to "ping" the data center. Returns true if the ping is successful (or, as
     # in this default, no ping method was defined)
     def ping_source
-
+      logger.info 'Harvester does not have ping method defined, assuming true'
       true
     end
 
@@ -81,9 +83,9 @@ module SearchSolrTools
       solr = RSolr.connect url: solr_url + "/#{solr_core}"
       unchanged_count = (solr.get 'select', params: { wt: :ruby, q: delete_query, rows: 0 })['response']['numFound'].to_i
       if unchanged_count.zero?
-
+        logger.info "All documents were updated after #{timestamp}, nothing to delete"
       else
-
+        logger.info "Begin removing documents older than #{timestamp}"
         remove_documents(solr, delete_query, constraints, force, unchanged_count)
       end
     end
@@ -99,13 +101,13 @@ module SearchSolrTools
     def remove_documents(solr, delete_query, constraints, force, numfound)
       all_response_count = (solr.get 'select', params: { wt: :ruby, q: constraints, rows: 0 })['response']['numFound']
       if force || (numfound / all_response_count.to_f < DELETE_DOCUMENTS_RATIO)
-
+        logger.info "Deleting #{numfound} documents for #{constraints}"
         solr.delete_by_query delete_query
         solr.commit
       else
-
-
-
+        logger.info "Failed to delete records older than current harvest start because they exceeded #{DELETE_DOCUMENTS_RATIO} of the total records for this data center."
+        logger.info "\tTotal records: #{all_response_count}"
+        logger.info "\tNon-updated records: #{numfound}"
       end
     end
 
@@ -121,8 +123,8 @@ module SearchSolrTools
         status.record_status doc_status
         doc_status == Helpers::HarvestStatus::INGEST_OK ? success += 1 : failure += 1
       end
-
-
+      logger.info "#{success} document#{success == 1 ? '' : 's'} successfully added to Solr."
+      logger.info "#{failure} document#{failure == 1 ? '' : 's'} not added to Solr."
 
       status
     end
@@ -146,14 +148,14 @@ module SearchSolrTools
       RestClient.post(url, doc_serialized, content_type:) do |response, _request, _result|
         success = response.code == 200
         unless success
-
+          logger.error "Error for #{doc_serialized}\n\n response: #{response.body}"
           status = Helpers::HarvestStatus::INGEST_ERR_SOLR_ERROR
         end
       end
     rescue StandardError => e
       # TODO: Need to provide more detail re: this failure so we know whether to
       # exit the job with a status != 0
-
+      logger.error "Rest exception while POSTing to Solr: #{e}, for doc: #{doc_serialized}"
       status = Helpers::HarvestStatus::INGEST_ERR_SOLR_ERROR
     end
     status
@@ -177,11 +179,11 @@ module SearchSolrTools
       request_url = encode_data_provider_url(request_url)
 
       begin
-
+        logger.debug "Request: #{request_url}"
         response = URI.parse(request_url).open(read_timeout: timeout, 'Content-Type' => content_type)
       rescue OpenURI::HTTPError, Timeout::Error, Errno::ETIMEDOUT => e
         retries_left -= 1
-
+        logger.error "## REQUEST FAILED ## #{e.class} ## Retrying #{retries_left} more times..."
 
         retry if retries_left.positive?
 
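The `remove_documents` logic above only deletes stale documents when they are a small fraction of the data center's total, which guards against wiping an index after a partially failed harvest. The guard condition can be sketched in isolation (the `delete_allowed?` helper is illustrative, not a method of the gem; the constant and comparison come from the diff):

```ruby
# Stale documents are deleted only when they make up less than 10% of the
# data center's records, unless the deletion is explicitly forced.
DELETE_DOCUMENTS_RATIO = 0.1

def delete_allowed?(stale_count, total_count, force: false)
  force || (stale_count / total_count.to_f < DELETE_DOCUMENTS_RATIO)
end

delete_allowed?(5, 100)               # => true  (5% stale)
delete_allowed?(30, 100)              # => false (30% exceeds the guard)
delete_allowed?(30, 100, force: true) # => true  (force overrides the guard)
```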
data/lib/search_solr_tools/harvesters/nsidc_auto_suggest.rb
CHANGED
@@ -6,7 +6,7 @@ module SearchSolrTools
   module Harvesters
     class NsidcAutoSuggest < AutoSuggest
       def harvest_and_delete
-
+        logger.info 'Building auto-suggest indexes for NSIDC'
         super(method(:harvest), 'source:"NSIDC"', @env_settings[:auto_suggest_collection_name])
       end
 
data/lib/search_solr_tools/harvesters/nsidc_json.rb
CHANGED
@@ -21,13 +21,13 @@ module SearchSolrTools
         return response.code == 200
       end
     rescue StandardError
-
+      logger.error "Error trying to get options for #{nsidc_json_url} (ping)"
     end
     false
   end
 
   def harvest_and_delete
-
+    logger.info "Running harvest of NSIDC catalog from #{nsidc_json_url}"
     super(method(:harvest_nsidc_json_into_solr), "data_centers:\"#{Helpers::SolrFormat::DATA_CENTER_NAMES[:NSIDC][:long_name]}\"")
   end
 
@@ -47,8 +47,8 @@ module SearchSolrTools
   rescue Errors::HarvestError => e
     raise e
   rescue StandardError => e
-
-
+    logger.error "An unexpected exception occurred while trying to harvest or insert: #{e}"
+    logger.error e.backtrace
     status = Helpers::HarvestStatus.new(Helpers::HarvestStatus::OTHER_ERROR => e)
     raise Errors::HarvestError, status
   end
@@ -83,7 +83,7 @@ module SearchSolrTools
     begin
       docs << { 'add' => { 'doc' => @translator.translate(fetch_json_from_nsidc(id)) } }
     rescue StandardError => e
-
+      logger.error "Failed to fetch #{id} with error #{e}: #{e.backtrace}"
       failure_ids << id
    end
  end
data/lib/search_solr_tools/logging/sst_logger.rb
ADDED
@@ -0,0 +1,71 @@
+# frozen_string_literal: true
+
+require 'fileutils'
+require 'logging'
+
+require_relative '../config/environments'
+
+module SSTLogger
+  LOG_LEVELS = %w[debug info warn error fatal none].freeze
+
+  def logger
+    SSTLogger.logger
+  end
+
+  class << self
+    def logger
+      @logger ||= new_logger
+    end
+
+    private
+
+    def new_logger
+      logger = Logging.logger['search_solr_tools']
+
+      append_stdout_logger(logger)
+      append_file_logger(logger)
+
+      logger
+    end
+
+    def append_stdout_logger(logger)
+      return if log_stdout_level.nil?
+
+      new_stdout = Logging.appenders.stdout
+      new_stdout.level = log_stdout_level
+      new_stdout.layout = Logging.layouts.pattern(pattern: "%-5l : %m\n")
+      logger.add_appenders new_stdout
+    end
+
+    def append_file_logger(logger)
+      return if log_file == 'none'
+
+      FileUtils.mkdir_p(File.dirname(log_file))
+      new_file = Logging.appenders.file(
+        log_file,
+        layout: Logging.layouts.pattern(pattern: "[%d] %-5l : %m\n")
+      )
+      new_file.level = log_file_level
+      logger.add_appenders new_file
+    end
+
+    def log_file
+      env = SearchSolrTools::SolrEnvironments[]
+      ENV.fetch('SEARCH_SOLR_LOG_FILE', nil) || env[:log_file]
+    end
+
+    def log_file_level
+      env = SearchSolrTools::SolrEnvironments[]
+      log_level(ENV.fetch('SEARCH_SOLR_LOG_LEVEL', nil) || env[:log_file_level])
+    end
+
+    def log_stdout_level
+      env = SearchSolrTools::SolrEnvironments[]
+      log_level(ENV.fetch('SEARCH_SOLR_STDOUT_LEVEL', nil) || env[:log_stdout_level])
+    end
+
+    def log_level(level)
+      LOG_LEVELS.include?(level) ? level : nil
+    end
+  end
+end
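The setting-resolution pattern in `SSTLogger` (environment variable wins over the YAML value, and unrecognized level names are discarded) can be exercised on its own without the `logging` gem or the config file. A standalone sketch mirroring `log_level` and the `ENV.fetch(...) || config` precedence (the `effective_level` helper is illustrative, not a method of the module):

```ruby
# Level names copied from SSTLogger::LOG_LEVELS.
LOG_LEVELS = %w[debug info warn error fatal none].freeze

# An ENV override takes precedence over the YAML config value; anything
# outside the known level names resolves to nil rather than reaching the
# logger, mirroring SSTLogger#log_level.
def effective_level(env_value, config_value)
  level = env_value || config_value
  LOG_LEVELS.include?(level) ? level : nil
end

effective_level('debug', 'warn') # => "debug" (ENV override wins)
effective_level(nil, 'warn')     # => "warn"  (config fallback)
effective_level(nil, 'verbose')  # => nil     (not a recognized level)
```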
data/lib/search_solr_tools.rb
CHANGED
@@ -6,6 +6,8 @@ require_relative 'search_solr_tools/version'
 require_relative 'search_solr_tools/helpers/harvest_status'
 require_relative 'search_solr_tools/errors/harvest_error'
 
+require_relative 'search_solr_tools/logging/sst_logger'
+
 %w[harvesters translators].each do |subdir|
   Dir[File.join(__dir__, 'search_solr_tools', subdir, '*.rb')].each { |file| require file }
 end
data/search_solr_tools.gemspec
CHANGED
@@ -27,14 +27,16 @@ Gem::Specification.new do |spec|
 
   spec.add_runtime_dependency 'ffi-geos', '~> 2.4.0'
   spec.add_runtime_dependency 'iso8601', '~> 0.13.0'
+  spec.add_runtime_dependency 'logging', '~> 2.3.1'
   spec.add_runtime_dependency 'multi_json', '~> 1.15.0'
-  spec.add_runtime_dependency 'nokogiri', '~> 1.15.
+  spec.add_runtime_dependency 'nokogiri', '~> 1.15.4'
   spec.add_runtime_dependency 'rest-client', '~> 2.1.0'
   spec.add_runtime_dependency 'rgeo', '~> 3.0.0'
   spec.add_runtime_dependency 'rgeo-geojson', '~> 2.1.1'
   spec.add_runtime_dependency 'rsolr', '~> 2.5.0'
   spec.add_runtime_dependency 'thor', '~> 1.2.2'
 
+  spec.add_development_dependency 'bump', '~> 0.10.0'
   spec.add_development_dependency 'gem-release', '~> 2.2.2'
   spec.add_development_dependency 'guard', '~> 2.18.0'
   spec.add_development_dependency 'guard-rspec', '~> 4.7.3'
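The gemspec pins its dependencies with the pessimistic operator; `'~> 2.3.1'` on the new `logging` dependency, for example, allows patch releases in the 2.3.x line but excludes 2.4.0. This can be checked directly with RubyGems' own `Gem::Requirement`:

```ruby
require 'rubygems' # Gem::Requirement and Gem::Version ship with RubyGems

# '~> 2.3.1' means ">= 2.3.1 and < 2.4".
req = Gem::Requirement.new('~> 2.3.1')

req.satisfied_by?(Gem::Version.new('2.3.1')) # => true
req.satisfied_by?(Gem::Version.new('2.3.9')) # => true
req.satisfied_by?(Gem::Version.new('2.4.0')) # => false
```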
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: search_solr_tools
 version: !ruby/object:Gem::Version
-  version: 6.
+  version: 6.5.0
 platform: ruby
 authors:
 - Chris Chalstrom
@@ -14,7 +14,7 @@ authors:
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2023-
+date: 2023-09-21 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: ffi-geos
@@ -44,6 +44,20 @@ dependencies:
     - - "~>"
       - !ruby/object:Gem::Version
         version: 0.13.0
+- !ruby/object:Gem::Dependency
+  name: logging
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: 2.3.1
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: 2.3.1
 - !ruby/object:Gem::Dependency
   name: multi_json
   requirement: !ruby/object:Gem::Requirement
@@ -64,14 +78,14 @@ dependencies:
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: 1.15.
+        version: 1.15.4
   type: :runtime
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: 1.15.
+        version: 1.15.4
 - !ruby/object:Gem::Dependency
   name: rest-client
   requirement: !ruby/object:Gem::Requirement
@@ -142,6 +156,20 @@ dependencies:
     - - "~>"
       - !ruby/object:Gem::Version
         version: 1.2.2
+- !ruby/object:Gem::Dependency
+  name: bump
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: 0.10.0
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: 0.10.0
 - !ruby/object:Gem::Dependency
   name: gem-release
   requirement: !ruby/object:Gem::Requirement
@@ -332,6 +360,7 @@ files:
 - lib/search_solr_tools/helpers/solr_format.rb
 - lib/search_solr_tools/helpers/translate_spatial_coverage.rb
 - lib/search_solr_tools/helpers/translate_temporal_coverage.rb
+- lib/search_solr_tools/logging/sst_logger.rb
 - lib/search_solr_tools/translators/nsidc_json.rb
 - lib/search_solr_tools/version.rb
 - search_solr_tools.gemspec
@@ -354,7 +383,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
   - !ruby/object:Gem::Version
     version: '0'
 requirements: []
-rubygems_version: 3.4.
+rubygems_version: 3.4.10
 signing_key:
 specification_version: 4
 summary: Tools to harvest and manage various scientific dataset feeds in a Solr instance.