search_solr_tools 6.4.0 → 6.5.0
- checksums.yaml +4 -4
- data/CHANGELOG.md +12 -0
- data/README.md +97 -28
- data/bin/search_solr_tools +18 -5
- data/lib/search_solr_tools/config/environments.rb +2 -2
- data/lib/search_solr_tools/config/environments.yaml +17 -2
- data/lib/search_solr_tools/errors/harvest_error.rb +6 -2
- data/lib/search_solr_tools/harvesters/auto_suggest.rb +2 -2
- data/lib/search_solr_tools/harvesters/base.rb +17 -15
- data/lib/search_solr_tools/harvesters/nsidc_auto_suggest.rb +1 -1
- data/lib/search_solr_tools/harvesters/nsidc_json.rb +5 -5
- data/lib/search_solr_tools/logging/sst_logger.rb +71 -0
- data/lib/search_solr_tools/version.rb +1 -1
- data/lib/search_solr_tools.rb +2 -0
- data/search_solr_tools.gemspec +3 -1
- metadata +34 -5
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: daf440444527ee68761274399910b02c51d7994a1b605c57f64b2666d7a243c7
+  data.tar.gz: 13a1a0ccd9d623aad334f68c6d314f7205a48a45546393d3d2ed4d71a95899cc
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 2b195f73dc88f1fcf83b354f7ea572082419172970ced5e15e7f7afdb89f47da85424b473e9aa87462483f3ba6c5a2e1e2eb33c1493d72a978ff6570d86d9645
+  data.tar.gz: 96b72cadd2046588046b56a9911ee14caefead8813c38d579b93d9df56b00afacfeae13033c3ed5b866076c81aa694ccdf77fe024c354191c52d65db04c7f186
data/CHANGELOG.md CHANGED
@@ -1,3 +1,15 @@
+## v6.5.0 (2023-09-21)
+
+- Adding logging functionality to the code, including the ability
+  to specify log file destination and log level for both the file and
+  console output
+
+## v6.4.1 (2023-09-15)
+
+- Added GitHub Action workflows for continuous integration features
+- Updated bump rake task to use Bump gem
+- Removed release rake task, moved it to the CI workflow
+
 ## v6.4.0 (2023-08-14)
 
 - Fixed a bug with the sanitization, which was trying to modify the
data/README.md CHANGED
@@ -101,51 +101,79 @@ the tests whenever the appropriate files are changed.
 
 Please be sure to run them in the `bundle exec` context if you're utilizing bundler.
 
+By default, tests are run with minimal logging: no log file, and only fatal errors
+written to the console. This can be changed by setting the environment variables
+as described in [Logging](#logging) below.
+
 ### Creating Releases (NSIDC devs only)
 
 Requirements:
 
 * Ruby > 3.2.2
 * [Bundler](http://bundler.io/)
-* [Gem Release](https://github.com/svenfuchs/gem-release)
 * [Rake](https://github.com/ruby/rake)
-* A [RubyGems](https://rubygems.org) account that has
-  [ownership](http://guides.rubygems.org/publishing/) of the gem
 * RuboCop and the unit tests should all pass (`rake`)
 
-
-`rake release:*` tasks. Update it manually to insert the correct version and
-date, and commit the file, before creating the release package.
+To make a release, follow these steps:
 
-
-
+1. Confirm no errors are returned by `bundle exec rubocop` *
+2. Confirm all tests pass (`bundle exec rake spec:unit`) *
+3. Ensure that the `CHANGELOG.md` file is up to date with an `Unreleased`
+   header.
+4. Submit a Pull Request
+5. Once the PR has been reviewed and approved, merge the branch into `main`
+6. On your local machine, ensure you are on the `main` branch (and have
+   it up-to-date), and run `bundle exec rake bump:<part>` (see below)
+   * This will trigger the GitHub Actions CI workflow to push a release to
+     RubyGems.
 
-
-| `rake release:pre[false]` | Increase the current prerelease version number, push changes |
-| `rake release:pre[true]` | Increase the current prerelease version number, publish release\* |
-| `rake release:none` | Drop the prerelease version, publish release\*, then `pre[false]` (does a patch release) |
-| `rake release:minor` | Increase the minor version number, publish release\*, then `pre[false]` |
-| `rake release:major` | Increase the major version number, publish release\*, then `pre[false]` |
+The steps marked `*` above don't need to be done manually; every time a commit
+is pushed to the GitHub repository, these tests will be run automatically.
 
-
+The first 4 steps above are self-explanatory. More information on the last
+steps can be found below.
 
-
-* the changes are pushed
-* the tagged version is built and published to RubyGems
+#### Version Bumping
 
-
-order to publish a new version of the gem to Rubygems. To get the lastest API key:
+Running the `bundle exec rake bump:<part>` tasks will do the following actions:
 
-
+1. The gem version will be updated locally
+2. The `CHANGELOG.md` file will be updated with the new gem version and date
+3. A tag `vx.x.x` will be created (with the new gem version)
+4. The files updated by the bump will be pushed to the GitHub repository, along
+   with the newly created tag.
+
+The sub-tasks associated with bump will allow the type of bump to be determined:
+
+| Command           | Description                                                                                        |
+|-------------------|----------------------------------------------------------------------------------------------------|
+| `rake bump:pre`   | Increase the current prerelease version number (v1.2.3 -> v1.2.3.pre1; v1.2.3.pre1 -> v1.2.3.pre2) |
+| `rake bump:patch` | Increase the current patch number (v1.2.0 -> v1.2.1; v1.2.4 -> v1.2.5)                             |
+| `rake bump:minor` | Increase the minor version number (v1.2.0 -> v1.3.0; v1.2.4 -> v1.3.0)                             |
+| `rake bump:major` | Increase the major version number (v1.2.0 -> v2.0.0; v1.2.4 -> v2.0.0)                             |
+
+Using any bump other than `pre` will remove the `pre` suffix from the version as well.
+
+#### Release to RubyGems
 
-
+When a tag in the format of `vx.y.z` (including a `pre` suffix) is pushed to GitHub,
+it will trigger the GitHub Actions release workflow. This workflow will:
 
-
-
-
-
+1. Build the gem
+2. Push the gem to RubyGems
+
+The CI workflow has the credentials set up to push to RubyGems, so no user intervention
+is needed, and the workflow itself does not have to be manually triggered.
+
+If needed, the release can also be done locally by running the command
+`bundle exec gem release`. In order for this to work, you will need to have a
+local copy of the current RubyGems API key for the _NSIDC developer user_ account.
+To get the latest API key:
+
+`curl -u <username> https://rubygems.org/api/v1/api_key.yaml > ~/.gem/credentials; chmod 0600 ~/.gem/credentials`
+
+It is recommended that this not be run locally, however; use the GitHub Actions CI
+workflow instead.
 
 ### SOLR
 
@@ -184,6 +212,47 @@ which can be modified, or additional environments can be added by just adding a
 new YAML stanza with the right keys; this new environment can then be used with
 the `--environment` flag when running `search_solr_tools harvest`.
 
+#### Logging
+
+By default, when running the harvest, harvest logs are written to the file
+`/var/log/search-solr-tools.log` (set to `warn` level), as well as to the console
+at `info` level. These settings are configured in the `environments.yaml` config
+file, in the `common` section.
+
+The keys in the `environments.yaml` file to consider are as follows:
+
+* `log_file` - The full name and path of the file to which log output will be
+  written. If set to the special value `none`, no log file will be written at all.
+  Log output will be **appended** to the file if it exists; otherwise, the file will
+  be created.
+* `log_file_level` - Indicates the level of logging which should be written to the log file.
+* `log_stdout_level` - Indicates the level of logging which should be written to the console.
+  This can be different from the level written to the log file.
+
+You can also override the configuration file settings at the command line with the
+following environment variables (useful for doing development work):
+
+* `SEARCH_SOLR_LOG_FILE` - Overrides the `log_file` setting
+* `SEARCH_SOLR_LOG_LEVEL` - Overrides the `log_file_level` setting
+* `SEARCH_SOLR_STDOUT_LEVEL` - Overrides the `log_stdout_level` setting
+
+When running the spec tests, `SEARCH_SOLR_LOG_FILE` is set to `none` and
+`SEARCH_SOLR_STDOUT_LEVEL` is set to `fatal`, unless you manually set those
+environment variables prior to running the tests. This is to keep the test output
+clean unless you need more detail for debugging.
+
+The following are the levels of logging that can be specified. These levels are
+cumulative; for example, `error` will also output `fatal` log entries, and `debug`
+will output **all** log entries.
+
+* `none` - No logging output will be written.
+* `fatal` - Only outputs errors which result in a crash.
+* `error` - Outputs any error that occurs while harvesting.
+* `warn` - Outputs warnings that do not cause issues with the harvesting,
+  but might indicate things that may need to be addressed (such as deprecations)
+* `info` - Outputs general information, such as harvesting status
+* `debug` - Outputs detailed information that can be used for debugging and code tracing.
+
 ## Organization Info
 
 ### How to contact NSIDC
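The override precedence described in the README's Logging section (an environment variable beats the `environments.yaml` value, and an unrecognized level name means "no logging configured") can be sketched as follows. `effective_level` is a hypothetical helper written for illustration, not part of the gem:

```ruby
# Hypothetical sketch of the precedence described above: an environment
# variable value (e.g. SEARCH_SOLR_STDOUT_LEVEL) wins over the YAML value,
# and unknown level names fall back to nil ("no logging configured").
VALID_LEVELS = %w[debug info warn error fatal none].freeze

def effective_level(env_value, yaml_value)
  level = env_value || yaml_value
  VALID_LEVELS.include?(level) ? level : nil
end

effective_level(nil, 'warn')      # => "warn"  (YAML value wins when no override is set)
effective_level('debug', 'warn')  # => "debug" (environment variable takes precedence)
```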
data/bin/search_solr_tools CHANGED
@@ -6,8 +6,14 @@ require 'thor'
 
 # rubocop:disable Metrics/AbcSize
 class SolrHarvestCLI < Thor
+  include SSTLogger
+
   map %w[--version -v] => :__print_version
 
+  def self.exit_on_failure?
+    false
+  end
+
   desc '--version, -v', 'print the version'
   def __print_version
     puts SearchSolrTools::VERSION
@@ -39,10 +45,11 @@ class SolrHarvestCLI < Thor
       rescue StandardError => e
         solr_status = false
         source_status = false
-
+        logger.error "Ping failed for #{target}: #{e}"
       end
       solr_success &&= solr_status
       source_success &&= source_status
+
       puts "Target: #{target}, Solr ping OK? #{solr_status}, data center ping OK? #{source_status}"
     end
 
@@ -61,7 +68,7 @@ class SolrHarvestCLI < Thor
   option :die_on_failure, type: :boolean
   def harvest(die_on_failure = options[:die_on_failure] || false)
     options[:data_center].each do |target|
-
+      logger.info "Target: #{target}"
      begin
        harvest_class = get_harvester_class(target)
        harvester = harvest_class.new(options[:environment], die_on_failure:)
@@ -73,12 +80,12 @@ class SolrHarvestCLI < Thor
 
        harvester.harvest_and_delete
      rescue SearchSolrTools::Errors::HarvestError => e
-
+       logger.error "THERE WERE HARVEST STATUS ERRORS:\n#{e.message}"
        exit e.exit_code
      rescue StandardError => e
        # If it gets here, there is an error that we aren't expecting.
-
-
+       logger.error "harvest failed for #{target}: #{e.message}"
+       logger.error e.backtrace
        exit SearchSolrTools::Errors::HarvestError::ERRCODE_OTHER
      end
    end
@@ -93,16 +100,20 @@ class SolrHarvestCLI < Thor
   option :environment, required: true
   def delete_all
     env = SearchSolrTools::SolrEnvironments[options[:environment]]
+    logger.info('DELETE ALL started')
     `curl 'http://#{env[:host]}:#{env[:port]}/solr/update' -H 'Content-Type: text/xml; charset=utf-8' --data '<delete><query>*:*</query></delete>'`
     `curl 'http://#{env[:host]}:#{env[:port]}/solr/update' -H 'Content-Type: text/xml; charset=utf-8' --data '<commit/>'`
+    logger.info('DELETE ALL complete')
   end
 
   desc 'delete_all_auto_suggest', 'Delete all documents from the auto_suggest index'
   option :environment, required: true
   def delete_all_auto_suggest
     env = SearchSolrTools::SolrEnvironments[options[:environment]]
+    logger.info('DELETE ALL AUTO_SUGGEST started')
     `curl 'http://#{env[:host]}:#{env[:port]}/solr/update' -H 'Content-Type: text/xml; charset=utf-8' --data '<delete><query>*:*</query></delete>'`
     `curl 'http://#{env[:host]}:#{env[:port]}/solr/update' -H 'Content-Type: text/xml; charset=utf-8' --data '<commit/>'`
+    logger.info('DELETE ALL AUTO_SUGGEST complete')
   end
 
   desc 'delete_by_data_center', 'Force deletion of documents for a specific data center with timestamps before the passed timestamp in format iso8601 (2014-07-14T21:49:21Z)'
@@ -110,11 +121,13 @@ class SolrHarvestCLI < Thor
   option :environment, required: true
   option :data_center, required: true
   def delete_by_data_center
+    logger.info("DELETE ALL for data center '#{options[:data_center]}' started")
     harvester = get_harvester_class(options[:data_center]).new options[:environment]
     harvester.delete_old_documents(options[:timestamp],
                                    "data_centers:\"#{SearchSolrTools::Helpers::SolrFormat::DATA_CENTER_NAMES[options[:data_center].upcase.to_sym][:long_name]}\"",
                                    SearchSolrTools::SolrEnvironments[harvester.environment][:collection_name],
                                    true)
+    logger.info("DELETE ALL for data center '#{options[:data_center]}' complete")
   end
 
   no_tasks do
data/lib/search_solr_tools/config/environments.rb CHANGED
@@ -5,10 +5,10 @@ require 'yaml'
 
 module SearchSolrTools
   # configuration to work with solr locally, or on integration/qa/staging/prod
   module SolrEnvironments
-    YAML_ENVS = YAML.load_file(File.expand_path('environments.yaml', __dir__))
+    YAML_ENVS = YAML.load_file(File.expand_path('environments.yaml', __dir__), aliases: true)
 
     def self.[](env = :development)
-      YAML_ENVS[
+      YAML_ENVS[env.to_sym]
     end
   end
 end
data/lib/search_solr_tools/config/environments.yaml CHANGED
@@ -1,4 +1,4 @@
-:common:
+:common: &common
   :auto_suggest_collection_name: auto_suggest
   :collection_name: nsidc_oai
   :collection_path: solr
@@ -9,33 +9,48 @@
   # should be. GLA01.018 will show up if we use DCS API v2.
   :nsidc_oai_identifiers_url: oai?verb=ListIdentifiers&metadataPrefix=dif&retired=false
 
+  # Log details. Can be overridden by environment-specific values
+  :log_file: /var/log/search-solr-tools.log
+  :log_file_level: warn
+  :log_stdout_level: info
+
 :local:
+  <<: *common
   :host: localhost
   :nsidc_dataset_metadata_url: http://integration.nsidc.org/api/dataset/metadata/
 
-:dev:
+:dev: &dev
+  <<: *common
   ## For the below, you'll need to instantiate your own search-solr instance, and point host to that.
   :host: dev.search-solr.USERNAME.dev.int.nsidc.org
   ## For the metadata content, either set up your own instance of dataset-catalog-services
   ## or change the URL below to point to integration
   :nsidc_dataset_metadata_url: http://dev.dcs.USERNAME.dev.int.nsidc.org:1580/api/dataset/metadata/
 
+:development:
+  <<: *dev
+
 :integration:
+  <<: *common
   :host: integration.search-solr.apps.int.nsidc.org
   :nsidc_dataset_metadata_url: http://integration.nsidc.org/api/dataset/metadata/
 
 :qa:
+  <<: *common
   :host: qa.search-solr.apps.int.nsidc.org
   :nsidc_dataset_metadata_url: http://qa.nsidc.org/api/dataset/metadata/
 
 :staging:
+  <<: *common
   :host: staging.search-solr.apps.int.nsidc.org
   :nsidc_dataset_metadata_url: http://staging.nsidc.org/api/dataset/metadata/
 
 :blue:
+  <<: *common
   :host: blue.search-solr.apps.int.nsidc.org
   :nsidc_dataset_metadata_url: http://nsidc.org/api/dataset/metadata/
 
 :production:
+  <<: *common
   :host: search-solr.apps.int.nsidc.org
   :nsidc_dataset_metadata_url: http://nsidc.org/api/dataset/metadata/
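The `&common` / `<<: *common` change above uses YAML anchors and merge keys, which is also why `environments.rb` now loads the file with `aliases: true` (Psych refuses alias nodes by default). A minimal standalone sketch of how the merged stanzas resolve, using a tiny inline document rather than the gem's actual config:

```ruby
require 'yaml'

# Minimal sketch of the anchor/merge-key pattern introduced above.
# `aliases: true` is required, or Psych raises BadAlias on `*common`.
doc = <<~YAML
  :common: &common
    :log_file_level: warn
  :local:
    <<: *common
    :host: localhost
YAML

envs = YAML.safe_load(doc, permitted_classes: [Symbol], aliases: true)
envs[:local][:host]            # => "localhost" (defined in the stanza itself)
envs[:local][:log_file_level]  # => "warn"      (merged in from :common)
```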
data/lib/search_solr_tools/errors/harvest_error.rb CHANGED
@@ -1,8 +1,12 @@
 # frozen_string_literal: true
 
+require_relative '../logging/sst_logger'
+
 module SearchSolrTools
   module Errors
     class HarvestError < StandardError
+      include SSTLogger
+
       ERRCODE_SOLR_PING = 1
       ERRCODE_SOURCE_PING = 2
       ERRCODE_SOURCE_NO_RESULTS = 4
@@ -73,11 +77,11 @@ module SearchSolrTools
       # rubocop:disable Metrics/AbcSize
       def exit_code
         if @status_data.nil?
-
+          logger.error "OTHER ERROR REPORTED: #{@other_message}"
           return ERRCODE_OTHER
         end
 
-
+        logger.error "EXIT CODE STATUS:\n#{@status_data.status}"
 
         code = 0
         code += ERRCODE_SOLR_PING unless @status_data.ping_solr
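`exit_code` composes its result from the power-of-two `ERRCODE_*` constants shown above, so a single process exit status can encode several simultaneous failures. A standalone sketch of the scheme (the real method reads the flags from `@status_data`):

```ruby
# Standalone sketch of the bitmask exit-code scheme used by HarvestError.
# Each failure contributes a distinct power of two, so the combined code
# can be decomposed back into the individual failures.
ERRCODE_SOLR_PING         = 1
ERRCODE_SOURCE_PING       = 2
ERRCODE_SOURCE_NO_RESULTS = 4

code = 0
code += ERRCODE_SOLR_PING          # Solr did not respond to ping
code += ERRCODE_SOURCE_NO_RESULTS  # the data center returned no documents
code                               # => 5

(code & ERRCODE_SOLR_PING) != 0    # => true  (Solr ping failure is encoded)
(code & ERRCODE_SOURCE_PING) != 0  # => false (source ping succeeded)
```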
data/lib/search_solr_tools/harvesters/auto_suggest.rb CHANGED
@@ -51,10 +51,10 @@ module SearchSolrTools
       status = insert_solr_doc add_docs, Base::JSON_CONTENT_TYPE, @env_settings[:auto_suggest_collection_name]
 
       if status == Helpers::HarvestStatus::INGEST_OK
-
+        logger.info "Added #{add_docs.size} auto suggest documents in one commit"
         Helpers::HarvestStatus.new(Helpers::HarvestStatus::INGEST_OK => add_docs)
       else
-
+        logger.error "Failed adding #{add_docs.size} documents in single commit, retrying one by one"
         new_add_docs = []
         add_docs.each do |doc|
           new_add_docs << { 'add' => { 'doc' => doc } }
data/lib/search_solr_tools/harvesters/base.rb CHANGED
@@ -15,6 +15,8 @@ module SearchSolrTools
   module Harvesters
     # base class for solr harvesters
     class Base
+      include SSTLogger
+
       attr_accessor :environment
 
       DELETE_DOCUMENTS_RATIO = 0.1
@@ -50,10 +52,10 @@ module SearchSolrTools
       begin
         RestClient.get(url) do |response, _request, _result|
           success = response.code == 200
-
+          logger.error "Error in ping request: #{response.body}" unless success
         end
       rescue StandardError => e
-
+        logger.error "Rest exception while pinging Solr: #{e}"
       end
       success
     end
@@ -62,7 +64,7 @@ module SearchSolrTools
     # to "ping" the data center. Returns true if the ping is successful (or, as
     # in this default, no ping method was defined)
     def ping_source
-
+      logger.info 'Harvester does not have ping method defined, assuming true'
       true
     end
 
@@ -81,9 +83,9 @@ module SearchSolrTools
       solr = RSolr.connect url: solr_url + "/#{solr_core}"
       unchanged_count = (solr.get 'select', params: { wt: :ruby, q: delete_query, rows: 0 })['response']['numFound'].to_i
       if unchanged_count.zero?
-
+        logger.info "All documents were updated after #{timestamp}, nothing to delete"
       else
-
+        logger.info "Begin removing documents older than #{timestamp}"
         remove_documents(solr, delete_query, constraints, force, unchanged_count)
       end
     end
@@ -99,13 +101,13 @@ module SearchSolrTools
     def remove_documents(solr, delete_query, constraints, force, numfound)
       all_response_count = (solr.get 'select', params: { wt: :ruby, q: constraints, rows: 0 })['response']['numFound']
       if force || (numfound / all_response_count.to_f < DELETE_DOCUMENTS_RATIO)
-
+        logger.info "Deleting #{numfound} documents for #{constraints}"
         solr.delete_by_query delete_query
         solr.commit
       else
-
-
-
+        logger.info "Failed to delete records older than current harvest start because they exceeded #{DELETE_DOCUMENTS_RATIO} of the total records for this data center."
+        logger.info "\tTotal records: #{all_response_count}"
+        logger.info "\tNon-updated records: #{numfound}"
       end
     end
 
@@ -121,8 +123,8 @@ module SearchSolrTools
         status.record_status doc_status
         doc_status == Helpers::HarvestStatus::INGEST_OK ? success += 1 : failure += 1
       end
-
-
+      logger.info "#{success} document#{success == 1 ? '' : 's'} successfully added to Solr."
+      logger.info "#{failure} document#{failure == 1 ? '' : 's'} not added to Solr."
 
       status
     end
@@ -146,14 +148,14 @@ module SearchSolrTools
       RestClient.post(url, doc_serialized, content_type:) do |response, _request, _result|
         success = response.code == 200
         unless success
-
+          logger.error "Error for #{doc_serialized}\n\n response: #{response.body}"
           status = Helpers::HarvestStatus::INGEST_ERR_SOLR_ERROR
         end
       end
     rescue StandardError => e
       # TODO: Need to provide more detail re: this failure so we know whether to
       # exit the job with a status != 0
-
+      logger.error "Rest exception while POSTing to Solr: #{e}, for doc: #{doc_serialized}"
       status = Helpers::HarvestStatus::INGEST_ERR_SOLR_ERROR
     end
     status
@@ -177,11 +179,11 @@ module SearchSolrTools
       request_url = encode_data_provider_url(request_url)
 
       begin
-
+        logger.debug "Request: #{request_url}"
        response = URI.parse(request_url).open(read_timeout: timeout, 'Content-Type' => content_type)
      rescue OpenURI::HTTPError, Timeout::Error, Errno::ETIMEDOUT => e
        retries_left -= 1
-
+       logger.error "## REQUEST FAILED ## #{e.class} ## Retrying #{retries_left} more times..."
 
        retry if retries_left.positive?
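The `remove_documents` guard above only deletes stale documents when they make up a small fraction of the data center's records (or when `force` is set), which protects the index from being wiped after a partially failed harvest. A standalone sketch of that predicate; `delete_allowed?` is a hypothetical name chosen for illustration:

```ruby
# Standalone sketch of the deletion guard in Base#remove_documents:
# stale documents are removed only when they are less than
# DELETE_DOCUMENTS_RATIO of the total for this data center, unless forced.
DELETE_DOCUMENTS_RATIO = 0.1

def delete_allowed?(stale_count, total_count, force: false)
  force || stale_count.fdiv(total_count) < DELETE_DOCUMENTS_RATIO
end

delete_allowed?(5, 100)               # => true  (5% stale: safe to delete)
delete_allowed?(50, 100)              # => false (suspiciously many stale docs)
delete_allowed?(50, 100, force: true) # => true  (operator override)
```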
data/lib/search_solr_tools/harvesters/nsidc_auto_suggest.rb CHANGED
@@ -6,7 +6,7 @@ module SearchSolrTools
   module Harvesters
     class NsidcAutoSuggest < AutoSuggest
       def harvest_and_delete
-
+        logger.info 'Building auto-suggest indexes for NSIDC'
         super(method(:harvest), 'source:"NSIDC"', @env_settings[:auto_suggest_collection_name])
       end
 
data/lib/search_solr_tools/harvesters/nsidc_json.rb CHANGED
@@ -21,13 +21,13 @@ module SearchSolrTools
         return response.code == 200
       end
     rescue StandardError
-
+      logger.error "Error trying to get options for #{nsidc_json_url} (ping)"
     end
     false
   end
 
   def harvest_and_delete
-
+    logger.info "Running harvest of NSIDC catalog from #{nsidc_json_url}"
     super(method(:harvest_nsidc_json_into_solr), "data_centers:\"#{Helpers::SolrFormat::DATA_CENTER_NAMES[:NSIDC][:long_name]}\"")
   end
 
@@ -47,8 +47,8 @@ module SearchSolrTools
   rescue Errors::HarvestError => e
     raise e
   rescue StandardError => e
-
-
+    logger.error "An unexpected exception occurred while trying to harvest or insert: #{e}"
+    logger.error e.backtrace
     status = Helpers::HarvestStatus.new(Helpers::HarvestStatus::OTHER_ERROR => e)
     raise Errors::HarvestError, status
   end
@@ -83,7 +83,7 @@ module SearchSolrTools
     begin
       docs << { 'add' => { 'doc' => @translator.translate(fetch_json_from_nsidc(id)) } }
     rescue StandardError => e
-
+      logger.error "Failed to fetch #{id} with error #{e}: #{e.backtrace}"
       failure_ids << id
     end
   end
data/lib/search_solr_tools/logging/sst_logger.rb ADDED
@@ -0,0 +1,71 @@
+# frozen_string_literal: true
+
+require 'fileutils'
+require 'logging'
+
+require_relative '../config/environments'
+
+module SSTLogger
+  LOG_LEVELS = %w[debug info warn error fatal none].freeze
+
+  def logger
+    SSTLogger.logger
+  end
+
+  class << self
+    def logger
+      @logger ||= new_logger
+    end
+
+    private
+
+    def new_logger
+      logger = Logging.logger['search_solr_tools']
+
+      append_stdout_logger(logger)
+      append_file_logger(logger)
+
+      logger
+    end
+
+    def append_stdout_logger(logger)
+      return if log_stdout_level.nil?
+
+      new_stdout = Logging.appenders.stdout
+      new_stdout.level = log_stdout_level
+      new_stdout.layout = Logging.layouts.pattern(pattern: "%-5l : %m\n")
+      logger.add_appenders new_stdout
+    end
+
+    def append_file_logger(logger)
+      return if log_file == 'none'
+
+      FileUtils.mkdir_p(File.dirname(log_file))
+      new_file = Logging.appenders.file(
+        log_file,
+        layout: Logging.layouts.pattern(pattern: "[%d] %-5l : %m\n")
+      )
+      new_file.level = log_file_level
+      logger.add_appenders new_file
+    end
+
+    def log_file
+      env = SearchSolrTools::SolrEnvironments[]
+      ENV.fetch('SEARCH_SOLR_LOG_FILE', nil) || env[:log_file]
+    end
+
+    def log_file_level
+      env = SearchSolrTools::SolrEnvironments[]
+      log_level(ENV.fetch('SEARCH_SOLR_LOG_LEVEL', nil) || env[:log_file_level])
+    end
+
+    def log_stdout_level
+      env = SearchSolrTools::SolrEnvironments[]
+      log_level(ENV.fetch('SEARCH_SOLR_STDOUT_LEVEL', nil) || env[:log_stdout_level])
+    end
+
+    def log_level(level)
+      LOG_LEVELS.include?(level) ? level : nil
+    end
+  end
+end
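The new `SSTLogger` module memoizes a single shared logger behind a module-level method, so any class can `include SSTLogger` and call `logger`. The same include-and-delegate pattern can be shown with only the standard-library `Logger` (the gem itself uses the `logging` gem, which additionally supports multiple appenders with independent levels); `MiniLogger` and `Harvester` here are illustrative names only:

```ruby
require 'logger'

# Stdlib-only sketch of the SSTLogger pattern: the module exposes an
# instance-level `logger` that delegates to one memoized module-level logger.
module MiniLogger
  def logger
    MiniLogger.logger
  end

  class << self
    def logger
      @logger ||= Logger.new($stdout, level: Logger::WARN)
    end
  end
end

class Harvester
  include MiniLogger
end

# Every includer shares the same underlying logger object.
Harvester.new.logger.equal?(MiniLogger.logger) # => true
```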
data/lib/search_solr_tools.rb CHANGED
@@ -6,6 +6,8 @@ require_relative 'search_solr_tools/version'
 require_relative 'search_solr_tools/helpers/harvest_status'
 require_relative 'search_solr_tools/errors/harvest_error'
 
+require_relative 'search_solr_tools/logging/sst_logger'
+
 %w[harvesters translators].each do |subdir|
   Dir[File.join(__dir__, 'search_solr_tools', subdir, '*.rb')].each { |file| require file }
 end
data/search_solr_tools.gemspec CHANGED
@@ -27,14 +27,16 @@ Gem::Specification.new do |spec|
 
   spec.add_runtime_dependency 'ffi-geos', '~> 2.4.0'
   spec.add_runtime_dependency 'iso8601', '~> 0.13.0'
+  spec.add_runtime_dependency 'logging', '~> 2.3.1'
   spec.add_runtime_dependency 'multi_json', '~> 1.15.0'
-  spec.add_runtime_dependency 'nokogiri', '~> 1.15.
+  spec.add_runtime_dependency 'nokogiri', '~> 1.15.4'
   spec.add_runtime_dependency 'rest-client', '~> 2.1.0'
   spec.add_runtime_dependency 'rgeo', '~> 3.0.0'
   spec.add_runtime_dependency 'rgeo-geojson', '~> 2.1.1'
   spec.add_runtime_dependency 'rsolr', '~> 2.5.0'
   spec.add_runtime_dependency 'thor', '~> 1.2.2'
 
+  spec.add_development_dependency 'bump', '~> 0.10.0'
   spec.add_development_dependency 'gem-release', '~> 2.2.2'
   spec.add_development_dependency 'guard', '~> 2.18.0'
   spec.add_development_dependency 'guard-rspec', '~> 4.7.3'
metadata CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: search_solr_tools
 version: !ruby/object:Gem::Version
-  version: 6.
+  version: 6.5.0
 platform: ruby
 authors:
 - Chris Chalstrom
@@ -14,7 +14,7 @@ authors:
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2023-
+date: 2023-09-21 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: ffi-geos
@@ -44,6 +44,20 @@ dependencies:
     - - "~>"
       - !ruby/object:Gem::Version
         version: 0.13.0
+- !ruby/object:Gem::Dependency
+  name: logging
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: 2.3.1
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: 2.3.1
 - !ruby/object:Gem::Dependency
   name: multi_json
   requirement: !ruby/object:Gem::Requirement
@@ -64,14 +78,14 @@ dependencies:
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: 1.15.
+        version: 1.15.4
   type: :runtime
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: 1.15.
+        version: 1.15.4
 - !ruby/object:Gem::Dependency
   name: rest-client
   requirement: !ruby/object:Gem::Requirement
@@ -142,6 +156,20 @@ dependencies:
     - - "~>"
       - !ruby/object:Gem::Version
         version: 1.2.2
+- !ruby/object:Gem::Dependency
+  name: bump
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: 0.10.0
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: 0.10.0
 - !ruby/object:Gem::Dependency
   name: gem-release
   requirement: !ruby/object:Gem::Requirement
@@ -332,6 +360,7 @@ files:
 - lib/search_solr_tools/helpers/solr_format.rb
 - lib/search_solr_tools/helpers/translate_spatial_coverage.rb
 - lib/search_solr_tools/helpers/translate_temporal_coverage.rb
+- lib/search_solr_tools/logging/sst_logger.rb
 - lib/search_solr_tools/translators/nsidc_json.rb
 - lib/search_solr_tools/version.rb
 - search_solr_tools.gemspec
@@ -354,7 +383,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
   - !ruby/object:Gem::Version
     version: '0'
 requirements: []
-rubygems_version: 3.4.
+rubygems_version: 3.4.10
 signing_key:
 specification_version: 4
 summary: Tools to harvest and manage various scientific dataset feeds in a Solr instance.