search_solr_tools 7.1.0 → 7.2.0
- checksums.yaml +4 -4
- data/CHANGELOG.md +8 -3
- data/README.md +18 -19
- data/lib/search_solr_tools/harvesters/nsidc_json.rb +1 -1
- data/lib/search_solr_tools/helpers/solr_format.rb +7 -4
- data/lib/search_solr_tools/translators/nsidc_json.rb +2 -1
- data/lib/search_solr_tools/version.rb +1 -1
- data/search_solr_tools.gemspec +1 -1
- metadata +5 -8
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA256:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: ea0e97fdb6cd3986683da13d8fd8ff26a150b00a3fc3b64de54fee93d3ceaf1d
|
4
|
+
data.tar.gz: 9e0c10993ab2fce13b400f9fb419788f0706b4367d9629145e425ed8c5931d05
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: 9fa45abe863da34e10f81183d242db3ccf8e1b37b01b2efb0836f469f50ae1f36dc3cd03ca6ab27a0ea2d0a83db232b93bb98271b8948b3283f717f2fd03ee66
|
7
|
+
data.tar.gz: 190b02a7ecd6ed44c762317d4e142e88877f053ad09457187e05c5a9e6de7e41f307118ad64661cc8efe563d0bb23054d795c3aa0a307b3b71df4d764c30fd53
|
data/CHANGELOG.md
CHANGED
@@ -1,3 +1,8 @@
|
|
1
|
+
## v7.2.0 (2023-10-19)
|
2
|
+
|
3
|
+
- Force parameter facets based on GCMD keywords to be upper-case.
|
4
|
+
- Only use short name for sensor facets in which the short name and long name are identical.
|
5
|
+
|
1
6
|
## v7.1.0 (2023-10-11)
|
2
7
|
|
3
8
|
- Updating harvesting to harvest storage system and spatial coverage
|
@@ -50,15 +55,15 @@
|
|
50
55
|
|
51
56
|
## v5.2.0 (2022-08-31)
|
52
57
|
|
53
|
-
- Updated the call for identifiers for the json harvester to use the
|
58
|
+
- Updated the call for identifiers for the json harvester to use the
|
54
59
|
proper "metadataPrefix" parameter, and request the dif identifiers
|
55
60
|
instead of iso.
|
56
61
|
|
57
62
|
## v5.1.0 (2020-07-23)
|
58
63
|
|
59
|
-
- Added a CLI method to "ping" the Solr and Source servers for a given
|
64
|
+
- Added a CLI method to "ping" the Solr and Source servers for a given
|
60
65
|
data center.
|
61
|
-
- Added a CLI method "errcode" to get information about the various
|
66
|
+
- Added a CLI method "errcode" to get information about the various
|
62
67
|
error codes that may be returned during harvest
|
63
68
|
- Updated the CLI harvest to return more useful error codes on failure.
|
64
69
|
|
data/README.md
CHANGED
@@ -4,7 +4,7 @@
|
|
4
4
|
|
5
5
|
This is a gem that contains:
|
6
6
|
|
7
|
-
* Ruby translators to transform
|
7
|
+
* Ruby translators to transform NSIDC metadata feeds into solr documents
|
8
8
|
* A command-line utility to access/utilize the gem's translators to harvest
|
9
9
|
metadata into a working solr instance.
|
10
10
|
|
@@ -25,30 +25,23 @@ Clone the repository, and install all requirements as noted below.
|
|
25
25
|
#### Configuration
|
26
26
|
|
27
27
|
Once you have the code and requirements, edit the configuration file in
|
28
|
-
`lib/search_solr_tools/config/environments.yaml` to match your environment.
|
29
|
-
|
30
|
-
|
31
|
-
different setting is specified for a given environment.
|
32
|
-
|
33
|
-
Each harvester has its own configuration settings. Most are the target endpoint;
|
34
|
-
EOL, however, has a list of THREDDS project endpoints and NSIDC has its own
|
35
|
-
oai/metadata endpoint settings.
|
36
|
-
|
37
|
-
Most users should not need to change the harvester configuration unless they
|
38
|
-
establish a local test node, or if a provider changes available endpoints;
|
39
|
-
however, the `host` option for each environment must specify the configured SOLR
|
28
|
+
`lib/search_solr_tools/config/environments.yaml` to match your environment.
|
29
|
+
Environment settings take precedence over `common` settings.
|
30
|
+
The `host` option for each environment must specify the configured SOLR
|
40
31
|
instance you intend to use these tools with.
|
41
32
|
|
42
33
|
#### Build and Install Gem
|
43
34
|
|
44
|
-
|
35
|
+
Run:
|
45
36
|
|
46
37
|
`bundle exec gem build ./search_solr_tools.gemspec`
|
47
38
|
|
48
|
-
Once you have the gem built in the project directory, install
|
39
|
+
Once you have the gem built in the project directory, install it:
|
49
40
|
|
50
41
|
`gem install --local ./search_solr_tools-version.gem`
|
51
42
|
|
43
|
+
See _Harvesting Data_ (below) for usage examples.
|
44
|
+
|
52
45
|
## Working on the Project
|
53
46
|
|
54
47
|
1. Create your feature branch (`git checkout -b my-new-feature`)
|
@@ -177,8 +170,7 @@ workflow instead.
|
|
177
170
|
|
178
171
|
### SOLR
|
179
172
|
|
180
|
-
To harvest data utilizing the gem, you will need an installed instance of [Solr
|
181
|
-
8.5.3](https://lucene.apache.org/solr/guide/)
|
173
|
+
To harvest data utilizing the gem, you will need an installed instance of [Solr](https://solr.apache.org/guide/solr/latest/index.html)
|
182
174
|
|
183
175
|
#### NSIDC
|
184
176
|
|
@@ -193,7 +185,7 @@ Outside of NSIDC, setup solr using the instructions found in the
|
|
193
185
|
|
194
186
|
### Harvesting Data
|
195
187
|
|
196
|
-
The harvester requires additional metadata from services that may not
|
188
|
+
The harvester requires additional metadata from services that may not be
|
197
189
|
publicly available, which are referenced in
|
198
190
|
`lib/search_solr_tools/config/environments.yaml`.
|
199
191
|
|
@@ -204,7 +196,7 @@ overview of what's available, simply run `search_solr_tools`.
|
|
204
196
|
|
205
197
|
Harvesting of data can be done using the `harvest` task, giving it a list of
|
206
198
|
harvesters and an environment. Deletion is possible via the `delete_all` and/or
|
207
|
-
`delete_by_data_center'`tasks. `
|
199
|
+
`delete_by_data_center'`tasks. `list_harvesters` will list the valid harvest
|
208
200
|
targets.
|
209
201
|
|
210
202
|
In addition to feed URLs, `environments.yaml` also defines various environments
|
@@ -212,6 +204,13 @@ which can be modified, or additional environments can be added by just adding a
|
|
212
204
|
new YAML stanza with the right keys; this new environment can then be used with
|
213
205
|
the `--environment` flag when running `search_solr_tools harvest`.
|
214
206
|
|
207
|
+
An example harvest of NSIDC metadata into a developer instance of Solr:
|
208
|
+
|
209
|
+
bundle exec search_solr_tools harvest --data-center=nsidc --environment=dev
|
210
|
+
|
211
|
+
In this example, the `host` value in the `environments.yaml` `dev` entry
|
212
|
+
must reference a valid Solr instance.
|
213
|
+
|
215
214
|
#### Logging
|
216
215
|
|
217
216
|
By default, when running the harvest, harvest logs are written to the file
|
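The README change above states that environment settings take precedence over `common` settings. A minimal sketch of that precedence rule follows; it is not the gem's actual loading code. Only the `common` and `dev` stanza names and the `host` key come from the README; the `port` key, all values, and the `settings_for` helper are hypothetical.

```ruby
require 'yaml'

# Hypothetical excerpt of an environments.yaml-style file.
# Only `common`, `dev`, and `host` appear in the README; `port` and
# every value here are invented for illustration.
SAMPLE_YAML = <<~YAML
  common:
    host: localhost
    port: 8983
  dev:
    host: dev.solr.example.org
YAML

# Hypothetical helper: start from `common`, then let the keys of the
# selected environment override any key they share with `common`.
def settings_for(config, environment)
  common = config.fetch('common', {})
  specific = config.fetch(environment.to_s, {})
  common.merge(specific)
end

config = YAML.safe_load(SAMPLE_YAML)
dev = settings_for(config, :dev)
puts dev['host'] # value from the dev stanza
puts dev['port'] # value inherited from common
```

`Hash#merge` gives the argument's values priority on key collisions, which is why the environment stanza is passed as the argument.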
data/lib/search_solr_tools/harvesters/nsidc_json.rb
CHANGED
@@ -40,7 +40,7 @@ module SearchSolrTools
|
|
40
40
|
|
41
41
|
status.record_status(Helpers::HarvestStatus::HARVEST_NO_DOCS) if (result[:num_docs]).zero?
|
42
42
|
|
43
|
-
# Record the number of harvest failures; note that if this is 0,
|
43
|
+
# Record the number of harvest failures; note that if this is 0, that's OK, the status will stay at 0
|
44
44
|
status.record_status(Helpers::HarvestStatus::HARVEST_FAILURE, result[:failure_ids].length)
|
45
45
|
|
46
46
|
raise Errors::HarvestError, status unless status.ok?
|
data/lib/search_solr_tools/helpers/solr_format.rb
CHANGED
@@ -89,7 +89,7 @@ module SearchSolrTools
|
|
89
89
|
binned_facet = bin(FacetConfiguration.get_facet_bin(type), format_string)
|
90
90
|
if binned_facet.nil?
|
91
91
|
format_string
|
92
|
-
elsif binned_facet.
|
92
|
+
elsif binned_facet.match?(/\Aexclude\z/i)
|
93
93
|
nil
|
94
94
|
else
|
95
95
|
binned_facet
|
@@ -98,11 +98,14 @@ module SearchSolrTools
|
|
98
98
|
|
99
99
|
def self.parameter_binning(parameter_string)
|
100
100
|
binned_parameter = bin(FacetConfiguration.get_facet_bin('parameter'), parameter_string)
|
101
|
-
# use variable_level_1 if no mapping exists
|
102
101
|
return binned_parameter unless binned_parameter.nil?
|
103
102
|
|
103
|
+
# if no mapping exists, use variable_level_1.
|
104
|
+
# Force it to all upper case for consistency. This is a hacky workaround to
|
105
|
+
# deal with deprecated GCMD keywords still in use by some datasets that result
|
106
|
+
# in duplicate, case-sensitive entries in the search interface facet list.
|
104
107
|
parts = parameter_string.split '>'
|
105
|
-
return parts[3].strip if parts.length >= 4
|
108
|
+
return parts[3].strip.upcase if parts.length >= 4
|
106
109
|
|
107
110
|
nil
|
108
111
|
end
|
@@ -158,7 +161,7 @@ module SearchSolrTools
|
|
158
161
|
def self.bin(mappings, term)
|
159
162
|
mappings.each do |mapping|
|
160
163
|
term.match(mapping['pattern']) do
|
161
|
-
return mapping['mapping']
|
164
|
+
return mapping['mapping'].upcase
|
162
165
|
end
|
163
166
|
end
|
164
167
|
nil
|
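The hunks above make binned values upper-case in two places: `bin` upcases the configured mapping, and `parameter_binning` upcases the fallback keyword level. The sketch below reproduces that logic in isolation to show the effect. `SAMPLE_PARAMETER_BINS` is a hypothetical stand-in for `FacetConfiguration.get_facet_bin('parameter')`, whose real contents are not part of this diff, and the keyword strings are illustrative only.

```ruby
# Hypothetical mapping table; the real patterns live in the gem's facet
# configuration and are not shown in this diff.
SAMPLE_PARAMETER_BINS = [
  { 'pattern' => 'Snow Depth', 'mapping' => 'Snow Cover' }
].freeze

# As in the diff: return the first matching mapping, upper-cased, else nil.
def bin(mappings, term)
  mappings.each do |mapping|
    return mapping['mapping'].upcase if term.match?(mapping['pattern'])
  end
  nil
end

# As in the diff: prefer a configured bin; otherwise fall back to the
# fourth '>'-separated keyword level, upper-cased.
def parameter_binning(parameter_string, mappings = SAMPLE_PARAMETER_BINS)
  binned = bin(mappings, parameter_string)
  return binned unless binned.nil?

  parts = parameter_string.split('>')
  return parts[3].strip.upcase if parts.length >= 4

  nil
end

mixed = 'EARTH SCIENCE > Cryosphere > Sea Ice > Ice Extent'
upper = 'EARTH SCIENCE > CRYOSPHERE > SEA ICE > ICE EXTENT'

puts parameter_binning(mixed) # prints "ICE EXTENT"
puts parameter_binning(upper) # prints "ICE EXTENT"
puts parameter_binning('EARTH SCIENCE > Cryosphere > Snow/Ice > Snow Depth') # prints "SNOW COVER"

# Because bin now upper-cases its result, a mapping configured as
# 'exclude' reaches facet_binning as 'EXCLUDE'; the case-insensitive
# anchored match in the diff accepts either spelling.
puts 'EXCLUDE'.match?(/\Aexclude\z/i) # prints "true"
```

Both keyword spellings collapse to one facet value, which is the duplicate-entry problem the new code comment describes.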
data/lib/search_solr_tools/translators/nsidc_json.rb
CHANGED
@@ -75,9 +75,10 @@ module SearchSolrTools
|
|
75
75
|
return facet_values if json.nil?
|
76
76
|
|
77
77
|
json.each do |json_entry|
|
78
|
+
long_name = json_entry['shortName'].eql?(json_entry['longName']) ? '' : json_entry['longName']
|
78
79
|
sensor_bin = Helpers::SolrFormat.facet_binning('sensor', json_entry['shortName'].to_s)
|
79
80
|
facet_values << if sensor_bin.eql? json_entry['shortName']
|
80
|
-
"#{
|
81
|
+
"#{long_name} | #{json_entry['shortName']}"
|
81
82
|
else
|
82
83
|
" | #{sensor_bin}"
|
83
84
|
end
|
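The hunk above blanks the long name when it is identical to the short name, so the facet value no longer repeats the same text on both sides of the `|` separator. A small sketch of that branch follows. `facet_binning_stub` is a hypothetical stand-in for `Helpers::SolrFormat.facet_binning('sensor', ...)` that assumes no bin applies, and the sample entries are illustrative rather than taken from a real feed.

```ruby
# Hypothetical stub: assumes no sensor bin is configured, so the short
# name comes back unchanged (the real helper consults the facet config).
def facet_binning_stub(short_name)
  short_name
end

# Mirrors the loop body in the diff for a single JSON entry.
def sensor_facet(json_entry)
  # New in 7.2.0: drop the long name when it only duplicates the short name.
  long_name = json_entry['shortName'].eql?(json_entry['longName']) ? '' : json_entry['longName']
  sensor_bin = facet_binning_stub(json_entry['shortName'].to_s)
  if sensor_bin.eql?(json_entry['shortName'])
    "#{long_name} | #{json_entry['shortName']}"
  else
    " | #{sensor_bin}"
  end
end

distinct = { 'shortName' => 'MODIS', 'longName' => 'Moderate-Resolution Imaging Spectroradiometer' }
identical = { 'shortName' => 'GPS', 'longName' => 'GPS' }

puts sensor_facet(distinct)  # prints "Moderate-Resolution Imaging Spectroradiometer | MODIS"
puts sensor_facet(identical) # prints " | GPS"
```

With identical names the result has the same shape as the binned branch, an empty long-name slot followed by the short name.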
data/search_solr_tools.gemspec
CHANGED
@@ -8,7 +8,7 @@ Gem::Specification.new do |spec|
|
|
8
8
|
spec.name = 'search_solr_tools'
|
9
9
|
spec.version = SearchSolrTools::VERSION
|
10
10
|
spec.authors = ['Chris Chalstrom', 'Michael Brandt', 'Jonathan Kovarik', 'Luis Lopez', 'Stuart Reed', 'Julia Collins', 'Scott Lewis']
|
11
|
-
spec.email = ['
|
11
|
+
spec.email = ['Jonathan.Kovarik@colorado.edu', 'luis.lopezespinosa@colorado.edu', 'collinsj@colorado.edu', 'scott.lewis@colorado.edu']
|
12
12
|
spec.summary = 'Tools to harvest and manage various scientific dataset feeds in a Solr instance.'
|
13
13
|
spec.description = <<-EOF
|
14
14
|
Ruby translators to transform various metadata feeds into solr documents and
|
metadata
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: search_solr_tools
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 7.
|
4
|
+
version: 7.2.0
|
5
5
|
platform: ruby
|
6
6
|
authors:
|
7
7
|
- Chris Chalstrom
|
@@ -14,7 +14,7 @@ authors:
|
|
14
14
|
autorequire:
|
15
15
|
bindir: bin
|
16
16
|
cert_chain: []
|
17
|
-
date: 2023-10-
|
17
|
+
date: 2023-10-19 00:00:00.000000000 Z
|
18
18
|
dependencies:
|
19
19
|
- !ruby/object:Gem::Dependency
|
20
20
|
name: ffi-geos
|
@@ -329,13 +329,10 @@ description: |2
|
|
329
329
|
a command-line utility to access/utilize the gem's translators to harvest
|
330
330
|
metadata into a working solr instance.
|
331
331
|
email:
|
332
|
-
-
|
333
|
-
- mbrandt@colorado.edu
|
334
|
-
- kovarik@nsidc.org
|
332
|
+
- Jonathan.Kovarik@colorado.edu
|
335
333
|
- luis.lopezespinosa@colorado.edu
|
336
|
-
-
|
337
|
-
-
|
338
|
-
- scott.lewis@nsidc.org
|
334
|
+
- collinsj@colorado.edu
|
335
|
+
- scott.lewis@colorado.edu
|
339
336
|
executables:
|
340
337
|
- search_solr_tools
|
341
338
|
extensions: []
|