search_solr_tools 7.1.0 → 7.2.0
- checksums.yaml +4 -4
- data/CHANGELOG.md +8 -3
- data/README.md +18 -19
- data/lib/search_solr_tools/harvesters/nsidc_json.rb +1 -1
- data/lib/search_solr_tools/helpers/solr_format.rb +7 -4
- data/lib/search_solr_tools/translators/nsidc_json.rb +2 -1
- data/lib/search_solr_tools/version.rb +1 -1
- data/search_solr_tools.gemspec +1 -1
- metadata +5 -8
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: ea0e97fdb6cd3986683da13d8fd8ff26a150b00a3fc3b64de54fee93d3ceaf1d
+  data.tar.gz: 9e0c10993ab2fce13b400f9fb419788f0706b4367d9629145e425ed8c5931d05
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 9fa45abe863da34e10f81183d242db3ccf8e1b37b01b2efb0836f469f50ae1f36dc3cd03ca6ab27a0ea2d0a83db232b93bb98271b8948b3283f717f2fd03ee66
+  data.tar.gz: 190b02a7ecd6ed44c762317d4e142e88877f053ad09457187e05c5a9e6de7e41f307118ad64661cc8efe563d0bb23054d795c3aa0a307b3b71df4d764c30fd53
data/CHANGELOG.md
CHANGED
@@ -1,3 +1,8 @@
+## v7.2.0 (2023-10-19)
+
+- Force parameter facets based on GCMD keywords to be upper-case.
+- Only use short name for sensor facets in which the short name and long name are identical.
+
 ## v7.1.0 (2023-10-11)
 
 - Updating harvesting to harvest storage system and spatial coverage
@@ -50,15 +55,15 @@
 
 ## v5.2.0 (2022-08-31)
 
-- Updated the call for identifiers for the json harvester to use the
+- Updated the call for identifiers for the json harvester to use the
   proper "metadataPrefix" parameter, and request the dif identifiers
   instead of iso.
 
 ## v5.1.0 (2020-07-23)
 
-- Added a CLI method to "ping" the Solr and Source servers for a given
+- Added a CLI method to "ping" the Solr and Source servers for a given
   data center.
-- Added a CLI method "errcode" to get information about the various
+- Added a CLI method "errcode" to get information about the various
   error codes that may be returned during harvest
 - Updated the CLI harvest to return more useful error codes on failure.
 
data/README.md
CHANGED
@@ -4,7 +4,7 @@
 
 This is a gem that contains:
 
-* Ruby translators to transform
+* Ruby translators to transform NSIDC metadata feeds into solr documents
 * A command-line utility to access/utilize the gem's translators to harvest
   metadata into a working solr instance.
 
@@ -25,30 +25,23 @@ Clone the repository, and install all requirements as noted below.
 #### Configuration
 
 Once you have the code and requirements, edit the configuration file in
-`lib/search_solr_tools/config/environments.yaml` to match your environment.
-
-
-different setting is specified for a given environment.
-
-Each harvester has its own configuration settings. Most are the target endpoint;
-EOL, however, has a list of THREDDS project endpoints and NSIDC has its own
-oai/metadata endpoint settings.
-
-Most users should not need to change the harvester configuration unless they
-establish a local test node, or if a provider changes available endpoints;
-however, the `host` option for each environment must specify the configured SOLR
+`lib/search_solr_tools/config/environments.yaml` to match your environment.
+Environment settings take precedence over `common` settings.
+The `host` option for each environment must specify the configured SOLR
 instance you intend to use these tools with.
 
 #### Build and Install Gem
 
-
+Run:
 
 `bundle exec gem build ./search_solr_tools.gemspec`
 
-Once you have the gem built in the project directory, install
+Once you have the gem built in the project directory, install it:
 
 `gem install --local ./search_solr_tools-version.gem`
 
+See _Harvesting Data_ (below) for usage examples.
+
 ## Working on the Project
 
 1. Create your feature branch (`git checkout -b my-new-feature`)
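The new README sentence "Environment settings take precedence over `common` settings" can be pictured as a hash merge. The following is only a sketch under assumptions: the YAML content, the `port` and `log_level` keys, and the `settings_for` helper are hypothetical and are not taken from the gem's actual `environments.yaml` or its loader; only the `common` section and the `host` option are named in the README.

```ruby
require 'yaml'

# Hypothetical configuration; only `common` and `host` appear in the README.
config = YAML.safe_load(<<~YAML)
  common:
    port: 8983
    log_level: info
  dev:
    host: localhost
    log_level: debug
YAML

# Hypothetical helper: start from `common`, then let the named
# environment override any key it also defines.
def settings_for(config, environment)
  config.fetch('common', {}).merge(config.fetch(environment, {}))
end

settings = settings_for(config, 'dev')
puts settings['host']      # value defined only in the environment
puts settings['port']      # value inherited from `common`
puts settings['log_level'] # environment value wins over `common`
```

With this kind of layering, an environment stanza only needs to list the keys that differ from `common`.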
@@ -177,8 +170,7 @@ workflow instead.
 
 ### SOLR
 
-To harvest data utilizing the gem, you will need an installed instance of [Solr
-8.5.3](https://lucene.apache.org/solr/guide/)
+To harvest data utilizing the gem, you will need an installed instance of [Solr](https://solr.apache.org/guide/solr/latest/index.html)
 
 #### NSIDC
 
@@ -193,7 +185,7 @@ Outside of NSIDC, setup solr using the instructions found in the
 
 ### Harvesting Data
 
-The harvester requires additional metadata from services that may not
+The harvester requires additional metadata from services that may not be
 publicly available, which are referenced in
 `lib/search_solr_tools/config/environments.yaml`.
 
@@ -204,7 +196,7 @@ overview of what's available, simply run `search_solr_tools`.
 
 Harvesting of data can be done using the `harvest` task, giving it a list of
 harvesters and an environment. Deletion is possible via the `delete_all` and/or
-`delete_by_data_center'`tasks. `
+`delete_by_data_center'`tasks. `list_harvesters` will list the valid harvest
 targets.
 
 In addition to feed URLs, `environments.yaml` also defines various environments
@@ -212,6 +204,13 @@ which can be modified, or additional environments can be added by just adding a
 new YAML stanza with the right keys; this new environment can then be used with
 the `--environment` flag when running `search_solr_tools harvest`.
 
+An example harvest of NSIDC metadata into a developer instance of Solr:
+
+    bundle exec search_solr_tools harvest --data-center=nsidc --environment=dev
+
+In this example, the `host` value in the `environments.yaml` `dev` entry
+must reference a valid Solr instance.
+
 #### Logging
 
 By default, when running the harvest, harvest logs are written to the file
data/lib/search_solr_tools/harvesters/nsidc_json.rb
CHANGED
@@ -40,7 +40,7 @@ module SearchSolrTools
 
 status.record_status(Helpers::HarvestStatus::HARVEST_NO_DOCS) if (result[:num_docs]).zero?
 
-# Record the number of harvest failures; note that if this is 0,
+# Record the number of harvest failures; note that if this is 0, that's OK, the status will stay at 0
 status.record_status(Helpers::HarvestStatus::HARVEST_FAILURE, result[:failure_ids].length)
 
 raise Errors::HarvestError, status unless status.ok?
data/lib/search_solr_tools/helpers/solr_format.rb
CHANGED
@@ -89,7 +89,7 @@ module SearchSolrTools
 binned_facet = bin(FacetConfiguration.get_facet_bin(type), format_string)
 if binned_facet.nil?
   format_string
-elsif binned_facet.
+elsif binned_facet.match?(/\Aexclude\z/i)
   nil
 else
   binned_facet
@@ -98,11 +98,14 @@ module SearchSolrTools
 
 def self.parameter_binning(parameter_string)
   binned_parameter = bin(FacetConfiguration.get_facet_bin('parameter'), parameter_string)
-  # use variable_level_1 if no mapping exists
   return binned_parameter unless binned_parameter.nil?
 
+  # if no mapping exists, use variable_level_1.
+  # Force it to all upper case for consistency. This is a hacky workaround to
+  # deal with deprecated GCMD keywords still in use by some datasets that result
+  # in duplicate, case-sensitive entries in the search interface facet list.
   parts = parameter_string.split '>'
-  return parts[3].strip if parts.length >= 4
+  return parts[3].strip.upcase if parts.length >= 4
 
   nil
 end
@@ -158,7 +161,7 @@ module SearchSolrTools
 def self.bin(mappings, term)
   mappings.each do |mapping|
     term.match(mapping['pattern']) do
-      return mapping['mapping']
+      return mapping['mapping'].upcase
     end
   end
   nil
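The combined effect of the `solr_format.rb` hunks can be tried outside the gem. The sketch below is a simplified re-creation, not the gem's code: `MAPPINGS` and the sample keywords are hypothetical stand-ins for whatever `FacetConfiguration.get_facet_bin('parameter')` returns, and the methods are plain top-level functions rather than `SolrFormat` class methods. It also shows why the `exclude` check presumably became case-insensitive: once `bin` upper-cases every mapping, a configured `exclude` value comes back as `EXCLUDE`.

```ruby
# Hypothetical stand-in for FacetConfiguration.get_facet_bin('parameter');
# the real gem reads its mappings from configuration.
MAPPINGS = [
  { 'pattern' => /sea ice/i, 'mapping' => 'Sea Ice' },
  { 'pattern' => /test keyword/i, 'mapping' => 'exclude' }
].freeze

# Simplified version of the 7.2.0 `bin`: first matching mapping, upper-cased.
def bin(mappings, term)
  mappings.each do |mapping|
    return mapping['mapping'].upcase if term.match?(mapping['pattern'])
  end
  nil
end

# Simplified version of the 7.2.0 `parameter_binning`: when no mapping
# matches, fall back to the fourth '>'-separated part, upper-cased.
def parameter_binning(parameter_string)
  binned = bin(MAPPINGS, parameter_string)
  return binned unless binned.nil?

  parts = parameter_string.split('>')
  return parts[3].strip.upcase if parts.length >= 4

  nil
end

# Two spellings of the same keyword, differing only in case.
keywords = [
  'EARTH SCIENCE > Cryosphere > Snow/Ice > Snow Depth',
  'EARTH SCIENCE > CRYOSPHERE > SNOW/ICE > SNOW DEPTH'
]
facets = keywords.map { |keyword| parameter_binning(keyword) }.uniq
puts facets.inspect # => ["SNOW DEPTH"]

# An upper-cased 'exclude' mapping still satisfies the new check.
puts bin(MAPPINGS, 'test keyword').match?(/\Aexclude\z/i) # => true
```

Before the change, the two keywords would have produced two facet values differing only in case, which is the duplication the code comment describes.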
data/lib/search_solr_tools/translators/nsidc_json.rb
CHANGED
@@ -75,9 +75,10 @@ module SearchSolrTools
 return facet_values if json.nil?
 
 json.each do |json_entry|
+  long_name = json_entry['shortName'].eql?(json_entry['longName']) ? '' : json_entry['longName']
   sensor_bin = Helpers::SolrFormat.facet_binning('sensor', json_entry['shortName'].to_s)
   facet_values << if sensor_bin.eql? json_entry['shortName']
-    "#{
+    "#{long_name} | #{json_entry['shortName']}"
   else
     " | #{sensor_bin}"
   end
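The translator change is easiest to see with sample input. This is a minimal sketch, assuming the "long name | short name" layout shown in the hunk; the `sensor_facet` helper and the sample entries are hypothetical, and the `facet_binning` lookup is left out (the sketch covers only the branch where the sensor bin equals the short name).

```ruby
# Hypothetical helper reproducing the string built in the translator hunk
# for the branch where the sensor bin equals the short name.
def sensor_facet(json_entry)
  short_name = json_entry['shortName']
  # 7.2.0 behaviour: blank the long name when it duplicates the short name.
  long_name = short_name.eql?(json_entry['longName']) ? '' : json_entry['longName']
  "#{long_name} | #{short_name}"
end

# Hypothetical sample entries.
entries = [
  { 'shortName' => 'MODIS', 'longName' => 'Moderate-Resolution Imaging Spectroradiometer' },
  { 'shortName' => 'SSM/I', 'longName' => 'SSM/I' }
]

entries.each { |entry| puts sensor_facet(entry) }
# Moderate-Resolution Imaging Spectroradiometer | MODIS
#  | SSM/I
```

When the two names are identical, the result has the same ` | NAME` shape as the `else` branch of the hunk, so a facet label does not repeat the same name on both sides of the separator.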
data/search_solr_tools.gemspec
CHANGED
@@ -8,7 +8,7 @@ Gem::Specification.new do |spec|
 spec.name = 'search_solr_tools'
 spec.version = SearchSolrTools::VERSION
 spec.authors = ['Chris Chalstrom', 'Michael Brandt', 'Jonathan Kovarik', 'Luis Lopez', 'Stuart Reed', 'Julia Collins', 'Scott Lewis']
-spec.email = ['
+spec.email = ['Jonathan.Kovarik@colorado.edu', 'luis.lopezespinosa@colorado.edu', 'collinsj@colorado.edu', 'scott.lewis@colorado.edu']
 spec.summary = 'Tools to harvest and manage various scientific dataset feeds in a Solr instance.'
 spec.description = <<-EOF
 Ruby translators to transform various metadata feeds into solr documents and
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: search_solr_tools
 version: !ruby/object:Gem::Version
-  version: 7.
+  version: 7.2.0
 platform: ruby
 authors:
 - Chris Chalstrom
@@ -14,7 +14,7 @@ authors:
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2023-10-
+date: 2023-10-19 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: ffi-geos
@@ -329,13 +329,10 @@ description: |2
   a command-line utility to access/utilize the gem's translators to harvest
   metadata into a working solr instance.
 email:
--
-- mbrandt@colorado.edu
-- kovarik@nsidc.org
+- Jonathan.Kovarik@colorado.edu
 - luis.lopezespinosa@colorado.edu
--
--
-- scott.lewis@nsidc.org
+- collinsj@colorado.edu
+- scott.lewis@colorado.edu
 executables:
 - search_solr_tools
 extensions: []