solrizer 3.4.1 → 4.0.0
- checksums.yaml +4 -4
- data/CONTRIBUTING.md +66 -20
- data/README.md +1 -75
- data/lib/solrizer.rb +0 -3
- data/lib/solrizer/field_mapper.rb +1 -1
- data/lib/solrizer/version.rb +1 -1
- data/solrizer.gemspec +0 -2
- metadata +3 -44
- data/bin/solrizer +0 -107
- data/bin/solrizerd +0 -68
- data/lib/solrizer/extractor.rb +0 -68
- data/lib/solrizer/html.rb +0 -7
- data/lib/solrizer/html/extractor.rb +0 -36
- data/lib/solrizer/xml.rb +0 -5
- data/lib/solrizer/xml/extractor.rb +0 -32
- data/spec/units/extractor_spec.rb +0 -44
- data/spec/units/xml_extractor_spec.rb +0 -26
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA1:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: d9aa32a4193f5ad11975f40824475634bf149efb
|
4
|
+
data.tar.gz: 6300a897b884f28fd8a335f36e6a2ee214df4d16
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: 955481c893fbd628a1c61a088465f40d07978647eb5d711f1ad7ff63711da6301a4a735dd7a770d2d95e63a93dc430a509d715cdeddd7a1c4b1a57131a2ed8bd
|
7
|
+
data.tar.gz: c7eb036b3a788b8de7a764c077f60ef7e74ae90ce70335f613463920852cc47f5adb6541d2a304fc509c0000619f1d2601924a6fa4a314275d4ba186323952f0
|
data/CONTRIBUTING.md
CHANGED
@@ -3,6 +3,13 @@
|
|
3
3
|
We want your help to make Project Hydra great.
|
4
4
|
There are a few guidelines that we need contributors to follow so that we can have a chance of keeping on top of things.
|
5
5
|
|
6
|
+
## Code of Conduct
|
7
|
+
|
8
|
+
The Hydra community is dedicated to providing a welcoming and positive experience for all its
|
9
|
+
members, whether they are at a formal gathering, in a social setting, or taking part in activities
|
10
|
+
online. Please see our [Code of Conduct](https://wiki.duraspace.org/display/hydra/Code+of+Conduct)
|
11
|
+
for more information.
|
12
|
+
|
6
13
|
## Hydra Project Intellectual Property Licensing and Ownership
|
7
14
|
|
8
15
|
All code contributors must have an Individual Contributor License Agreement (iCLA) on file with the Hydra Project Steering Group.
|
@@ -16,8 +23,10 @@ You should also add yourself to the `CONTRIBUTORS.md` file in the root of the pr
|
|
16
23
|
|
17
24
|
* Reporting Issues
|
18
25
|
* Making Changes
|
26
|
+
* Documenting Code
|
27
|
+
* Committing Changes
|
19
28
|
* Submitting Changes
|
20
|
-
* Merging Changes
|
29
|
+
* Reviewing and Merging Changes
|
21
30
|
|
22
31
|
### Reporting Issues
|
23
32
|
|
@@ -38,8 +47,28 @@ You should also add yourself to the `CONTRIBUTORS.md` file in the root of the pr
|
|
38
47
|
* Then checkout the new branch with `git checkout fix/master/my_contribution`.
|
39
48
|
* Please avoid working directly on the `master` branch.
|
40
49
|
* You may find the [hub suite of commands](https://github.com/defunkt/hub) helpful
|
50
|
+
* Make sure you have added sufficient tests and documentation for your changes.
|
51
|
+
* Test functionality with RSpec; est features / UI with Capybara.
|
52
|
+
* Run _all_ the tests to assure nothing else was accidentally broken.
|
53
|
+
|
54
|
+
### Documenting Code
|
55
|
+
|
56
|
+
* All new public methods, modules, and classes should include inline documentation in [YARD](http://yardoc.org/).
|
57
|
+
* Documentation should seek to answer the question "why does this code exist?"
|
58
|
+
* Document private / protected methods as desired.
|
59
|
+
* If you are working in a file with no prior documentation, do try to document as you gain understanding of the code.
|
60
|
+
* If you don't know exactly what a bit of code does, it is extra likely that it needs to be documented. Take a stab at it and ask for feedback in your pull request. You can use the 'blame' button on GitHub to identify the original developer of the code and @mention them in your comment.
|
61
|
+
* This work greatly increases the usability of the code base and supports the on-ramping of new committers.
|
62
|
+
* We will all be understanding of one another's time constraints in this area.
|
63
|
+
* YARD examples:
|
64
|
+
* [Hydra::Works::RemoveGenericFile](https://github.com/projecthydra-labs/hydra-works/blob/master/lib/hydra/works/services/generic_work/remove_generic_file.rb)
|
65
|
+
* [ActiveTriples::LocalName::Minter](https://github.com/ActiveTriples/active_triples-local_name/blob/master/lib/active_triples/local_name/minter.rb)
|
66
|
+
* [Getting started with YARD](http://www.rubydoc.info/gems/yard/file/docs/GettingStarted.md)
|
67
|
+
|
68
|
+
### Committing changes
|
69
|
+
|
41
70
|
* Make commits of logical units.
|
42
|
-
* Your commit should include a high level description of your work in HISTORY.textile
|
71
|
+
* Your commit should include a high level description of your work in HISTORY.textile
|
43
72
|
* Check for unnecessary whitespace with `git diff --check` before committing.
|
44
73
|
* Make sure your commit messages are [well formed](http://tbaggery.com/2008/04/19/a-note-about-git-commit-messages.html).
|
45
74
|
* If you created an issue, you can close it by including "Closes #issue" in your commit message. See [Github's blog post for more details](https://github.com/blog/1386-closing-issues-via-commit-messages)
|
@@ -60,7 +89,9 @@ You should also add yourself to the `CONTRIBUTORS.md` file in the root of the pr
|
|
60
89
|
|
61
90
|
class PostsController
|
62
91
|
def index
|
63
|
-
|
92
|
+
respond_to do |wants|
|
93
|
+
wants.html { render 'index' }
|
94
|
+
end
|
64
95
|
end
|
65
96
|
end
|
66
97
|
|
@@ -72,38 +103,53 @@ You should also add yourself to the `CONTRIBUTORS.md` file in the root of the pr
|
|
72
103
|
long to fit in 72 characters
|
73
104
|
```
|
74
105
|
|
75
|
-
* Make sure you have added the necessary tests for your changes.
|
76
|
-
* Run _all_ the tests to assure nothing else was accidentally broken.
|
77
|
-
* When you are ready to submit a pull request
|
78
|
-
|
79
106
|
### Submitting Changes
|
80
107
|
|
81
|
-
[Detailed Walkthrough of One Pull Request per Commit](http://ndlib.github.io/practices/one-commit-per-pull-request/)
|
82
|
-
|
83
108
|
* Read the article ["Using Pull Requests"](https://help.github.com/articles/using-pull-requests) on GitHub.
|
84
109
|
* Make sure your branch is up to date with its parent branch (i.e. master)
|
85
110
|
* `git checkout master`
|
86
111
|
* `git pull --rebase`
|
87
112
|
* `git checkout <your-branch>`
|
88
113
|
* `git rebase master`
|
89
|
-
* It is
|
90
|
-
*
|
91
|
-
*
|
92
|
-
*
|
114
|
+
* It is a good idea to run your tests again.
|
115
|
+
* If you've made more than one commit take a moment to consider whether squashing commits together would help improve their logical grouping.
|
116
|
+
* [Detailed Walkthrough of One Pull Request per Commit](http://ndlib.github.io/practices/one-commit-per-pull-request/)
|
117
|
+
* `git rebase --interactive master` ([See Github help](https://help.github.com/articles/interactive-rebase))
|
93
118
|
* Squashing your branch's changes into one commit is "good form" and helps the person merging your request to see everything that is going on.
|
94
119
|
* Push your changes to a topic branch in your fork of the repository.
|
95
120
|
* Submit a pull request from your fork to the project.
|
96
121
|
|
97
|
-
### Merging Changes
|
122
|
+
### Reviewing and Merging Changes
|
123
|
+
|
124
|
+
We adopted [Github's Pull Request Review](https://help.github.com/articles/about-pull-request-reviews/) for our repositories.
|
125
|
+
Common checks that may occur in our repositories:
|
126
|
+
|
127
|
+
1. Travis CI - where our automated tests are running
|
128
|
+
2. Hound CI - where we check for style violations
|
129
|
+
3. Approval Required - Github enforces at least one person approve a pull request. Also, all reviewers that have chimed in must approve.
|
130
|
+
4. CodeClimate - is our code remaining healthy (at least according to static code analysis)
|
131
|
+
|
132
|
+
If one or more of the required checks failed (or are incomplete), the code should not be merged (and the UI will not allow it). If all of the checks have passed, then anyone on the project (including the pull request submitter) may merge the code.
|
133
|
+
|
134
|
+
*Example: Carolyn submits a pull request, Justin reviews the pull request and approves. However, Justin is still waiting on other checks (Travis CI is usually the culprit), so he does not merge the pull request. Eventually, all of the checks pass. At this point, Carolyn or anyone else may merge the pull request.*
|
135
|
+
|
136
|
+
#### Things to Consider When Reviewing
|
137
|
+
|
138
|
+
First, the person contributing the code is putting themselves out there. Be mindful of what you say in a review.
|
139
|
+
|
140
|
+
* Ask clarifying questions
|
141
|
+
* State your understanding and expectations
|
142
|
+
* Provide example code or alternate solutions, and explain why
|
143
|
+
|
144
|
+
This is your chance for a mentoring moment of another developer. Take time to give an honest and thorough review of what has changed. Things to consider:
|
98
145
|
|
99
|
-
* It is considered "poor from" to merge your own request.
|
100
|
-
* Please take the time to review the changes and get a sense of what is being changed. Things to consider:
|
101
146
|
* Does the commit message explain what is going on?
|
102
|
-
* Does the code changes have tests? _Not all changes need new tests, some changes are
|
147
|
+
* Does the code changes have tests? _Not all changes need new tests, some changes are refactors_
|
148
|
+
* Do new or changed methods, modules, and classes have documentation?
|
103
149
|
* Does the commit contain more than it should? Are two separate concerns being addressed in one commit?
|
104
|
-
*
|
105
|
-
|
106
|
-
|
150
|
+
* Does the description of the new/changed specs match your understanding of what the spec is doing?
|
151
|
+
|
152
|
+
If you are uncertain, bring other contributors into the conversation by assigning them as a reviewer.
|
107
153
|
|
108
154
|
# Additional Resources
|
109
155
|
|
data/README.md
CHANGED
@@ -3,13 +3,7 @@
|
|
3
3
|
[![Build Status](https://travis-ci.org/projecthydra/solrizer.png?branch=master)](https://travis-ci.org/projecthydra/solrizer)
|
4
4
|
[![Gem Version](https://badge.fury.io/rb/solrizer.png)](http://badge.fury.io/rb/solrizer)
|
5
5
|
|
6
|
-
A lightweight
|
7
|
-
the command line, or as a JMS listener.
|
8
|
-
|
9
|
-
Solrizer provides the baseline and structures for the process of solrizing. In order to actually read objects from a
|
10
|
-
data source and write solr documents into a solr instance, you need to use an implementation specific gem, such as
|
11
|
-
"solrizer-fedora":https://github.com/projecthydra/solrizer-fedora, which provides the mechanics for reading from a
|
12
|
-
fedora repository and writing to a solr instance.
|
6
|
+
A lightweight tool for creating dynamic solr schema sufixes.
|
13
7
|
|
14
8
|
|
15
9
|
## Installation
|
@@ -157,74 +151,6 @@ But now you may also pass an Descriptor instance if that works for you:
|
|
157
151
|
indexer = Solrizer::Descriptor.new(:integer, :indexed, :stored)
|
158
152
|
t.main_title(:index_as=>[indexer],:path=>"title", :label=>"title") { ... }
|
159
153
|
|
160
|
-
### Extractor and Extractor Mixins
|
161
|
-
|
162
|
-
Solrizer::Extractor provides utilities for extracting solr fields from objects or inserting solr fields into documents:
|
163
|
-
|
164
|
-
> extractor = Solrizer::Extractor.new
|
165
|
-
> solr_doc = Hash.new
|
166
|
-
> extractor.format_node_value(["foo ","\n bar"])
|
167
|
-
=> "foo bar"
|
168
|
-
> extractor.insert_solr_field_value(solr_doc, "foo","bar")
|
169
|
-
=> {"foo"=>"bar"}
|
170
|
-
> extractor.insert_solr_field_value(solr_doc,"foo","baz")
|
171
|
-
=> {"foo"=>["bar", "baz"]}
|
172
|
-
> extractor.insert_solr_field_value(solr_doc, "boo","hoo")
|
173
|
-
=> {"foo"=>["bar", "baz"], "boo"=>"hoo"}
|
174
|
-
|
175
|
-
#### Solrizer provides some default mixins:
|
176
|
-
|
177
|
-
`Solrizer::HTML::Extractor` provides html_to_solr method and `Solrizer::XML::Extractor` provides xml_to_solr method:
|
178
|
-
|
179
|
-
> Solrizer::XML::Extractor
|
180
|
-
> extractor = Solrizer::Extractor.new
|
181
|
-
> xml = "<fields><foo>bar</foo><bar>baz</bar></fields>"
|
182
|
-
> extractor.xml_to_solr(xml)
|
183
|
-
=> {:foo_tesim=>"bar", :bar_tesim=>"baz"}
|
184
|
-
|
185
|
-
#### Solrizer::XML::TerminologyBasedSolrizer
|
186
|
-
|
187
|
-
Another powerful mixin for use with classes that include the `OM::XML::Document` module is
|
188
|
-
`Solrizer::XML::TerminologyBasedSolrizer`. The methods provided by this module map provides a robust way of mapping
|
189
|
-
terms and solr fields via om terminologies. A notable example can be found in `ActiveFedora::NokogiriDatatstream`.
|
190
|
-
|
191
|
-
## JMS Listener for Hydra Rails Applications
|
192
|
-
|
193
|
-
### The executables: solrizer and solrizerd
|
194
|
-
|
195
|
-
The solrizer gem provides two executables:
|
196
|
-
|
197
|
-
* solrizer is a stomp consumer which listens for fedora.apim.updates and solrizes (or de-solrizes) objects accordingly.
|
198
|
-
* solrizerd is a wrapper script that spawns a daemonized version of solrizer and handles start|stop|restart|status requests.
|
199
|
-
|
200
|
-
### Usage
|
201
|
-
|
202
|
-
The usage for solrizerd is as follows:
|
203
|
-
|
204
|
-
solrizerd command --hydra_home PATH [options]
|
205
|
-
|
206
|
-
The commands are as follows:
|
207
|
-
* start start an instance of the application
|
208
|
-
* stop stop all instances of the application
|
209
|
-
* restart stop all instances and restart them afterwards
|
210
|
-
* status show status (PID) of application instances
|
211
|
-
|
212
|
-
Required parameters:
|
213
|
-
|
214
|
-
--hydra_home: this is the path to your hydra rails applications' root directory. Solrizerd needs this in order to load all your models and corresponding terminoligies.
|
215
|
-
|
216
|
-
The options:
|
217
|
-
* -p, --port Stomp port 61613
|
218
|
-
* -o, --host Host to connect to localhost
|
219
|
-
* -u, --user User name for stomp listener
|
220
|
-
* -w, --password Password for stomp listener
|
221
|
-
* -d, --destination Topic to listen to (default: /topic/fedora.apim.update)
|
222
|
-
* -h, --help Display this screen
|
223
|
-
|
224
|
-
Note:
|
225
|
-
|
226
|
-
Since the solrizer script must fire up your hydra rails application, it must have all the gems installed that your hydra instance needs.
|
227
|
-
|
228
154
|
## Note on Patches/Pull Requests
|
229
155
|
|
230
156
|
* Fork the project.
|
data/lib/solrizer.rb
CHANGED
@@ -5,14 +5,11 @@ module Solrizer
|
|
5
5
|
extend ActiveSupport::Autoload
|
6
6
|
|
7
7
|
autoload :Common
|
8
|
-
autoload :Extractor
|
9
8
|
autoload :Descriptor
|
10
9
|
autoload :FieldMapper
|
11
10
|
autoload :DefaultDescriptors
|
12
11
|
autoload :Suffix
|
13
|
-
autoload :HTML, 'solrizer/html'
|
14
12
|
autoload :VERSION, 'solrizer/version'
|
15
|
-
autoload :XML, 'solrizer/xml'
|
16
13
|
|
17
14
|
mattr_accessor :logger, instance_writer: false
|
18
15
|
|
data/lib/solrizer/field_mapper.rb
CHANGED
data/lib/solrizer/version.rb
CHANGED
data/solrizer.gemspec
CHANGED
@@ -14,8 +14,6 @@ Gem::Specification.new do |s|
|
|
14
14
|
|
15
15
|
s.add_dependency "nokogiri"
|
16
16
|
s.add_dependency "xml-simple"
|
17
|
-
s.add_dependency "stomp"
|
18
|
-
s.add_dependency "daemons"
|
19
17
|
s.add_dependency "activesupport"
|
20
18
|
s.add_development_dependency 'rspec', '~> 3.5'
|
21
19
|
s.add_development_dependency 'rake'
|
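Note on the gemspec change above: `stomp` and `daemons` are no longer runtime dependencies of solrizer. An application that loaded either gem only because solrizer 3.x pulled it in would presumably have to declare it directly after upgrading. A minimal Gemfile sketch, assuming such an application exists (the `~> 4.0` constraint is illustrative, not taken from the diff):

```ruby
# Gemfile (sketch): only needed if your own code still requires these gems.
source "https://rubygems.org"

gem "solrizer", "~> 4.0" # illustrative constraint
gem "stomp"              # was a transitive dependency via solrizer 3.x
gem "daemons"            # was a transitive dependency via solrizer 3.x
```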
metadata
CHANGED
@@ -1,14 +1,14 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: solrizer
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version:
|
4
|
+
version: 4.0.0
|
5
5
|
platform: ruby
|
6
6
|
authors:
|
7
7
|
- Matt Zumwalt
|
8
8
|
autorequire:
|
9
9
|
bindir: bin
|
10
10
|
cert_chain: []
|
11
|
-
date: 2017-01-
|
11
|
+
date: 2017-01-26 00:00:00.000000000 Z
|
12
12
|
dependencies:
|
13
13
|
- !ruby/object:Gem::Dependency
|
14
14
|
name: nokogiri
|
@@ -38,34 +38,6 @@ dependencies:
|
|
38
38
|
- - ">="
|
39
39
|
- !ruby/object:Gem::Version
|
40
40
|
version: '0'
|
41
|
-
- !ruby/object:Gem::Dependency
|
42
|
-
name: stomp
|
43
|
-
requirement: !ruby/object:Gem::Requirement
|
44
|
-
requirements:
|
45
|
-
- - ">="
|
46
|
-
- !ruby/object:Gem::Version
|
47
|
-
version: '0'
|
48
|
-
type: :runtime
|
49
|
-
prerelease: false
|
50
|
-
version_requirements: !ruby/object:Gem::Requirement
|
51
|
-
requirements:
|
52
|
-
- - ">="
|
53
|
-
- !ruby/object:Gem::Version
|
54
|
-
version: '0'
|
55
|
-
- !ruby/object:Gem::Dependency
|
56
|
-
name: daemons
|
57
|
-
requirement: !ruby/object:Gem::Requirement
|
58
|
-
requirements:
|
59
|
-
- - ">="
|
60
|
-
- !ruby/object:Gem::Version
|
61
|
-
version: '0'
|
62
|
-
type: :runtime
|
63
|
-
prerelease: false
|
64
|
-
version_requirements: !ruby/object:Gem::Requirement
|
65
|
-
requirements:
|
66
|
-
- - ">="
|
67
|
-
- !ruby/object:Gem::Version
|
68
|
-
version: '0'
|
69
41
|
- !ruby/object:Gem::Dependency
|
70
42
|
name: activesupport
|
71
43
|
requirement: !ruby/object:Gem::Requirement
|
@@ -139,9 +111,7 @@ dependencies:
|
|
139
111
|
description: Use solrizer to populate solr indexes. You can run solrizer from within
|
140
112
|
your app, using the provided rake tasks, or as a JMS listener
|
141
113
|
email: hydra-tech@googlegroups.com
|
142
|
-
executables:
|
143
|
-
- solrizer
|
144
|
-
- solrizerd
|
114
|
+
executables: []
|
145
115
|
extensions: []
|
146
116
|
extra_rdoc_files:
|
147
117
|
- LICENSE
|
@@ -155,31 +125,22 @@ files:
|
|
155
125
|
- LICENSE
|
156
126
|
- README.md
|
157
127
|
- Rakefile
|
158
|
-
- bin/solrizer
|
159
|
-
- bin/solrizerd
|
160
128
|
- lib/solrizer.rb
|
161
129
|
- lib/solrizer/common.rb
|
162
130
|
- lib/solrizer/default_descriptors.rb
|
163
131
|
- lib/solrizer/descriptor.rb
|
164
|
-
- lib/solrizer/extractor.rb
|
165
132
|
- lib/solrizer/field_mapper.rb
|
166
|
-
- lib/solrizer/html.rb
|
167
|
-
- lib/solrizer/html/extractor.rb
|
168
133
|
- lib/solrizer/suffix.rb
|
169
134
|
- lib/solrizer/version.rb
|
170
|
-
- lib/solrizer/xml.rb
|
171
|
-
- lib/solrizer/xml/extractor.rb
|
172
135
|
- lib/tasks/solrizer.rake
|
173
136
|
- solrizer.gemspec
|
174
137
|
- spec/.rspec
|
175
138
|
- spec/fixtures/druid-bv448hq0314-descMetadata.xml
|
176
139
|
- spec/spec_helper.rb
|
177
140
|
- spec/units/common_spec.rb
|
178
|
-
- spec/units/extractor_spec.rb
|
179
141
|
- spec/units/field_mapper_spec.rb
|
180
142
|
- spec/units/solrizer_spec.rb
|
181
143
|
- spec/units/suffix_spec.rb
|
182
|
-
- spec/units/xml_extractor_spec.rb
|
183
144
|
homepage: http://github.com/projecthydra/solrizer
|
184
145
|
licenses: []
|
185
146
|
metadata: {}
|
@@ -208,8 +169,6 @@ test_files:
|
|
208
169
|
- spec/fixtures/druid-bv448hq0314-descMetadata.xml
|
209
170
|
- spec/spec_helper.rb
|
210
171
|
- spec/units/common_spec.rb
|
211
|
-
- spec/units/extractor_spec.rb
|
212
172
|
- spec/units/field_mapper_spec.rb
|
213
173
|
- spec/units/solrizer_spec.rb
|
214
174
|
- spec/units/suffix_spec.rb
|
215
|
-
- spec/units/xml_extractor_spec.rb
|
data/bin/solrizer
DELETED
@@ -1,107 +0,0 @@
|
|
1
|
-
#!/usr/bin/env ruby
|
2
|
-
|
3
|
-
require 'rubygems'
|
4
|
-
require 'optparse'
|
5
|
-
require 'stomp'
|
6
|
-
|
7
|
-
options = {}
|
8
|
-
|
9
|
-
optparse = OptionParser.new do|opts|
|
10
|
-
opts.banner = "Usage: solrizer [options]"
|
11
|
-
|
12
|
-
options[:hydra_home] = nil
|
13
|
-
opts.on( '--hydra_home PATH', 'Load the Hydra instance at this path' ) do |path|
|
14
|
-
if File.exist?(File.join(path,"config","environment.rb"))
|
15
|
-
options[:hydra_home] = path
|
16
|
-
else
|
17
|
-
puts "#{path} does not appear to be a valid rails home"
|
18
|
-
exit
|
19
|
-
end
|
20
|
-
end
|
21
|
-
|
22
|
-
options[:port] = 61613
|
23
|
-
opts.on('-p','--port NUM', 'Stomp port') do |port|
|
24
|
-
options[:port] = port
|
25
|
-
end
|
26
|
-
|
27
|
-
options[:host] = 'localhost'
|
28
|
-
opts.on('-o','--host HOSTNAME', 'Host to connect to') do |host|
|
29
|
-
options[:host] = host
|
30
|
-
end
|
31
|
-
|
32
|
-
options[:user] = 'fedoraStomper'
|
33
|
-
opts.on('-u', '--user USERNAME', 'User name for stomp listener') do |user|
|
34
|
-
options[:user] = user
|
35
|
-
end
|
36
|
-
|
37
|
-
options[:password] = 'fedoraStomper'
|
38
|
-
opts.on('-w', '--password PASSWORD', 'Password for stomp listener') do |password|
|
39
|
-
options[:password] = password
|
40
|
-
end
|
41
|
-
|
42
|
-
options[:destination] = '/topic/fedora.apim.update'
|
43
|
-
opts.on('-d','--destination TOPIC', 'Topic to listen to') do |destination|
|
44
|
-
options[:destination] = destination
|
45
|
-
end
|
46
|
-
|
47
|
-
opts.on('-h', '--help', 'Display this screen') do
|
48
|
-
puts opts
|
49
|
-
exit
|
50
|
-
end
|
51
|
-
end
|
52
|
-
|
53
|
-
optparse.parse!
|
54
|
-
|
55
|
-
begin; require 'rubygems'; rescue; end
|
56
|
-
|
57
|
-
if options[:hydra_home]
|
58
|
-
puts "Loading app..."
|
59
|
-
Dir.chdir(options[:hydra_home])
|
60
|
-
require File.join(options[:hydra_home],"config","environment.rb")
|
61
|
-
|
62
|
-
puts "app loaded"
|
63
|
-
else
|
64
|
-
$stderr.puts "The --hydra_home PATH option is mandatory. Please provide the path to the root of a valid Hydra instance."
|
65
|
-
exit 1
|
66
|
-
end
|
67
|
-
|
68
|
-
puts "loading listener"
|
69
|
-
|
70
|
-
begin
|
71
|
-
@port = options[:port]
|
72
|
-
@host = options[:host]
|
73
|
-
@user = options[:user]
|
74
|
-
@password = options[:password]
|
75
|
-
@reliable = true
|
76
|
-
@clientid = "fedora_stomper"
|
77
|
-
@destination = options[:destination]
|
78
|
-
|
79
|
-
|
80
|
-
$stderr.print "Connecting to stomp://#{@host}:#{@port} as #{@user}\n"
|
81
|
-
@conn = Stomp::Connection.open(@user, @password, @host, @port, @reliable, 5, {"client-id" => @clientid} )
|
82
|
-
$stderr.print "Getting output from #{@destination}\n"
|
83
|
-
|
84
|
-
@conn.subscribe(@destination, {"activemq.subscriptionName" => @clientid, :ack =>"client" })
|
85
|
-
while true
|
86
|
-
@msg = @conn.receive
|
87
|
-
pid = @msg.headers["pid"]
|
88
|
-
method = @msg.headers["methodName"]
|
89
|
-
|
90
|
-
puts @msg.headers.inspect
|
91
|
-
puts "\nPID: #{@msg.headers["pid"]}\n"
|
92
|
-
if ["addDatastream", "addRelationship","ingest","modifyDatastreamByValue","modifyDatastreamByReference","modifyObject","purgeDatastream","purgeRelationship"].include? method
|
93
|
-
ActiveFedora::Base.find(@msg.headers["pid"], cast: true).update_index
|
94
|
-
elsif method == "purgeObject"
|
95
|
-
ActiveFedora::SolrService.instance.conn.delete_by_id(pid)
|
96
|
-
else
|
97
|
-
$stderr.puts "Unknown Method: #{method}"
|
98
|
-
end
|
99
|
-
puts "updated solr index for #{@msg.headers["pid"]}\n"
|
100
|
-
@conn.ack @msg.headers["message-id"]
|
101
|
-
end
|
102
|
-
@conn.join
|
103
|
-
|
104
|
-
rescue Exception => e
|
105
|
-
p e
|
106
|
-
end
|
107
|
-
|
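Note on the deleted `bin/solrizer`: the core of the listener loop is a dispatch on the `methodName` message header. The sketch below isolates that decision as a pure function, for anyone who needs to reproduce it outside the gem. The helper name `index_action_for` and the returned symbols are hypothetical; only the method-name list and the three branches come from the deleted script.

```ruby
# Fedora API-M method names that the deleted script answered by reindexing.
UPDATE_METHODS = %w[
  addDatastream addRelationship ingest modifyDatastreamByValue
  modifyDatastreamByReference modifyObject purgeDatastream purgeRelationship
].freeze

# Hypothetical helper (not part of solrizer): classify a message by its
# "methodName" header, mirroring the if/elsif/else of the deleted script.
def index_action_for(method_name)
  if UPDATE_METHODS.include?(method_name)
    :update   # the script called ActiveFedora::Base.find(pid, cast: true).update_index
  elsif method_name == "purgeObject"
    :delete   # the script called ActiveFedora::SolrService.instance.conn.delete_by_id(pid)
  else
    :unknown  # the script wrote "Unknown Method: ..." to stderr
  end
end

puts index_action_for("ingest")
puts index_action_for("purgeObject")
puts index_action_for("someOtherMethod")
```

Keeping the decision separate from the STOMP connection makes it possible to test without a message broker.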
data/bin/solrizerd
DELETED
@@ -1,68 +0,0 @@
|
|
1
|
-
#!/usr/bin/env ruby
|
2
|
-
|
3
|
-
require 'rubygems'
|
4
|
-
require 'daemons'
|
5
|
-
require 'stomp'
|
6
|
-
|
7
|
-
banner=<<-EOC
|
8
|
-
Usage: solrizerd command --hydra_home PATH [options]
|
9
|
-
PATH must point to a valid hydra application
|
10
|
-
Commands:
|
11
|
-
start start an instance of the application
|
12
|
-
stop stop all instances of the application
|
13
|
-
restart stop all instances and restart them afterwards
|
14
|
-
status show status (PID) of application instances
|
15
|
-
Options:
|
16
|
-
--hydra_home PATH Load the hydra instance at this path
|
17
|
-
-p, --port NUM Stomp port (default 61613)
|
18
|
-
-o, --host HOSTNAME Host to connect to
|
19
|
-
-u, --user USERNAME User name for stomp listener
|
20
|
-
-w, --password PASSWORD Password for stomp listener
|
21
|
-
-d, --destination TOPIC Topic to listen to (default: /topic/fedora.apim.update)
|
22
|
-
-h, --help Display this screen
|
23
|
-
EOC
|
24
|
-
|
25
|
-
|
26
|
-
# check for a valid command
|
27
|
-
unless ['start','stop','restart','status'].include? ARGV[0]
|
28
|
-
puts banner
|
29
|
-
exit 7
|
30
|
-
end
|
31
|
-
|
32
|
-
if ARGV.include?('-h') || ARGV.include?('--help')
|
33
|
-
puts banner
|
34
|
-
exit 0
|
35
|
-
end
|
36
|
-
|
37
|
-
# Make sure --hydra_home was set for the start and restart commands
|
38
|
-
if ARGV[0] == 'start' || ARGV[0] == 'restart'
|
39
|
-
unless ARGV[1] == '--hydra_home'
|
40
|
-
puts "ERROR: You must --hydra_home to specify the path to a valid hydra application"
|
41
|
-
exit 8
|
42
|
-
end
|
43
|
-
|
44
|
-
# make sure valid path was set for hydra_home
|
45
|
-
unless ARGV[2] && File.exist?(File.join(ARGV[2],"config","environment.rb"))
|
46
|
-
puts "ERROR: the path entered does not appear to be a valid hydra instance"
|
47
|
-
exit 9
|
48
|
-
end
|
49
|
-
end
|
50
|
-
|
51
|
-
|
52
|
-
options = {
|
53
|
-
:multiple=>false,
|
54
|
-
:dir_mode=>:normal,
|
55
|
-
:dir=>'/tmp',
|
56
|
-
:backtrace=>true
|
57
|
-
}
|
58
|
-
argv_array = []
|
59
|
-
argv_array << ARGV[0]
|
60
|
-
argv_array << '--'
|
61
|
-
ARGV[1..-1].each {|ele| argv_array << ele }
|
62
|
-
options[:ARGV] = argv_array
|
63
|
-
|
64
|
-
version = '>=0'
|
65
|
-
app = Gem.bin_path('solrizer','solrizer',version)
|
66
|
-
|
67
|
-
Daemons.run(app,options)
|
68
|
-
|
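Note on the deleted `bin/solrizerd`: before handing control to `Daemons.run`, the wrapper rebuilt the argument list so that the command came first, followed by `--`, followed by everything else. The sketch below reproduces only that rearrangement; the function name `daemons_argv` is hypothetical.

```ruby
# Hypothetical helper (not part of solrizer): rebuild ARGV the way the
# deleted wrapper did, as [command, "--", *remaining_arguments].
def daemons_argv(argv)
  [argv[0], "--", *argv[1..-1]]
end

p daemons_argv(%w[start --hydra_home /path/to/app -p 61613])
```

The `--` separator marks where the wrapper's own command ends and the options forwarded to the `solrizer` executable begin.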
data/lib/solrizer/extractor.rb
DELETED
@@ -1,68 +0,0 @@
|
|
1
|
-
module Solrizer
|
2
|
-
|
3
|
-
# Provides utilities for extracting solr fields from a variety of objects and/or creating solr documents from a given object
|
4
|
-
# Note: These utilities are optional. You can implement .to_solr directly on your classes if you want to bypass using Extractors.
|
5
|
-
#
|
6
|
-
# Each of the Solrizer implementations (ie. solrizer-fedora) provides its own Extractor module that extends the behaviors of Solrizer::Extractor
|
7
|
-
# with methods specific to that implementation (ie. extract_tag, extract_rels_ext, xml_to_solr, html_to_solr).
|
8
|
-
# By convention, the solrizer implementations will mix their own Extractors' behaviors into this class when you load them into an application.
|
9
|
-
#
|
10
|
-
class Extractor
|
11
|
-
|
12
|
-
class << self
|
13
|
-
# Insert +field_value+ for +field_name+ into +solr_doc+
|
14
|
-
# Handles inserting new values into a Hash while ensuring that you don't destroy or overwrite any existing values in the hash.
|
15
|
-
# Ensures that field values are always appended to arrays within the values hash.
|
16
|
-
# Also ensures that values are run through format_node_value
|
17
|
-
# @param [Hash] solr_doc
|
18
|
-
# @param [String] field_name
|
19
|
-
# @param [String] field_value
|
20
|
-
def insert_solr_field_value(solr_doc, field_name, field_value)
|
21
|
-
formatted_value = format_node_value(field_value)
|
22
|
-
if solr_doc[field_name]
|
23
|
-
solr_doc[field_name] = Array(solr_doc[field_name]) << formatted_value
|
24
|
-
else
|
25
|
-
solr_doc[field_name] = formatted_value
|
26
|
-
end
|
27
|
-
return solr_doc
|
28
|
-
end
|
29
|
-
|
30
|
-
# Strips the majority of whitespace from the values array and then joins them with a single blank delimitter
|
31
|
-
# Returns an empty string if values argument is nil
|
32
|
-
#
|
33
|
-
# @param [Array] values Array of strings representing the values to be formatted
|
34
|
-
# @return [String]
|
35
|
-
def format_node_value values
|
36
|
-
if values.nil?
|
37
|
-
""
|
38
|
-
else
|
39
|
-
Array(values).map{|val| val.gsub(/\s+/,' ').strip}.join(" ")
|
40
|
-
end
|
41
|
-
end
|
42
|
-
end
|
43
|
-
|
44
|
-
# Instance Methods
|
45
|
-
|
46
|
-
# Alias for Solrizer::Extractor#insert_solr_field_value
|
47
|
-
def insert_solr_field_value(solr_doc, field_name, field_value)
|
48
|
-
Solrizer::Extractor.insert_solr_field_value(solr_doc, field_name, field_value)
|
49
|
-
end
|
50
|
-
|
51
|
-
# Alias for Solrizer::Extractor#format_node_value
|
52
|
-
def format_node_value values
|
53
|
-
Solrizer::Extractor.format_node_value(values)
|
54
|
-
end
|
55
|
-
|
56
|
-
# Deprecated.
|
57
|
-
# merges input_hash into solr_hash
|
58
|
-
# @param [Hash] input_hash the input hash of values
|
59
|
-
# @param [Hash] solr_hash the solr values hash to add the values into
|
60
|
-
# @return [Hash] the populated Solr values hash
|
61
|
-
#
|
62
|
-
def extract_hash( input_hash, solr_hash=Hash.new )
|
63
|
-
warn "[DEPRECATION] `extract_hash` is deprecated. Just pass values directly into your solr values hash"
|
64
|
-
return solr_hash.merge!(input_hash)
|
65
|
-
end
|
66
|
-
|
67
|
-
end
|
68
|
-
end
|
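Note on the deleted `Solrizer::Extractor`: code that called `format_node_value` or `insert_solr_field_value` will no longer find them in 4.0.0. Both methods are small enough to carry locally. The sketch below restates the deleted logic in a standalone module; the module name `LegacyExtractor` is hypothetical and nothing here is provided by solrizer itself.

```ruby
# Hypothetical stand-in for the removed Solrizer::Extractor helpers.
module LegacyExtractor
  module_function

  # Collapse runs of whitespace inside each value, strip both ends,
  # and join the values with a single space. nil becomes "".
  def format_node_value(values)
    return "" if values.nil?

    Array(values).map { |val| val.gsub(/\s+/, " ").strip }.join(" ")
  end

  # Add a formatted value to solr_doc without overwriting an existing one:
  # the first value is stored as a string, later values turn it into an array.
  def insert_solr_field_value(solr_doc, field_name, field_value)
    formatted = format_node_value(field_value)
    if solr_doc[field_name]
      solr_doc[field_name] = Array(solr_doc[field_name]) << formatted
    else
      solr_doc[field_name] = formatted
    end
    solr_doc
  end
end

doc = {}
LegacyExtractor.insert_solr_field_value(doc, "foo", "bar")
LegacyExtractor.insert_solr_field_value(doc, "foo", "baz")
p doc
```

As in the deleted code, a field holding `nil` is treated as absent, so the first inserted value replaces it rather than producing an array.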
data/lib/solrizer/html.rb
DELETED
data/lib/solrizer/html/extractor.rb
DELETED
@@ -1,36 +0,0 @@
|
|
1
|
-
require "nokogiri"
|
2
|
-
require 'yaml'
|
3
|
-
|
4
|
-
module Solrizer::HTML::Extractor
|
5
|
-
|
6
|
-
#
|
7
|
-
# This method strips html tags out and returns content to be indexed in solr
|
8
|
-
#
|
9
|
-
# @param [Datastream] ds object that responds to .content with HTML content
|
10
|
-
# @param [Hash] solr_doc hash of values to be inserted into solr as a solr document
|
11
|
-
def html_to_solr( ds, solr_doc=Hash.new )
|
12
|
-
|
13
|
-
text = CGI.unescapeHTML(ds.content)
|
14
|
-
doc = Nokogiri::HTML(text)
|
15
|
-
|
16
|
-
# html to story_display
|
17
|
-
stories = doc.xpath('//story')
|
18
|
-
|
19
|
-
stories.each do |story|
|
20
|
-
solr_doc.merge!({:story_display => story.children.to_xml})
|
21
|
-
end
|
22
|
-
|
23
|
-
#strip out text and put in story_t
|
24
|
-
text_nodes = doc.xpath("//text()")
|
25
|
-
text = String.new
|
26
|
-
|
27
|
-
text_nodes.each do |text_node|
|
28
|
-
text << text_node.content
|
29
|
-
end
|
30
|
-
|
31
|
-
solr_doc.merge!({:story_t => text})
|
32
|
-
|
33
|
-
return solr_doc
|
34
|
-
end
|
35
|
-
|
36
|
-
end
|
data/lib/solrizer/xml.rb
DELETED
data/lib/solrizer/xml/extractor.rb
DELETED
@@ -1,32 +0,0 @@
|
|
1
|
-
require "xmlsimple"
|
2
|
-
|
3
|
-
module Solrizer::XML::Extractor
|
4
|
-
|
5
|
-
#
|
6
|
-
# This method extracts solr fields from simple xml
|
7
|
-
# If you want to do anything more nuanced with the xml, use OM instead.
|
8
|
-
#
|
9
|
-
# @param [xml] text xml content to index
|
10
|
-
# @param [Hash] solr_doc
|
11
|
-
def xml_to_solr( text, solr_doc=Hash.new, mapper = Solrizer.default_field_mapper )
|
12
|
-
doc = XmlSimple.xml_in( text )
|
13
|
-
|
14
|
-
doc.each_pair do |name, value|
|
15
|
-
if value.kind_of?(Array)
|
16
|
-
if value.first.kind_of?(Hash)
|
17
|
-
# This deals with the way xml-simple handles nodes with attributes
|
18
|
-
solr_doc.merge!({mapper.solr_name(name, :stored_searchable, :type=>:text).to_sym => "#{value.first["content"]}"})
|
19
|
-
elsif value.length > 1
|
20
|
-
solr_doc.merge!({mapper.solr_name(name, :stored_searchable, :type=>:text).to_sym => value})
|
21
|
-
else
|
22
|
-
solr_doc.merge!({mapper.solr_name(name, :stored_searchable, :type=>:text).to_sym => "#{value.first}"})
|
23
|
-
end
|
24
|
-
else
|
25
|
-
solr_doc.merge!({mapper.solr_name(name, :stored_searchable, :type=>:text).to_sym => "#{value}"})
|
26
|
-
end
|
27
|
-
end
|
28
|
-
|
29
|
-
return solr_doc
|
30
|
-
end
|
31
|
-
|
32
|
-
end
|
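Note on the deleted `xml_to_solr`: the method flattened the children of a simple XML document into solr fields, keeping repeated elements as arrays. The sketch below approximates that behaviour for flat XML only. It is not the removed implementation: it swaps in REXML for XmlSimple, hard-codes the `_tesim` suffix instead of asking a field mapper, and ignores attributes. The function name `simple_xml_to_solr` is hypothetical.

```ruby
require "rexml/document"

# Hypothetical helper (not part of solrizer): map each child element of the
# root to a solr field named "<element>_tesim". Repeated elements become an
# array; single elements stay a string.
def simple_xml_to_solr(xml, solr_doc = {}, suffix: "_tesim")
  root = REXML::Document.new(xml).root
  grouped = Hash.new { |hash, key| hash[key] = [] }

  # Collect the text of every direct child, grouped by element name.
  root.elements.each { |el| grouped[el.name] << el.text.to_s }

  grouped.each do |name, values|
    solr_doc[:"#{name}#{suffix}"] = values.length > 1 ? values : values.first
  end
  solr_doc
end

p simple_xml_to_solr("<fields><foo>bar</foo><bar>baz</bar></fields>")
```

For anything beyond flat documents, the comment in the deleted source already points to OM as the better tool.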
data/spec/units/extractor_spec.rb
DELETED
@@ -1,44 +0,0 @@
|
|
1
|
-
require 'spec_helper'
|
2
|
-
|
3
|
-
describe Solrizer::Extractor do
|
4
|
-
|
5
|
-
before(:all) do
|
6
|
-
@extractor = Solrizer::Extractor.new
|
7
|
-
end
|
8
|
-
|
9
|
-
describe ".format_node_value" do
|
10
|
-
it "should strip white space out of the array and join it with a single blank" do
|
11
|
-
expect(Solrizer::Extractor.format_node_value([" test \n node \t value \t"])).to eq "test node value"
|
12
|
-
expect(Solrizer::Extractor.format_node_value([" test ", " \n node ", " \t value \t"])).to eq "test node value"
|
13
|
-
end
|
14
|
-
it "should return an empty string if given an argument of nil" do
|
15
|
-
expect(Solrizer::Extractor.format_node_value(nil)).to eq ''
|
16
|
-
end
|
17
|
-
|
18
|
-
it "should strip white space out of a string" do
|
19
|
-
expect(Solrizer::Extractor.format_node_value("raw string\n with whitespace")).to eq "raw string with whitespace"
|
20
|
-
end
|
21
|
-
end
|
22
|
-
|
23
|
-
describe "#insert_solr_field_value" do
|
24
|
-
it "should initialize a solr doc list if it is nil" do
|
25
|
-
solr_doc = {'title_tesim' => nil }
|
26
|
-
Solrizer::Extractor.insert_solr_field_value(solr_doc, 'title_tesim', 'Frank')
|
27
|
-
expect(solr_doc).to eq("title_tesim"=>"Frank")
|
28
|
-
end
|
29
|
-
it "should insert multiple" do
|
30
|
-
solr_doc = {'title_tesim' => nil }
|
31
|
-
Solrizer::Extractor.insert_solr_field_value(solr_doc, 'title_tesim', 'Frank')
|
32
|
-
Solrizer::Extractor.insert_solr_field_value(solr_doc, 'title_tesim', 'Margret')
|
33
|
-
Solrizer::Extractor.insert_solr_field_value(solr_doc, 'title_tesim', 'Joyce')
|
34
|
-
expect(solr_doc).to eq("title_tesim"=>["Frank", 'Margret', 'Joyce'])
|
35
|
-
end
|
36
|
-
it "should not make a list if a single valued field is passed in" do
|
37
|
-
solr_doc = {}
|
38
|
-
Solrizer::Extractor.insert_solr_field_value(solr_doc, 'title_dtsi', '2013-03-22T12:33:00Z')
|
39
|
-
expect(solr_doc).to eq("title_dtsi"=>"2013-03-22T12:33:00Z")
|
40
|
-
end
|
41
|
-
|
42
|
-
end
|
43
|
-
|
44
|
-
end
|
data/spec/units/xml_extractor_spec.rb
DELETED
@@ -1,26 +0,0 @@
|
|
1
|
-
require 'spec_helper'
|
2
|
-
|
3
|
-
describe Solrizer::XML::Extractor do
|
4
|
-
|
5
|
-
before do
|
6
|
-
@extractor = Solrizer::Extractor.new
|
7
|
-
end
|
8
|
-
|
9
|
-
let(:result) { @extractor.xml_to_solr(fixture("druid-bv448hq0314-descMetadata.xml"))}
|
10
|
-
|
11
|
-
describe ".xml_to_solr" do
|
12
|
-
it "should turn simple xml into a solr document" do
|
13
|
-
expect(result[:type_tesim]).to eq "text"
|
14
|
-
expect(result[:medium_tesim]).to eq "Paper Document"
|
15
|
-
expect(result[:rights_tesim]).to eq "Presumed under copyright. Do not publish."
|
16
|
-
expect(result[:date_tesim]).to eq "1985-12-30"
|
17
|
-
expect(result[:format_tesim]).to be_kind_of(Array)
|
18
|
-
expect(result[:format_tesim]).to include("application/tiff")
|
19
|
-
expect(result[:format_tesim]).to include("application/pdf")
|
20
|
-
expect(result[:format_tesim]).to include("application/jp2000")
|
21
|
-
expect(result[:title_tesim]).to eq "This is a Sample Title"
|
22
|
-
expect(result[:publisher_tesim]).to eq "Sample Unversity"
|
23
|
-
end
|
24
|
-
end
|
25
|
-
|
26
|
-
end
|