solrizer 3.4.1 → 4.0.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: 75f93d429c92672c47052351cd82036b31a98916
4
- data.tar.gz: bd2336c84504de0921a74feb4fead735275b9110
3
+ metadata.gz: d9aa32a4193f5ad11975f40824475634bf149efb
4
+ data.tar.gz: 6300a897b884f28fd8a335f36e6a2ee214df4d16
5
5
  SHA512:
6
- metadata.gz: 034013543a506460f4e39f31ef449336a533383ecd9279b5900e7a4f341266620084c47459469d14f79abcd15aa6dd4c31bcd404e0e4d3653ec58915df817428
7
- data.tar.gz: dc8d9f340f30bec3e03820677b833d0012188aafafad1f21d6a234a21640621d0eb74b702bd5f508fc61672b512387bc871388d9346eee46ce18b862417fcc8f
6
+ metadata.gz: 955481c893fbd628a1c61a088465f40d07978647eb5d711f1ad7ff63711da6301a4a735dd7a770d2d95e63a93dc430a509d715cdeddd7a1c4b1a57131a2ed8bd
7
+ data.tar.gz: c7eb036b3a788b8de7a764c077f60ef7e74ae90ce70335f613463920852cc47f5adb6541d2a304fc509c0000619f1d2601924a6fa4a314275d4ba186323952f0
@@ -3,6 +3,13 @@
3
3
  We want your help to make Project Hydra great.
4
4
  There are a few guidelines that we need contributors to follow so that we can have a chance of keeping on top of things.
5
5
 
6
+ ## Code of Conduct
7
+
8
+ The Hydra community is dedicated to providing a welcoming and positive experience for all its
9
+ members, whether they are at a formal gathering, in a social setting, or taking part in activities
10
+ online. Please see our [Code of Conduct](https://wiki.duraspace.org/display/hydra/Code+of+Conduct)
11
+ for more information.
12
+
6
13
  ## Hydra Project Intellectual Property Licensing and Ownership
7
14
 
8
15
  All code contributors must have an Individual Contributor License Agreement (iCLA) on file with the Hydra Project Steering Group.
@@ -16,8 +23,10 @@ You should also add yourself to the `CONTRIBUTORS.md` file in the root of the pr
16
23
 
17
24
  * Reporting Issues
18
25
  * Making Changes
26
+ * Documenting Code
27
+ * Committing Changes
19
28
  * Submitting Changes
20
- * Merging Changes
29
+ * Reviewing and Merging Changes
21
30
 
22
31
  ### Reporting Issues
23
32
 
@@ -38,8 +47,28 @@ You should also add yourself to the `CONTRIBUTORS.md` file in the root of the pr
38
47
  * Then checkout the new branch with `git checkout fix/master/my_contribution`.
39
48
  * Please avoid working directly on the `master` branch.
40
49
  * You may find the [hub suite of commands](https://github.com/defunkt/hub) helpful
50
+ * Make sure you have added sufficient tests and documentation for your changes.
51
+ * Test functionality with RSpec; test features / UI with Capybara.
52
+ * Run _all_ the tests to assure nothing else was accidentally broken.
53
+
54
+ ### Documenting Code
55
+
56
+ * All new public methods, modules, and classes should include inline documentation in [YARD](http://yardoc.org/).
57
+ * Documentation should seek to answer the question "why does this code exist?"
58
+ * Document private / protected methods as desired.
59
+ * If you are working in a file with no prior documentation, do try to document as you gain understanding of the code.
60
+ * If you don't know exactly what a bit of code does, it is extra likely that it needs to be documented. Take a stab at it and ask for feedback in your pull request. You can use the 'blame' button on GitHub to identify the original developer of the code and @mention them in your comment.
61
+ * This work greatly increases the usability of the code base and supports the on-ramping of new committers.
62
+ * We will all be understanding of one another's time constraints in this area.
63
+ * YARD examples:
64
+ * [Hydra::Works::RemoveGenericFile](https://github.com/projecthydra-labs/hydra-works/blob/master/lib/hydra/works/services/generic_work/remove_generic_file.rb)
65
+ * [ActiveTriples::LocalName::Minter](https://github.com/ActiveTriples/active_triples-local_name/blob/master/lib/active_triples/local_name/minter.rb)
66
+ * [Getting started with YARD](http://www.rubydoc.info/gems/yard/file/docs/GettingStarted.md)
67
+
68
+ ### Committing changes
69
+
41
70
  * Make commits of logical units.
42
- * Your commit should include a high level description of your work in HISTORY.textile
71
+ * Your commit should include a high level description of your work in HISTORY.textile
43
72
  * Check for unnecessary whitespace with `git diff --check` before committing.
44
73
  * Make sure your commit messages are [well formed](http://tbaggery.com/2008/04/19/a-note-about-git-commit-messages.html).
45
74
  * If you created an issue, you can close it by including "Closes #issue" in your commit message. See [Github's blog post for more details](https://github.com/blog/1386-closing-issues-via-commit-messages)
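As an illustration of the "Documenting Code" guidelines added in this hunk, a YARD-documented method might look like the sketch below. The module and method names are hypothetical and are not part of solrizer; only the `@param`, `@return`, and `@example` tags are standard YARD.

```ruby
# Hypothetical helper, shown only to illustrate YARD tags.
module TitleNormalizer
  # Collapses runs of whitespace so that titles sort and display
  # consistently, whatever spacing the source metadata used.
  #
  # @param title [String, nil] the raw title
  # @return [String] the title with whitespace collapsed; empty if nil
  # @example
  #   TitleNormalizer.normalize("  A \n Title ") # => "A Title"
  def self.normalize(title)
    title.to_s.gsub(/\s+/, ' ').strip
  end
end
```

The comment leads with why the method exists, which is the question the guidelines ask documentation to answer, and leaves the mechanics to the tags.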
@@ -60,7 +89,9 @@ You should also add yourself to the `CONTRIBUTORS.md` file in the root of the pr
60
89
 
61
90
  class PostsController
62
91
  def index
63
- respond_with Post.limit(10)
92
+ respond_to do |wants|
93
+ wants.html { render 'index' }
94
+ end
64
95
  end
65
96
  end
66
97
 
@@ -72,38 +103,53 @@ You should also add yourself to the `CONTRIBUTORS.md` file in the root of the pr
72
103
  long to fit in 72 characters
73
104
  ```
74
105
 
75
- * Make sure you have added the necessary tests for your changes.
76
- * Run _all_ the tests to assure nothing else was accidentally broken.
77
- * When you are ready to submit a pull request
78
-
79
106
  ### Submitting Changes
80
107
 
81
- [Detailed Walkthrough of One Pull Request per Commit](http://ndlib.github.io/practices/one-commit-per-pull-request/)
82
-
83
108
  * Read the article ["Using Pull Requests"](https://help.github.com/articles/using-pull-requests) on GitHub.
84
109
  * Make sure your branch is up to date with its parent branch (i.e. master)
85
110
  * `git checkout master`
86
111
  * `git pull --rebase`
87
112
  * `git checkout <your-branch>`
88
113
  * `git rebase master`
89
- * It is likely a good idea to run your tests again.
90
- * Squash the commits for your branch into one commit
91
- * `git rebase --interactive HEAD~<number-of-commits>` ([See Github help](https://help.github.com/articles/interactive-rebase))
92
- * To determine the number of commits on your branch: `git log master..<your-branch> --oneline | wc -l`
114
+ * It is a good idea to run your tests again.
115
+ * If you've made more than one commit, take a moment to consider whether squashing commits together would help improve their logical grouping.
116
+ * [Detailed Walkthrough of One Pull Request per Commit](http://ndlib.github.io/practices/one-commit-per-pull-request/)
117
+ * `git rebase --interactive master` ([See Github help](https://help.github.com/articles/interactive-rebase))
93
118
  * Squashing your branch's changes into one commit is "good form" and helps the person merging your request to see everything that is going on.
94
119
  * Push your changes to a topic branch in your fork of the repository.
95
120
  * Submit a pull request from your fork to the project.
96
121
 
97
- ### Merging Changes
122
+ ### Reviewing and Merging Changes
123
+
124
+ We adopted [Github's Pull Request Review](https://help.github.com/articles/about-pull-request-reviews/) for our repositories.
125
+ Common checks that may occur in our repositories:
126
+
127
+ 1. Travis CI - where our automated tests are running
128
+ 2. Hound CI - where we check for style violations
129
+ 3. Approval Required - Github enforces at least one person approve a pull request. Also, all reviewers that have chimed in must approve.
130
+ 4. CodeClimate - is our code remaining healthy (at least according to static code analysis)
131
+
132
+ If one or more of the required checks failed (or are incomplete), the code should not be merged (and the UI will not allow it). If all of the checks have passed, then anyone on the project (including the pull request submitter) may merge the code.
133
+
134
+ *Example: Carolyn submits a pull request, Justin reviews the pull request and approves. However, Justin is still waiting on other checks (Travis CI is usually the culprit), so he does not merge the pull request. Eventually, all of the checks pass. At this point, Carolyn or anyone else may merge the pull request.*
135
+
136
+ #### Things to Consider When Reviewing
137
+
138
+ First, the person contributing the code is putting themselves out there. Be mindful of what you say in a review.
139
+
140
+ * Ask clarifying questions
141
+ * State your understanding and expectations
142
+ * Provide example code or alternate solutions, and explain why
143
+
144
+ This is your chance for a mentoring moment of another developer. Take time to give an honest and thorough review of what has changed. Things to consider:
98
145
 
99
- * It is considered "poor from" to merge your own request.
100
- * Please take the time to review the changes and get a sense of what is being changed. Things to consider:
101
146
  * Does the commit message explain what is going on?
102
- * Does the code changes have tests? _Not all changes need new tests, some changes are refactorings_
147
+ * Does the code changes have tests? _Not all changes need new tests, some changes are refactors_
148
+ * Do new or changed methods, modules, and classes have documentation?
103
149
  * Does the commit contain more than it should? Are two separate concerns being addressed in one commit?
104
- * Did the Travis tests complete successfully?
105
- * If you are uncertain, bring other contributors into the conversation by creating a comment that includes their @username.
106
- * If you like the pull request, but want others to chime in, create a +1 comment and tag a user.
150
+ * Does the description of the new/changed specs match your understanding of what the spec is doing?
151
+
152
+ If you are uncertain, bring other contributors into the conversation by assigning them as a reviewer.
107
153
 
108
154
  # Additional Resources
109
155
 
data/README.md CHANGED
@@ -3,13 +3,7 @@
3
3
  [![Build Status](https://travis-ci.org/projecthydra/solrizer.png?branch=master)](https://travis-ci.org/projecthydra/solrizer)
4
4
  [![Gem Version](https://badge.fury.io/rb/solrizer.png)](http://badge.fury.io/rb/solrizer)
5
5
 
6
- A lightweight, configurable tool for indexing metadata into solr. Can be triggered from within your application, from
7
- the command line, or as a JMS listener.
8
-
9
- Solrizer provides the baseline and structures for the process of solrizing. In order to actually read objects from a
10
- data source and write solr documents into a solr instance, you need to use an implementation specific gem, such as
11
- "solrizer-fedora":https://github.com/projecthydra/solrizer-fedora, which provides the mechanics for reading from a
12
- fedora repository and writing to a solr instance.
6
+ A lightweight tool for creating dynamic solr schema suffixes.
13
7
 
14
8
 
15
9
  ## Installation
@@ -157,74 +151,6 @@ But now you may also pass an Descriptor instance if that works for you:
157
151
  indexer = Solrizer::Descriptor.new(:integer, :indexed, :stored)
158
152
  t.main_title(:index_as=>[indexer],:path=>"title", :label=>"title") { ... }
159
153
 
160
- ### Extractor and Extractor Mixins
161
-
162
- Solrizer::Extractor provides utilities for extracting solr fields from objects or inserting solr fields into documents:
163
-
164
- > extractor = Solrizer::Extractor.new
165
- > solr_doc = Hash.new
166
- > extractor.format_node_value(["foo ","\n bar"])
167
- => "foo bar"
168
- > extractor.insert_solr_field_value(solr_doc, "foo","bar")
169
- => {"foo"=>"bar"}
170
- > extractor.insert_solr_field_value(solr_doc,"foo","baz")
171
- => {"foo"=>["bar", "baz"]}
172
- > extractor.insert_solr_field_value(solr_doc, "boo","hoo")
173
- => {"foo"=>["bar", "baz"], "boo"=>"hoo"}
174
-
175
- #### Solrizer provides some default mixins:
176
-
177
- `Solrizer::HTML::Extractor` provides html_to_solr method and `Solrizer::XML::Extractor` provides xml_to_solr method:
178
-
179
- > Solrizer::XML::Extractor
180
- > extractor = Solrizer::Extractor.new
181
- > xml = "<fields><foo>bar</foo><bar>baz</bar></fields>"
182
- > extractor.xml_to_solr(xml)
183
- => {:foo_tesim=>"bar", :bar_tesim=>"baz"}
184
-
185
- #### Solrizer::XML::TerminologyBasedSolrizer
186
-
187
- Another powerful mixin for use with classes that include the `OM::XML::Document` module is
188
- `Solrizer::XML::TerminologyBasedSolrizer`. The methods provided by this module map provides a robust way of mapping
189
- terms and solr fields via om terminologies. A notable example can be found in `ActiveFedora::NokogiriDatatstream`.
190
-
191
- ## JMS Listener for Hydra Rails Applications
192
-
193
- ### The executables: solrizer and solrizerd
194
-
195
- The solrizer gem provides two executables:
196
-
197
- * solrizer is a stomp consumer which listens for fedora.apim.updates and solrizes (or de-solrizes) objects accordingly.
198
- * solrizerd is a wrapper script that spawns a daemonized version of solrizer and handles start|stop|restart|status requests.
199
-
200
- ### Usage
201
-
202
- The usage for solrizerd is as follows:
203
-
204
- solrizerd command --hydra_home PATH [options]
205
-
206
- The commands are as follows:
207
- * start start an instance of the application
208
- * stop stop all instances of the application
209
- * restart stop all instances and restart them afterwards
210
- * status show status (PID) of application instances
211
-
212
- Required parameters:
213
-
214
- --hydra_home: this is the path to your hydra rails applications' root directory. Solrizerd needs this in order to load all your models and corresponding terminoligies.
215
-
216
- The options:
217
- * -p, --port Stomp port 61613
218
- * -o, --host Host to connect to localhost
219
- * -u, --user User name for stomp listener
220
- * -w, --password Password for stomp listener
221
- * -d, --destination Topic to listen to (default: /topic/fedora.apim.update)
222
- * -h, --help Display this screen
223
-
224
- Note:
225
-
226
- Since the solrizer script must fire up your hydra rails application, it must have all the gems installed that your hydra instance needs.
227
-
228
154
  ## Note on Patches/Pull Requests
229
155
 
230
156
  * Fork the project.
@@ -5,14 +5,11 @@ module Solrizer
5
5
  extend ActiveSupport::Autoload
6
6
 
7
7
  autoload :Common
8
- autoload :Extractor
9
8
  autoload :Descriptor
10
9
  autoload :FieldMapper
11
10
  autoload :DefaultDescriptors
12
11
  autoload :Suffix
13
- autoload :HTML, 'solrizer/html'
14
12
  autoload :VERSION, 'solrizer/version'
15
- autoload :XML, 'solrizer/xml'
16
13
 
17
14
  mattr_accessor :logger, instance_writer: false
18
15
 
@@ -163,7 +163,7 @@ module Solrizer
163
163
  def extract_type(value)
164
164
  case value
165
165
  when NilClass
166
- when 0.class # Fixnum for ruby < 2.4, and Integer afterwards
166
+ when Integer # In ruby < 2.4, Fixnum extends Integer
167
167
  :integer
168
168
  when DateTime
169
169
  :time
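The change in this hunk relies on `Integer` being the superclass of `Fixnum` and `Bignum` on Ruby versions before 2.4, and the single integer class from 2.4 on, so `when Integer` matches on every version. Below is a minimal sketch of the branches visible in the hunk. The method name `sketch_extract_type` is hypothetical, and the remaining branches of the real method are omitted.

```ruby
require 'date'

# Hypothetical stand-in for the visible part of extract_type.
def sketch_extract_type(value)
  case value
  when NilClass
    nil        # an empty `when` body in the original also evaluates to nil
  when Integer # matches Fixnum and Bignum on Ruby < 2.4 as well
    :integer
  when DateTime
    :time
  end
end

small = sketch_extract_type(42)
large = sketch_extract_type(2**70)
time  = sketch_extract_type(DateTime.new(2017, 1, 26))
```

On Ruby older than 2.4, `0.class` evaluated to `Fixnum`, so the previous `when 0.class` branch would not have matched a value as large as `2**70`.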
@@ -1,3 +1,3 @@
1
1
  module Solrizer
2
- VERSION = "3.4.1"
2
+ VERSION = "4.0.0"
3
3
  end
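Because 4.0.0 removes `Solrizer::Extractor`, the HTML and XML mixins, and both executables, an application that still depends on them may want to stay on the 3.x series until it has migrated. The sketch below uses RubyGems' own requirement classes to show that a pessimistic constraint of `~> 3.4` accepts 3.4.1 and rejects 4.0.0. The choice of constraint is an assumption for illustration, not a recommendation from the project.

```ruby
require 'rubygems'

# "~> 3.4" means ">= 3.4" and "< 4.0".
pinned = Gem::Requirement.new('~> 3.4')

accepts_old = pinned.satisfied_by?(Gem::Version.new('3.4.1'))
accepts_new = pinned.satisfied_by?(Gem::Version.new('4.0.0'))

puts "3.4.1 allowed: #{accepts_old}"
puts "4.0.0 allowed: #{accepts_new}"
```

In a Gemfile, the same constraint would be written as `gem 'solrizer', '~> 3.4'`.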
@@ -14,8 +14,6 @@ Gem::Specification.new do |s|
14
14
 
15
15
  s.add_dependency "nokogiri"
16
16
  s.add_dependency "xml-simple"
17
- s.add_dependency "stomp"
18
- s.add_dependency "daemons"
19
17
  s.add_dependency "activesupport"
20
18
  s.add_development_dependency 'rspec', '~> 3.5'
21
19
  s.add_development_dependency 'rake'
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: solrizer
3
3
  version: !ruby/object:Gem::Version
4
- version: 3.4.1
4
+ version: 4.0.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - Matt Zumwalt
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2017-01-05 00:00:00.000000000 Z
11
+ date: 2017-01-26 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: nokogiri
@@ -38,34 +38,6 @@ dependencies:
38
38
  - - ">="
39
39
  - !ruby/object:Gem::Version
40
40
  version: '0'
41
- - !ruby/object:Gem::Dependency
42
- name: stomp
43
- requirement: !ruby/object:Gem::Requirement
44
- requirements:
45
- - - ">="
46
- - !ruby/object:Gem::Version
47
- version: '0'
48
- type: :runtime
49
- prerelease: false
50
- version_requirements: !ruby/object:Gem::Requirement
51
- requirements:
52
- - - ">="
53
- - !ruby/object:Gem::Version
54
- version: '0'
55
- - !ruby/object:Gem::Dependency
56
- name: daemons
57
- requirement: !ruby/object:Gem::Requirement
58
- requirements:
59
- - - ">="
60
- - !ruby/object:Gem::Version
61
- version: '0'
62
- type: :runtime
63
- prerelease: false
64
- version_requirements: !ruby/object:Gem::Requirement
65
- requirements:
66
- - - ">="
67
- - !ruby/object:Gem::Version
68
- version: '0'
69
41
  - !ruby/object:Gem::Dependency
70
42
  name: activesupport
71
43
  requirement: !ruby/object:Gem::Requirement
@@ -139,9 +111,7 @@ dependencies:
139
111
  description: Use solrizer to populate solr indexes. You can run solrizer from within
140
112
  your app, using the provided rake tasks, or as a JMS listener
141
113
  email: hydra-tech@googlegroups.com
142
- executables:
143
- - solrizer
144
- - solrizerd
114
+ executables: []
145
115
  extensions: []
146
116
  extra_rdoc_files:
147
117
  - LICENSE
@@ -155,31 +125,22 @@ files:
155
125
  - LICENSE
156
126
  - README.md
157
127
  - Rakefile
158
- - bin/solrizer
159
- - bin/solrizerd
160
128
  - lib/solrizer.rb
161
129
  - lib/solrizer/common.rb
162
130
  - lib/solrizer/default_descriptors.rb
163
131
  - lib/solrizer/descriptor.rb
164
- - lib/solrizer/extractor.rb
165
132
  - lib/solrizer/field_mapper.rb
166
- - lib/solrizer/html.rb
167
- - lib/solrizer/html/extractor.rb
168
133
  - lib/solrizer/suffix.rb
169
134
  - lib/solrizer/version.rb
170
- - lib/solrizer/xml.rb
171
- - lib/solrizer/xml/extractor.rb
172
135
  - lib/tasks/solrizer.rake
173
136
  - solrizer.gemspec
174
137
  - spec/.rspec
175
138
  - spec/fixtures/druid-bv448hq0314-descMetadata.xml
176
139
  - spec/spec_helper.rb
177
140
  - spec/units/common_spec.rb
178
- - spec/units/extractor_spec.rb
179
141
  - spec/units/field_mapper_spec.rb
180
142
  - spec/units/solrizer_spec.rb
181
143
  - spec/units/suffix_spec.rb
182
- - spec/units/xml_extractor_spec.rb
183
144
  homepage: http://github.com/projecthydra/solrizer
184
145
  licenses: []
185
146
  metadata: {}
@@ -208,8 +169,6 @@ test_files:
208
169
  - spec/fixtures/druid-bv448hq0314-descMetadata.xml
209
170
  - spec/spec_helper.rb
210
171
  - spec/units/common_spec.rb
211
- - spec/units/extractor_spec.rb
212
172
  - spec/units/field_mapper_spec.rb
213
173
  - spec/units/solrizer_spec.rb
214
174
  - spec/units/suffix_spec.rb
215
- - spec/units/xml_extractor_spec.rb
@@ -1,107 +0,0 @@
1
- #!/usr/bin/env ruby
2
-
3
- require 'rubygems'
4
- require 'optparse'
5
- require 'stomp'
6
-
7
- options = {}
8
-
9
- optparse = OptionParser.new do|opts|
10
- opts.banner = "Usage: solrizer [options]"
11
-
12
- options[:hydra_home] = nil
13
- opts.on( '--hydra_home PATH', 'Load the Hydra instance at this path' ) do |path|
14
- if File.exist?(File.join(path,"config","environment.rb"))
15
- options[:hydra_home] = path
16
- else
17
- puts "#{path} does not appear to be a valid rails home"
18
- exit
19
- end
20
- end
21
-
22
- options[:port] = 61613
23
- opts.on('-p','--port NUM', 'Stomp port') do |port|
24
- options[:port] = port
25
- end
26
-
27
- options[:host] = 'localhost'
28
- opts.on('-o','--host HOSTNAME', 'Host to connect to') do |host|
29
- options[:host] = host
30
- end
31
-
32
- options[:user] = 'fedoraStomper'
33
- opts.on('-u', '--user USERNAME', 'User name for stomp listener') do |user|
34
- options[:user] = user
35
- end
36
-
37
- options[:password] = 'fedoraStomper'
38
- opts.on('-w', '--password PASSWORD', 'Password for stomp listener') do |password|
39
- options[:password] = password
40
- end
41
-
42
- options[:destination] = '/topic/fedora.apim.update'
43
- opts.on('-d','--destination TOPIC', 'Topic to listen to') do |destination|
44
- options[:destination] = destination
45
- end
46
-
47
- opts.on('-h', '--help', 'Display this screen') do
48
- puts opts
49
- exit
50
- end
51
- end
52
-
53
- optparse.parse!
54
-
55
- begin; require 'rubygems'; rescue; end
56
-
57
- if options[:hydra_home]
58
- puts "Loading app..."
59
- Dir.chdir(options[:hydra_home])
60
- require File.join(options[:hydra_home],"config","environment.rb")
61
-
62
- puts "app loaded"
63
- else
64
- $stderr.puts "The --hydra_home PATH option is mandatory. Please provide the path to the root of a valid Hydra instance."
65
- exit 1
66
- end
67
-
68
- puts "loading listener"
69
-
70
- begin
71
- @port = options[:port]
72
- @host = options[:host]
73
- @user = options[:user]
74
- @password = options[:password]
75
- @reliable = true
76
- @clientid = "fedora_stomper"
77
- @destination = options[:destination]
78
-
79
-
80
- $stderr.print "Connecting to stomp://#{@host}:#{@port} as #{@user}\n"
81
- @conn = Stomp::Connection.open(@user, @password, @host, @port, @reliable, 5, {"client-id" => @clientid} )
82
- $stderr.print "Getting output from #{@destination}\n"
83
-
84
- @conn.subscribe(@destination, {"activemq.subscriptionName" => @clientid, :ack =>"client" })
85
- while true
86
- @msg = @conn.receive
87
- pid = @msg.headers["pid"]
88
- method = @msg.headers["methodName"]
89
-
90
- puts @msg.headers.inspect
91
- puts "\nPID: #{@msg.headers["pid"]}\n"
92
- if ["addDatastream", "addRelationship","ingest","modifyDatastreamByValue","modifyDatastreamByReference","modifyObject","purgeDatastream","purgeRelationship"].include? method
93
- ActiveFedora::Base.find(@msg.headers["pid"], cast: true).update_index
94
- elsif method == "purgeObject"
95
- ActiveFedora::SolrService.instance.conn.delete_by_id(pid)
96
- else
97
- $stderr.puts "Unknown Method: #{method}"
98
- end
99
- puts "updated solr index for #{@msg.headers["pid"]}\n"
100
- @conn.ack @msg.headers["message-id"]
101
- end
102
- @conn.join
103
-
104
- rescue Exception => e
105
- p e
106
- end
107
-
@@ -1,68 +0,0 @@
1
- #!/usr/bin/env ruby
2
-
3
- require 'rubygems'
4
- require 'daemons'
5
- require 'stomp'
6
-
7
- banner=<<-EOC
8
- Usage: solrizerd command --hydra_home PATH [options]
9
- PATH must point to a valid hydra application
10
- Commands:
11
- start start an instance of the application
12
- stop stop all instances of the application
13
- restart stop all instances and restart them afterwards
14
- status show status (PID) of application instances
15
- Options:
16
- --hydra_home PATH Load the hydra instance at this path
17
- -p, --port NUM Stomp port (default 61613)
18
- -o, --host HOSTNAME Host to connect to
19
- -u, --user USERNAME User name for stomp listener
20
- -w, --password PASSWORD Password for stomp listener
21
- -d, --destination TOPIC Topic to listen to (default: /topic/fedora.apim.update)
22
- -h, --help Display this screen
23
- EOC
24
-
25
-
26
- # check for a valid command
27
- unless ['start','stop','restart','status'].include? ARGV[0]
28
- puts banner
29
- exit 7
30
- end
31
-
32
- if ARGV.include?('-h') || ARGV.include?('--help')
33
- puts banner
34
- exit 0
35
- end
36
-
37
- # Make sure --hydra_home was set for the start and restart commands
38
- if ARGV[0] == 'start' || ARGV[0] == 'restart'
39
- unless ARGV[1] == '--hydra_home'
40
- puts "ERROR: You must --hydra_home to specify the path to a valid hydra application"
41
- exit 8
42
- end
43
-
44
- # make sure valid path was set for hydra_home
45
- unless ARGV[2] && File.exist?(File.join(ARGV[2],"config","environment.rb"))
46
- puts "ERROR: the path entered does not appear to be a valid hydra instance"
47
- exit 9
48
- end
49
- end
50
-
51
-
52
- options = {
53
- :multiple=>false,
54
- :dir_mode=>:normal,
55
- :dir=>'/tmp',
56
- :backtrace=>true
57
- }
58
- argv_array = []
59
- argv_array << ARGV[0]
60
- argv_array << '--'
61
- ARGV[1..-1].each {|ele| argv_array << ele }
62
- options[:ARGV] = argv_array
63
-
64
- version = '>=0'
65
- app = Gem.bin_path('solrizer','solrizer',version)
66
-
67
- Daemons.run(app,options)
68
-
@@ -1,68 +0,0 @@
1
- module Solrizer
2
-
3
- # Provides utilities for extracting solr fields from a variety of objects and/or creating solr documents from a given object
4
- # Note: These utilities are optional. You can implement .to_solr directly on your classes if you want to bypass using Extractors.
5
- #
6
- # Each of the Solrizer implementations (ie. solrizer-fedora) provides its own Extractor module that extends the behaviors of Solrizer::Extractor
7
- # with methods specific to that implementation (ie. extract_tag, extract_rels_ext, xml_to_solr, html_to_solr).
8
- # By convention, the solrizer implementations will mix their own Extractors' behaviors into this class when you load them into an application.
9
- #
10
- class Extractor
11
-
12
- class << self
13
- # Insert +field_value+ for +field_name+ into +solr_doc+
14
- # Handles inserting new values into a Hash while ensuring that you don't destroy or overwrite any existing values in the hash.
15
- # Ensures that field values are always appended to arrays within the values hash.
16
- # Also ensures that values are run through format_node_value
17
- # @param [Hash] solr_doc
18
- # @param [String] field_name
19
- # @param [String] field_value
20
- def insert_solr_field_value(solr_doc, field_name, field_value)
21
- formatted_value = format_node_value(field_value)
22
- if solr_doc[field_name]
23
- solr_doc[field_name] = Array(solr_doc[field_name]) << formatted_value
24
- else
25
- solr_doc[field_name] = formatted_value
26
- end
27
- return solr_doc
28
- end
29
-
30
- # Strips the majority of whitespace from the values array and then joins them with a single blank delimitter
31
- # Returns an empty string if values argument is nil
32
- #
33
- # @param [Array] values Array of strings representing the values to be formatted
34
- # @return [String]
35
- def format_node_value values
36
- if values.nil?
37
- ""
38
- else
39
- Array(values).map{|val| val.gsub(/\s+/,' ').strip}.join(" ")
40
- end
41
- end
42
- end
43
-
44
- # Instance Methods
45
-
46
- # Alias for Solrizer::Extractor#insert_solr_field_value
47
- def insert_solr_field_value(solr_doc, field_name, field_value)
48
- Solrizer::Extractor.insert_solr_field_value(solr_doc, field_name, field_value)
49
- end
50
-
51
- # Alias for Solrizer::Extractor#format_node_value
52
- def format_node_value values
53
- Solrizer::Extractor.format_node_value(values)
54
- end
55
-
56
- # Deprecated.
57
- # merges input_hash into solr_hash
58
- # @param [Hash] input_hash the input hash of values
59
- # @param [Hash] solr_hash the solr values hash to add the values into
60
- # @return [Hash] the populated Solr values hash
61
- #
62
- def extract_hash( input_hash, solr_hash=Hash.new )
63
- warn "[DEPRECATION] `extract_hash` is deprecated. Just pass values directly into your solr values hash"
64
- return solr_hash.merge!(input_hash)
65
- end
66
-
67
- end
68
- end
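Applications that called the two class-level helpers removed above could carry equivalent logic locally after upgrading. The following is a minimal sketch that copies the behaviour of the deleted methods into a module of the application's own. The name `LocalSolrFields` is hypothetical and nothing like it ships with solrizer 4.0.0.

```ruby
# Hypothetical local replacement for the removed Solrizer::Extractor helpers.
module LocalSolrFields
  module_function

  # Collapses whitespace in each value and joins the values with one space.
  # Returns an empty string for nil, as the removed method did.
  def format_node_value(values)
    return "" if values.nil?
    Array(values).map { |val| val.gsub(/\s+/, ' ').strip }.join(' ')
  end

  # Adds a formatted value to solr_doc without overwriting existing values:
  # the first value is stored as a string, later ones are appended to an array.
  def insert_solr_field_value(solr_doc, field_name, field_value)
    formatted = format_node_value(field_value)
    if solr_doc[field_name]
      solr_doc[field_name] = Array(solr_doc[field_name]) << formatted
    else
      solr_doc[field_name] = formatted
    end
    solr_doc
  end
end

doc = {}
LocalSolrFields.insert_solr_field_value(doc, "foo", "bar")
LocalSolrFields.insert_solr_field_value(doc, "foo", "baz")
```

After these two calls, `doc` holds `{"foo"=>["bar", "baz"]}`, matching the example in the README section that this release removes.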
@@ -1,7 +0,0 @@
1
- require "solrizer"
2
- module Solrizer::HTML
3
- end
4
-
5
- Dir[File.join(File.dirname(__FILE__),"html","*.rb")].each {|file| require file }
6
-
7
- Solrizer::Extractor.send(:include, Solrizer::HTML::Extractor)
@@ -1,36 +0,0 @@
1
- require "nokogiri"
2
- require 'yaml'
3
-
4
- module Solrizer::HTML::Extractor
5
-
6
- #
7
- # This method strips html tags out and returns content to be indexed in solr
8
- #
9
- # @param [Datastream] ds object that responds to .content with HTML content
10
- # @param [Hash] solr_doc hash of values to be inserted into solr as a solr document
11
- def html_to_solr( ds, solr_doc=Hash.new )
12
-
13
- text = CGI.unescapeHTML(ds.content)
14
- doc = Nokogiri::HTML(text)
15
-
16
- # html to story_display
17
- stories = doc.xpath('//story')
18
-
19
- stories.each do |story|
20
- solr_doc.merge!({:story_display => story.children.to_xml})
21
- end
22
-
23
- #strip out text and put in story_t
24
- text_nodes = doc.xpath("//text()")
25
- text = String.new
26
-
27
- text_nodes.each do |text_node|
28
- text << text_node.content
29
- end
30
-
31
- solr_doc.merge!({:story_t => text})
32
-
33
- return solr_doc
34
- end
35
-
36
- end
@@ -1,5 +0,0 @@
1
- module Solrizer::XML
2
- end
3
- Dir[File.join(File.dirname(__FILE__),"xml","*.rb")].each {|file| require file }
4
-
5
- Solrizer::Extractor.send(:include, Solrizer::XML::Extractor)
@@ -1,32 +0,0 @@
1
- require "xmlsimple"
2
-
3
- module Solrizer::XML::Extractor
4
-
5
- #
6
- # This method extracts solr fields from simple xml
7
- # If you want to do anything more nuanced with the xml, use OM instead.
8
- #
9
- # @param [xml] text xml content to index
10
- # @param [Hash] solr_doc
11
- def xml_to_solr( text, solr_doc=Hash.new, mapper = Solrizer.default_field_mapper )
12
- doc = XmlSimple.xml_in( text )
13
-
14
- doc.each_pair do |name, value|
15
- if value.kind_of?(Array)
16
- if value.first.kind_of?(Hash)
17
- # This deals with the way xml-simple handles nodes with attributes
18
- solr_doc.merge!({mapper.solr_name(name, :stored_searchable, :type=>:text).to_sym => "#{value.first["content"]}"})
19
- elsif value.length > 1
20
- solr_doc.merge!({mapper.solr_name(name, :stored_searchable, :type=>:text).to_sym => value})
21
- else
22
- solr_doc.merge!({mapper.solr_name(name, :stored_searchable, :type=>:text).to_sym => "#{value.first}"})
23
- end
24
- else
25
- solr_doc.merge!({mapper.solr_name(name, :stored_searchable, :type=>:text).to_sym => "#{value}"})
26
- end
27
- end
28
-
29
- return solr_doc
30
- end
31
-
32
- end
@@ -1,44 +0,0 @@
1
- require 'spec_helper'
2
-
3
- describe Solrizer::Extractor do
4
-
5
- before(:all) do
6
- @extractor = Solrizer::Extractor.new
7
- end
8
-
9
- describe ".format_node_value" do
10
- it "should strip white space out of the array and join it with a single blank" do
11
- expect(Solrizer::Extractor.format_node_value([" test \n node \t value \t"])).to eq "test node value"
12
- expect(Solrizer::Extractor.format_node_value([" test ", " \n node ", " \t value \t"])).to eq "test node value"
13
- end
14
- it "should return an empty string if given an argument of nil" do
15
- expect(Solrizer::Extractor.format_node_value(nil)).to eq ''
16
- end
17
-
18
- it "should strip white space out of a string" do
19
- expect(Solrizer::Extractor.format_node_value("raw string\n with whitespace")).to eq "raw string with whitespace"
20
- end
21
- end
22
-
23
- describe "#insert_solr_field_value" do
24
- it "should initialize a solr doc list if it is nil" do
25
- solr_doc = {'title_tesim' => nil }
26
- Solrizer::Extractor.insert_solr_field_value(solr_doc, 'title_tesim', 'Frank')
27
- expect(solr_doc).to eq("title_tesim"=>"Frank")
28
- end
29
- it "should insert multiple" do
30
- solr_doc = {'title_tesim' => nil }
31
- Solrizer::Extractor.insert_solr_field_value(solr_doc, 'title_tesim', 'Frank')
32
- Solrizer::Extractor.insert_solr_field_value(solr_doc, 'title_tesim', 'Margret')
33
- Solrizer::Extractor.insert_solr_field_value(solr_doc, 'title_tesim', 'Joyce')
34
- expect(solr_doc).to eq("title_tesim"=>["Frank", 'Margret', 'Joyce'])
35
- end
36
- it "should not make a list if a single valued field is passed in" do
37
- solr_doc = {}
38
- Solrizer::Extractor.insert_solr_field_value(solr_doc, 'title_dtsi', '2013-03-22T12:33:00Z')
39
- expect(solr_doc).to eq("title_dtsi"=>"2013-03-22T12:33:00Z")
40
- end
41
-
42
- end
43
-
44
- end
@@ -1,26 +0,0 @@
1
- require 'spec_helper'
2
-
3
- describe Solrizer::XML::Extractor do
4
-
5
- before do
6
- @extractor = Solrizer::Extractor.new
7
- end
8
-
9
- let(:result) { @extractor.xml_to_solr(fixture("druid-bv448hq0314-descMetadata.xml"))}
10
-
11
- describe ".xml_to_solr" do
12
- it "should turn simple xml into a solr document" do
13
- expect(result[:type_tesim]).to eq "text"
14
- expect(result[:medium_tesim]).to eq "Paper Document"
15
- expect(result[:rights_tesim]).to eq "Presumed under copyright. Do not publish."
16
- expect(result[:date_tesim]).to eq "1985-12-30"
17
- expect(result[:format_tesim]).to be_kind_of(Array)
18
- expect(result[:format_tesim]).to include("application/tiff")
19
- expect(result[:format_tesim]).to include("application/pdf")
20
- expect(result[:format_tesim]).to include("application/jp2000")
21
- expect(result[:title_tesim]).to eq "This is a Sample Title"
22
- expect(result[:publisher_tesim]).to eq "Sample Unversity"
23
- end
24
- end
25
-
26
- end