harvestdor-indexer 0.0.3

data/.gitignore ADDED
@@ -0,0 +1,25 @@
1
+ *.gem
2
+ *.rbc
3
+ .bundle
4
+ .config
5
+ .yardoc
6
+ .travis
7
+ .rvmrc
8
+ Gemfile.lock
9
+ InstalledFiles
10
+ _yardoc
11
+ coverage
12
+ doc/
13
+ lib/bundler/man
14
+ pkg
15
+ rdoc
16
+ spec/reports
17
+ spec/test_logs
18
+ test/tmp
19
+ test/version_tmp
20
+ tmp
21
+ logs
22
+ .DS_Store
23
+ *.tmproj
24
+ tmtags
25
+ .idea/*
data/.yardopts ADDED
@@ -0,0 +1,3 @@
1
+ --title 'Harvestdor-Indexer Gem Documentation'
2
+ lib/**/*.rb -
3
+ README.rdoc LICENSE.txt
data/Gemfile ADDED
@@ -0,0 +1,5 @@
1
+ source 'https://rubygems.org'
2
+ source "http://sul-gems.stanford.edu"
3
+
4
+ # See harvestdor-indexer.gemspec for this gem's dependencies
5
+ gemspec
data/LICENSE.txt ADDED
@@ -0,0 +1,5 @@
1
+ Copyright (c) 20XX-2012. The Board of Trustees of the Leland Stanford Junior University. All rights reserved.
2
+
3
+ Redistribution and use of this distribution in source and binary forms, with or without modification, are permitted provided that: The above copyright notice and this permission notice appear in all copies and supporting documentation; The name, identifiers, and trademarks of The Board of Trustees of the Leland Stanford Junior University are not used in advertising or publicity without the express prior written permission of The Board of Trustees of the Leland Stanford Junior University; Recipients acknowledge that this distribution is made available as a research courtesy, "as is", potentially with defects, without any obligation on the part of The Board of Trustees of the Leland Stanford Junior University to provide support, services, or repair;
4
+
5
+ THE BOARD OF TRUSTEES OF THE LELAND STANFORD JUNIOR UNIVERSITY DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, WITH REGARD TO THIS SOFTWARE, INCLUDING WITHOUT LIMITATION ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE, AND IN NO EVENT SHALL THE BOARD OF TRUSTEES OF THE LELAND STANFORD JUNIOR UNIVERSITY BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, TORT (INCLUDING NEGLIGENCE) OR STRICT LIABILITY, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
data/README.rdoc ADDED
@@ -0,0 +1,113 @@
1
+ = Harvestdor::Indexer
2
+
3
+ A gem to harvest (meta)data from DOR, plus skeleton code to index it and write it to Solr.
4
+
5
+ == Installation
6
+
7
+ Add this line to your application's Gemfile:
8
+
9
+ gem 'harvestdor-indexer'
10
+
11
+ And then execute:
12
+
13
+ $ bundle
14
+
15
+ Or install it yourself as:
16
+
17
+ $ gem install harvestdor-indexer
18
+
19
+ == Usage
20
+
21
+ You must override the index method and provide configuration options. It is also recommended that you write a script to run it; an example is below.
22
+
23
+ === Configuration / Set up
24
+
25
+ Create a yml config file for the collection you are indexing into Solr.
26
+
27
+ See spec/config/ap.yml for an example.
28
+
29
+ You will want to copy that file and change the following settings:
30
+ 1. log_dir, log_name
31
+ 2. default_set (in OAI harvesting params section)
32
+ 2a. other OAI harvesting params
33
+ 3. blacklist or whitelist if you are using them
34
+
35
+ You can also pass in non-default configuration settings as a hash:
36
+
37
+ indexer = Harvestdor::Indexer.new('config/your_coll.yml', {:oai_repository_url => 'http://my_oai.org', :default_from_date => '2012-12-01'})
38
+
39
+ === Override the Harvestdor::Indexer#index method
40
+
41
+ In your code, override this method of the Harvestdor::Indexer class:
42
+
43
+ # create Solr doc for the druid and add it to Solr, unless it is on the blacklist.
44
+ # NOTE: don't forget to send commit to Solr, either once at end (already in harvest_and_index), or for each add, or ...
45
+ def index druid
46
+ if blacklist.include?(druid)
47
+ logger.info("Druid #{druid} is on the blacklist and will have no Solr doc created")
48
+ else
49
+ logger.error("You must override the index method to transform druids into Solr docs and add them to Solr")
50
+
51
+ doc_hash = {}
52
+ doc_hash[:id] = druid
53
+ # doc_hash[:title_tsim] = smods_rec(druid).short_title
54
+
55
+ # you might add things from Indexer level class here
56
+ # (e.g. things that are the same across all documents in the harvest)
57
+
58
+ solr_client.add(doc_hash)
59
+
60
+ # logger.info("Just created Solr doc for #{druid}")
61
+ # TODO: provide call to code to update DOR object's workflow datastream??
62
+ end
63
+ end
64
+
65
+ === Run it
66
+
67
+ First, run bundle install.
68
+
69
+ I suggest you write a script to run the code. Your script might look like this:
70
+
71
+ #!/usr/bin/env ruby
72
+
73
+ $LOAD_PATH.unshift(File.join(File.dirname(__FILE__), '..'))
74
+ $LOAD_PATH.unshift(File.join(File.dirname(__FILE__), '..', 'lib'))
75
+
76
+ require 'rubygems'
77
+ begin
78
+ require 'your_indexer'
79
+ rescue LoadError
80
+ require 'bundler/setup'
81
+ require 'your_indexer'
82
+ end
83
+
84
+ config_yml_path = ARGV.pop
85
+ if config_yml_path.nil?
86
+ puts "** You must provide the full path to a config yml file **"
87
+ exit 1
88
+ end
89
+
90
+ indexer = Harvestdor::Indexer.new(config_yml_path)
91
+ indexer.harvest_and_index
92
+
93
+ Then you run the script like so:
94
+
95
+ ./bin/indexer config/(your coll).yml
96
+
97
+ I suggest you run your code on harvestdor-dev, as it is already set up to harvest from the DOR OAI provider.
98
+
99
+
100
+ == Contributing
101
+
102
+ # Fork it
103
+ # Create your feature branch (`git checkout -b my-new-feature`)
104
+ # Write code and tests.
105
+ # Commit your changes (`git commit -am 'Added some feature'`)
106
+ # Push to the branch (`git push origin my-new-feature`)
107
+ # Create new Pull Request
108
+
109
+ == Releases
110
+
111
+ * <b>0.0.3</b> add methods for public_xml, content_metadata, identity_metadata ...
112
+ * <b>0.0.2</b> better model code for index method (thanks, Bess!)
113
+ * <b>0.0.1</b> initial commit
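The README's index override can be exercised without DOR or Solr. The sketch below is a minimal, self-contained stand-in that assumes nothing beyond core Ruby: RecordingSolrClient, SketchIndexer and the collection_ssi field are hypothetical names, and a plain Array of druids replaces the OAI harvest. It keeps the same shape as the override: skip blacklisted druids, build a doc hash keyed on :id, call add, and send one commit at the end.

```ruby
# Hypothetical stand-in for the RSolr client: records calls instead of
# sending them to Solr.
class RecordingSolrClient
  attr_reader :added, :commits

  def initialize
    @added = []
    @commits = 0
  end

  def add(doc)
    @added << doc
  end

  def commit
    @commits += 1
  end
end

# Hypothetical indexer with the same index / harvest_and_index shape as the README.
class SketchIndexer
  # Fields that are the same for every document in this harvest (illustrative).
  COMMON_FIELDS = { :collection_ssi => 'my_collection' }.freeze

  attr_reader :solr_client

  def initialize(solr_client, blacklist = [])
    @solr_client = solr_client
    @blacklist = blacklist
  end

  # Skip blacklisted druids; otherwise build a doc hash and add it to Solr.
  def index(druid)
    return if @blacklist.include?(druid)
    doc_hash = COMMON_FIELDS.merge({ :id => druid })
    solr_client.add(doc_hash)
  end

  # Index each druid, then send a single commit.
  def harvest_and_index(druids)
    druids.each { |druid| index(druid) }
    solr_client.commit
  end
end

client = RecordingSolrClient.new
SketchIndexer.new(client, ['oo111oo1111']).harvest_and_index(%w[oo000oo0000 oo111oo1111])
p client.added.map { |doc| doc[:id] } # prints ["oo000oo0000"]
p client.commits                      # prints 1
```

Committing once after all adds, rather than after each add, is the same choice harvest_and_index makes.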
data/Rakefile ADDED
@@ -0,0 +1,56 @@
1
+ require "bundler/gem_tasks"
2
+
3
+ require 'rake'
4
+ require 'bundler'
5
+
6
+ require 'rspec/core/rake_task'
7
+ require 'yard'
8
+ require 'yard/rake/yardoc_task'
9
+
10
+ require 'dlss/rake/dlss_release'
11
+ Dlss::Release.new
12
+
13
+ begin
14
+ Bundler.setup(:default, :development)
15
+ rescue Bundler::BundlerError => e
16
+ $stderr.puts e.message
17
+ $stderr.puts "Run `bundle install` to install missing gems"
18
+ exit e.status_code
19
+ end
20
+
21
+ desc "DO NOT USE! Use dlss_release instead"
22
+ task :release
23
+
24
+ task :default => :ci
25
+
26
+ desc "run continuous integration suite (tests, coverage, docs)"
27
+ task :ci => [:rspec, :doc]
28
+
29
+ task :spec => :rspec
30
+
31
+ desc "run specs EXCEPT integration specs"
32
+ RSpec::Core::RakeTask.new(:spec_fast) do |spec|
33
+ spec.rspec_opts = ["-c", "-f progress", "--tty", "-t ~integration", "-r ./spec/spec_helper.rb"]
34
+ end
35
+
36
+ RSpec::Core::RakeTask.new(:rspec) do |spec|
37
+ spec.rspec_opts = ["-c", "-f progress", "--tty", "-r ./spec/spec_helper.rb"]
38
+ end
39
+
40
+ # Use yard to build docs
41
+ begin
42
+ project_root = File.expand_path(File.dirname(__FILE__))
43
+ doc_dest_dir = File.join(project_root, 'doc')
44
+
45
+ YARD::Rake::YardocTask.new(:doc) do |yt|
46
+ yt.files = Dir.glob(File.join(project_root, 'lib', '**', '*.rb')) +
47
+ [ File.join(project_root, 'README.rdoc') ]
48
+ yt.options = ['--output-dir', doc_dest_dir, '--readme', 'README.rdoc', '--title', 'Harvestdor-Indexer Gem Documentation']
49
+ end
50
+ rescue LoadError
51
+ desc "Generate YARD Documentation"
52
+ task :doc do
53
+ abort "Please install the YARD gem to generate documentation."
54
+ end
55
+ end
56
+
@@ -0,0 +1,43 @@
1
+ # -*- encoding: utf-8 -*-
2
+ lib = File.expand_path('../lib', __FILE__)
3
+ $LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
4
+ require 'harvestdor-indexer/version'
5
+
6
+ Gem::Specification.new do |gem|
7
+ gem.name = "harvestdor-indexer"
8
+ gem.version = Harvestdor::Indexer::VERSION
9
+ gem.authors = ["Naomi Dushay"]
10
+ gem.email = ["ndushay@stanford.edu"]
11
+ gem.description = %q{Harvest DOR object metadata via a relationship (e.g. hydra:isGovernedBy rdf:resource="info:fedora/druid:hy787xj5878") and dates, plus code framework to write Solr docs to index}
12
+ gem.summary = %q{Harvest DOR object metadata and index it to Solr}
13
+ gem.homepage = "https://consul.stanford.edu/display/chimera/Chimera+project"
14
+
15
+ gem.files = `git ls-files`.split($/)
16
+ gem.executables = gem.files.grep(%r{^bin/}).map{ |f| File.basename(f) }
17
+ gem.test_files = gem.files.grep(%r{^spec/})
18
+ gem.require_paths = ["lib"]
19
+
20
+ gem.add_dependency 'rsolr'
21
+
22
+ # sul-gems
23
+ gem.add_dependency 'harvestdor'
24
+ gem.add_dependency 'stanford-mods'
25
+
26
+ # Runtime dependencies
27
+ # gem.add_runtime_dependency 'nokogiri'
28
+
29
+ # Development dependencies
30
+ # Bundler will install these gems too if you've checked out harvestdor-indexer source from git and run 'bundle install'
31
+ # It will not add these as dependencies if you require harvestdor-indexer for other projects
32
+ gem.add_development_dependency "lyberteam-gems-devel", ">= 1.0"
33
+ gem.add_development_dependency "rake"
34
+ # docs
35
+ gem.add_development_dependency "rdoc"
36
+ gem.add_development_dependency "yard"
37
+ # tests
38
+ gem.add_development_dependency 'rspec'
39
+ gem.add_development_dependency 'simplecov'
40
+ gem.add_development_dependency 'simplecov-rcov'
41
+ # gem.add_development_dependency 'ruby-debug19'
42
+
43
+ end
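The gemspec derives its file lists from `git ls-files`, so only files tracked by git are packaged. A small sketch of how gem.files, gem.executables and gem.test_files fall out of that output, using a made-up file list (bin/indexer in particular is hypothetical; the README only suggests writing such a script):

```ruby
# Hypothetical `git ls-files` output; the gemspec splits it on $/ (the
# input record separator, "\n" by default).
git_ls_files = <<~FILES
  Gemfile
  bin/indexer
  lib/harvestdor-indexer.rb
  lib/harvestdor-indexer/version.rb
  spec/config/ap.yml
  spec/spec_helper.rb
FILES

files = git_ls_files.split($/)

# Same expressions as gem.executables and gem.test_files in the gemspec.
executables = files.grep(%r{^bin/}).map { |f| File.basename(f) }
test_files  = files.grep(%r{^spec/})

p executables # prints ["indexer"]
p test_files  # prints ["spec/config/ap.yml", "spec/spec_helper.rb"]
```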
@@ -0,0 +1,213 @@
1
+ # external gems
2
+ require 'confstruct'
3
+ require 'rsolr'
4
+
5
+ # sul-dlss gems
6
+ require 'harvestdor'
7
+ require 'stanford-mods'
8
+
9
+ # stdlib
10
+ require 'logger'
11
+
12
+ require "harvestdor-indexer/version"
13
+
14
+ module Harvestdor
15
+ # Base class to harvest from DOR via harvestdor gem and then index
16
+ class Indexer
17
+
18
+ def initialize yml_path, options = {}
19
+ @yml_path = yml_path
20
+ config.configure(YAML.load_file(yml_path)) if yml_path
21
+ config.configure options
22
+ yield(config) if block_given?
23
+ end
24
+
25
+ def config
26
+ @config ||= Confstruct::Configuration.new()
27
+ end
28
+
29
+ def logger
30
+ @logger ||= load_logger(config.log_dir, config.log_name)
31
+ end
32
+
33
+ # per this Indexer's config options
34
+ # harvest the druids via OAI
35
+ # create a Solr profiling document for each druid
36
+ # write the result to the Solr index
37
+ def harvest_and_index
38
+ if whitelist.empty?
39
+ druids.each { |druid| index druid }
40
+ else
41
+ whitelist.each { |druid| index druid }
42
+ end
43
+ solr_client.commit
44
+ logger.info("Finished processing: final Solr commit returned.")
45
+ end
46
+
47
+ # return Array of druids contained in the OAI harvest indicated by OAI params in yml configuration file
48
+ # @return [Array<String>] the harvested druids (e.g. ab123cd1234); the result is memoized
49
+ def druids
50
+ @druids ||= harvestdor_client.druids_via_oai
51
+ end
52
+
53
+ # create Solr doc for the druid and add it to Solr, unless it is on the blacklist.
54
+ # NOTE: don't forget to send commit to Solr, either once at end (already in harvest_and_index), or for each add, or ...
55
+ def index druid
56
+ if blacklist.include?(druid)
57
+ logger.info("Druid #{druid} is on the blacklist and will have no Solr doc created")
58
+ else
59
+ logger.fatal("You must override the index method to transform druids into Solr docs and add them to Solr")
60
+
61
+ begin
62
+ #logger.debug "About to index #{druid}"
63
+ doc_hash = {}
64
+ doc_hash[:id] = druid
65
+ # doc_hash[:title_tsim] = smods_rec(druid).short_title
66
+
67
+ # you might add things from Indexer level class here
68
+ # (e.g. things that are the same across all documents in the harvest)
69
+
70
+ solr_client.add(doc_hash)
71
+
72
+ # logger.debug("Just created Solr doc for #{druid}")
73
+ # TODO: provide call to code to update DOR object's workflow datastream??
74
+ rescue => e
75
+ logger.error "Failed to index #{druid}: #{e.message}"
76
+ end
77
+ end
78
+ end
79
+
80
+ # return the MODS for the druid as a Stanford::Mods::Record object
81
+ # @param [String] druid e.g. ab123cd4567
82
+ # @return [Stanford::Mods::Record] created from the MODS xml for the druid
83
+ def smods_rec druid
84
+ ng_doc = harvestdor_client.mods druid
85
+ raise "Empty MODS metadata for #{druid}: #{ng_doc.to_xml}" if ng_doc.root.xpath('//text()').empty?
86
+ mods_rec = Stanford::Mods::Record.new
87
+ mods_rec.from_nk_node(ng_doc.root)
88
+ mods_rec
89
+ end
90
+
91
+ # the public xml for this DOR object, from the purl page
92
+ # @param [String] druid e.g. ab123cd4567
93
+ # @return [Nokogiri::XML::Document] the public xml for the DOR object
94
+ def public_xml druid
95
+ ng_doc = harvestdor_client.public_xml druid
96
+ raise "No public xml for #{druid}" if !ng_doc
97
+ raise "Empty public xml for #{druid}: #{ng_doc.to_xml}" if ng_doc.root.xpath('//text()').empty?
98
+ ng_doc
99
+ end
100
+
101
+ # the contentMetadata for this DOR object, from the purl public xml
102
+ # @param [String] druid e.g. ab123cd4567
103
+ # @return [Nokogiri::XML::Document] the contentMetadata for the DOR object
104
+ def content_metadata druid
105
+ ng_doc = harvestdor_client.content_metadata druid
106
+ raise "No contentMetadata for #{druid}" if !ng_doc || !ng_doc.root
107
+ ng_doc
108
+ end
109
+
110
+ # the identityMetadata for this DOR object, from the purl public xml
111
+ # @param [String] druid e.g. ab123cd4567
112
+ # @return [Nokogiri::XML::Document] the identityMetadata for the DOR object
113
+ def identity_metadata druid
114
+ ng_doc = harvestdor_client.identity_metadata druid
115
+ raise "No identityMetadata for #{druid}" if !ng_doc || !ng_doc.root
116
+ ng_doc
117
+ end
118
+
119
+ # the rightsMetadata for this DOR object, from the purl public xml
120
+ # @param [String] druid e.g. ab123cd4567
121
+ # @return [Nokogiri::XML::Document] the rightsMetadata for the DOR object
122
+ def rights_metadata druid
123
+ ng_doc = harvestdor_client.rights_metadata druid
124
+ raise "No rightsMetadata for #{druid}" if !ng_doc || !ng_doc.root
125
+ ng_doc
126
+ end
127
+
128
+ # the RDF for this DOR object, from the purl public xml
129
+ # @param [String] druid e.g. ab123cd4567
130
+ # @return [Nokogiri::XML::Document] the RDF for the DOR object
131
+ def rdf druid
132
+ ng_doc = harvestdor_client.rdf druid
133
+ raise "No RDF for #{druid}" if !ng_doc || !ng_doc.root
134
+ ng_doc
135
+ end
136
+
137
+ def solr_client
138
+ @solr_client ||= RSolr.connect(config.solr.to_hash)
139
+ end
140
+
141
+ # @return an Array of druids ('oo000oo0000') that should NOT be processed
142
+ def blacklist
143
+ # avoid trying to load the file multiple times
144
+ if !@blacklist && !@loaded_blacklist
145
+ @blacklist = load_blacklist(config.blacklist) if config.blacklist
146
+ end
147
+ @blacklist ||= []
148
+ end
149
+
150
+ # @return an Array of druids ('oo000oo0000') that should be processed
151
+ def whitelist
152
+ # avoid trying to load the file multiple times
153
+ if !@whitelist && !@loaded_whitelist
154
+ @whitelist = load_whitelist(config.whitelist) if config.whitelist
155
+ end
156
+ @whitelist ||= []
157
+ end
158
+
159
+ protected #---------------------------------------------------------------------
160
+
161
+ def harvestdor_client
162
+ @harvestdor_client ||= Harvestdor::Client.new({:config_yml_path => @yml_path})
163
+ end
164
+
165
+ # populate @blacklist as an Array of druids ('oo000oo0000') that will NOT be processed
166
+ # by reading the File at the indicated path
167
+ # @param [String] path - path of file containing a list of druids
168
+ def load_blacklist path
169
+ if path && !@loaded_blacklist
170
+ @loaded_blacklist = true
171
+ @blacklist = load_id_list path
172
+ end
173
+ end
174
+
175
+ # populate @whitelist as an Array of druids ('oo000oo0000') that WILL be processed
176
+ # (unless a druid is also on the blacklist)
177
+ # by reading the File at the indicated path
178
+ # @param [String] path - path of file containing a list of druids
179
+ def load_whitelist path
180
+ if path && !@loaded_whitelist
181
+ @loaded_whitelist = true
182
+ @whitelist = load_id_list path
183
+ end
184
+ end
185
+
186
+ # return an Array of druids ('oo000oo0000')
187
+ # populated by reading the File at the indicated path
188
+ # @param [String] path - path of file containing a list of druids
189
+ # @return [Array<String>] an Array of druids
190
+ def load_id_list path
191
+ if path
192
+ list = []
193
+ File.foreach(path) { |line|
194
+ list << line.gsub(/\s+/, '') if !line.gsub(/\s+/, '').empty? && !line.strip.start_with?('#')
195
+ }
196
+ list
197
+ end
198
+ rescue
199
+ msg = "Unable to find list of druids at " + path
200
+ logger.fatal msg
201
+ raise msg
202
+ end
203
+
204
+ # Global, memoized, lazy initialized instance of a logger
205
+ # @param [String] log_dir directory for the log file
206
+ # @param [String] log_name name of log file
207
+ def load_logger(log_dir, log_name)
208
+ Dir.mkdir(log_dir) unless File.directory?(log_dir)
209
+ @logger ||= Logger.new(File.join(log_dir, log_name), 'daily')
210
+ end
211
+
212
+ end # Indexer class
213
+ end # Harvestdor module
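load_logger creates the log directory if needed and opens a daily-rotated Logger from Ruby's standard library. The sketch below reproduces that behavior in isolation, writing to a temporary directory (the directory and message are illustrative) so it can run anywhere; unlike the gem, it does not memoize the logger in @logger.

```ruby
require 'logger'
require 'tmpdir'

# Sketch of load_logger: create log_dir if it is missing, then open
# log_dir/log_name with daily rotation.
def load_logger(log_dir, log_name)
  Dir.mkdir(log_dir) unless File.directory?(log_dir)
  Logger.new(File.join(log_dir, log_name), 'daily')
end

Dir.mktmpdir do |tmp|
  log_dir = File.join(tmp, 'test_logs')
  logger = load_logger(log_dir, 'ap-test.log')
  logger.info('indexer logging test message')
  logger.close
  # true: the message was written to the new log file
  puts File.read(File.join(log_dir, 'ap-test.log')).include?('indexer logging test message')
end
```

Dir.mkdir creates only one directory level; for a nested log_dir, FileUtils.mkdir_p from the standard library would be needed instead.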
@@ -0,0 +1,6 @@
1
+ module Harvestdor
2
+ class Indexer
3
+ # this is the Ruby Gem version
4
+ VERSION = "0.0.3"
5
+ end
6
+ end
@@ -0,0 +1,61 @@
1
+ # You will want to copy this file and change the following settings:
2
+ # 1. log_dir, log_name
3
+ # 2. default_set (in OAI harvesting params section)
4
+ # 2a. other OAI harvesting params
5
+ # 3. blacklist or whitelist if you are using them
6
+ # 4. Solr baseurl
7
+
8
+ # log_dir: directory for log file (default: logs, relative to the harvestdor gem path)
9
+ log_dir: spec/test_logs
10
+
11
+ # log_name: name of log file (default: harvestdor.log)
12
+ log_name: ap-test.log
13
+
14
+ # purl: url for the DOR purl server (used to get ContentMetadata, etc.)
15
+ purl: http://purl.stanford.edu
16
+
17
+ # ---------- White and Black list parameters -----
18
+
19
+ # name of file containing druids that will NOT be processed even if they are harvested via OAI
20
+ # either give absolute path or path relative to where the command will be executed
21
+ #blacklist: config/ap_blacklist.txt
22
+
23
+ # name of file containing druids that WILL be processed (all others will be ignored)
24
+ # either give absolute path or path relative to where the command will be executed
25
+ #whitelist: config/ap_whitelist.txt
26
+
27
+ # ----------- SOLR index (that we're writing INTO) parameters ------------
28
+ solr:
29
+ url: https://sul-solr-test.stanford.edu/solr/mods_profiler
30
+ # url: http://localhost:8080/solr/mods_profiler
31
+ # timeouts are in seconds; read_timeout -> open/read, open_timeout -> connection open
32
+ read_timeout: 60
33
+ open_timeout: 60
34
+
35
+ # ---------- OAI harvesting parameters -----------
36
+
37
+ # oai_repository_url: URL of the OAI data provider
38
+ oai_repository_url: https://dor-oaiprovider-prod.stanford.edu/oai
39
+
40
+ # default_set: default set for harvest (default: nil)
41
+ # can be overridden on calls to harvest_ids and harvest_records
42
+ default_set: is_governed_by_hy787xj5878
43
+
44
+ # default_metadata_prefix: default metadata prefix to be used for harvesting (default: mods)
45
+ # can be overridden on calls to harvest_ids and harvest_records
46
+
47
+ # default_from_date: default from date for harvest (default: nil)
48
+ # can be overridden on calls to harvest_ids and harvest_records
49
+
50
+ # default_until_date: default until date for harvest (default: nil)
51
+ # can be overridden on calls to harvest_ids and harvest_records
52
+
53
+ # oai_client_debug: true for OAI::Client debug mode (default: false)
54
+
55
+ # Additional options to pass to Faraday http client (https://github.com/technoweenie/faraday)
56
+ http_options:
57
+ ssl:
58
+ verify: false
59
+ # timeouts are in seconds; timeout -> open/read, open_timeout -> connection open
60
+ timeout: 180
61
+ open_timeout: 180
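Indexer#initialize loads this yml file into its Confstruct configuration and then applies any options hash on top of it. A stdlib-only sketch of that layering, assuming later settings replace earlier ones; it uses plain Hash#merge rather than Confstruct, so nested sections are not deep-merged here, and the override values are hypothetical:

```ruby
require 'yaml'

# A trimmed copy of the settings in ap.yml.
yml = <<~YML
  log_dir: spec/test_logs
  log_name: ap-test.log
  solr:
    url: https://sul-solr-test.stanford.edu/solr/mods_profiler
    read_timeout: 60
    open_timeout: 60
  oai_repository_url: https://dor-oaiprovider-prod.stanford.edu/oai
  default_set: is_governed_by_hy787xj5878
YML

config = YAML.safe_load(yml)

# Options passed to Indexer.new are applied after the yml file.
overrides = { 'default_from_date' => '2012-12-01', 'log_name' => 'my-coll.log' }
config = config.merge(overrides)

# The gem passes the solr section to RSolr.connect(config.solr.to_hash);
# here it is shown as a Hash with symbol keys.
solr_opts = config['solr'].transform_keys(&:to_sym)
p solr_opts[:url]
```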
@@ -0,0 +1,5 @@
1
+ # blacklist containing druids that should NOT be processed.
2
+ # druids should be in the form aa111bb2222
3
+
4
+ oo111oo1111
5
+ oo222oo2222
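load_id_list turns a file like this one into an Array of druids: it removes all whitespace from each line and skips blank lines and lines starting with `#`. A sketch of the same rules, reading from a String rather than a file (parse_id_list is a hypothetical name):

```ruby
# Hypothetical helper mirroring load_id_list's rules.
def parse_id_list(text)
  text.each_line.each_with_object([]) do |line, list|
    druid = line.gsub(/\s+/, '')        # drop all whitespace, including the newline
    next if druid.empty?                # skip blank lines
    next if line.strip.start_with?('#') # skip comment lines
    list << druid
  end
end

blacklist_text = <<~TXT
  # blacklist containing druids that should NOT be processed.
  # druids should be in the form aa111bb2222

  oo111oo1111
  oo222oo2222
TXT

p parse_id_list(blacklist_text) # prints ["oo111oo1111", "oo222oo2222"]
```

In the gem the lines come from the file named by the blacklist or whitelist setting; the same parsing serves both lists.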
@@ -0,0 +1,5 @@
1
+ # whitelist containing the specific druids to be processed (all others will be ignored)
2
+ # druids should be in the form aa111bb2222
3
+
4
+ oo000oo0000
5
+ oo222oo2222
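Taken together, harvest_and_index and index decide which druids reach Solr: a non-empty whitelist replaces the OAI harvest, and the blacklist is checked for every druid, so a druid on both lists is skipped. The sketch below (druids_to_index is a hypothetical helper) applies those rules to the lists used in the specs. Unlike the gem, it takes an already-harvested Array, whereas the gem never runs the OAI harvest when a whitelist is present.

```ruby
# Hypothetical helper: which druids get indexed, given the harvest and the lists.
def druids_to_index(harvested, whitelist, blacklist)
  candidates = whitelist.empty? ? harvested : whitelist # whitelist replaces the harvest
  candidates.reject { |druid| blacklist.include?(druid) } # blacklist always wins
end

harvested = %w[oo000oo0000 oo111oo1111 oo222oo2222 oo333oo3333]
blacklist = %w[oo111oo1111 oo222oo2222]
whitelist = %w[oo000oo0000 oo222oo2222]

p druids_to_index(harvested, [], blacklist)        # prints ["oo000oo0000", "oo333oo3333"]
p druids_to_index(harvested, whitelist, [])        # prints ["oo000oo0000", "oo222oo2222"]
p druids_to_index(harvested, whitelist, blacklist) # prints ["oo000oo0000"]
```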
@@ -0,0 +1,21 @@
1
+ # for test coverage
2
+ require 'simplecov'
3
+ require 'simplecov-rcov'
4
+ class SimpleCov::Formatter::MergedFormatter
5
+ def format(result)
6
+ SimpleCov::Formatter::HTMLFormatter.new.format(result)
7
+ SimpleCov::Formatter::RcovFormatter.new.format(result)
8
+ end
9
+ end
10
+ SimpleCov.formatter = SimpleCov::Formatter::MergedFormatter
11
+ SimpleCov.start do
12
+ add_filter "/spec/"
13
+ end
14
+
15
+ $LOAD_PATH.unshift(File.join(File.dirname(__FILE__), '..', 'lib'))
16
+ $LOAD_PATH.unshift(File.dirname(__FILE__))
17
+
18
+ require 'harvestdor-indexer'
19
+
20
+ #RSpec.configure do |config|
21
+ #end
@@ -0,0 +1,327 @@
1
+ require 'spec_helper'
2
+
3
+ describe Harvestdor::Indexer do
4
+
5
+ before(:all) do
6
+ @config_yml_path = File.join(File.dirname(__FILE__), "..", "config", "ap.yml")
7
+ @indexer = Harvestdor::Indexer.new(@config_yml_path)
8
+ require 'yaml'
9
+ @yaml = YAML.load_file(@config_yml_path)
10
+ @hdor_client = @indexer.send(:harvestdor_client)
11
+ @fake_druid = 'oo000oo0000'
12
+ @blacklist_path = File.join(File.dirname(__FILE__), "../config/ap_blacklist.txt")
13
+ @whitelist_path = File.join(File.dirname(__FILE__), "../config/ap_whitelist.txt")
14
+ end
15
+
16
+ describe "logging" do
17
+ it "should write the log file to the directory indicated by log_dir" do
18
+ @indexer.logger.info("indexer_spec logging test message")
19
+ File.exist?(File.join(@yaml['log_dir'], @yaml['log_name'])).should == true
20
+ end
21
+ end
22
+
23
+ it "should initialize the harvestdor_client from the config" do
24
+ @hdor_client.should be_an_instance_of(Harvestdor::Client)
25
+ @hdor_client.config.default_set.should == @yaml['default_set']
26
+ end
27
+
28
+ context "harvest_and_index" do
29
+ before(:all) do
30
+ @doc_hash = {
31
+ :id => @fake_druid
32
+ }
33
+ end
34
+ it "should call druids_via_oai and then call :add on rsolr connection" do
35
+ @hdor_client.should_receive(:druids_via_oai).and_return([@fake_druid])
36
+ @indexer.solr_client.should_receive(:add).with(@doc_hash)
37
+ @indexer.solr_client.should_receive(:commit)
38
+ @indexer.harvest_and_index
39
+ end
40
+ it "should not process druids in blacklist" do
41
+ indexer = Harvestdor::Indexer.new(@config_yml_path, {:blacklist => @blacklist_path})
42
+ hdor_client = indexer.send(:harvestdor_client)
43
+ hdor_client.should_receive(:druids_via_oai).and_return(['oo000oo0000', 'oo111oo1111', 'oo222oo2222', 'oo333oo3333'])
44
+ indexer.solr_client.should_receive(:add).with(hash_including({:id => 'oo000oo0000'}))
45
+ indexer.solr_client.should_not_receive(:add).with(hash_including({:id => 'oo111oo1111'}))
46
+ indexer.solr_client.should_not_receive(:add).with(hash_including({:id => 'oo222oo2222'}))
47
+ indexer.solr_client.should_receive(:add).with(hash_including({:id => 'oo333oo3333'}))
48
+ indexer.solr_client.should_receive(:commit)
49
+ indexer.harvest_and_index
50
+ end
51
+ it "should only process druids in whitelist if it exists" do
52
+ indexer = Harvestdor::Indexer.new(@config_yml_path, {:whitelist => @whitelist_path})
53
+ hdor_client = indexer.send(:harvestdor_client)
54
+ hdor_client.should_not_receive(:druids_via_oai)
55
+ indexer.solr_client.should_receive(:add).with(hash_including({:id => 'oo000oo0000'}))
56
+ indexer.solr_client.should_receive(:add).with(hash_including({:id => 'oo222oo2222'}))
57
+ indexer.solr_client.should_receive(:commit)
58
+ indexer.harvest_and_index
59
+ end
60
+ it "should not process druid if it is in both blacklist and whitelist" do
61
+ indexer = Harvestdor::Indexer.new(@config_yml_path, {:blacklist => @blacklist_path, :whitelist => @whitelist_path})
62
+ hdor_client = indexer.send(:harvestdor_client)
63
+ hdor_client.should_not_receive(:druids_via_oai)
64
+ indexer.solr_client.should_receive(:add).with(hash_including({:id => 'oo000oo0000'}))
65
+ indexer.solr_client.should_receive(:commit)
66
+ indexer.harvest_and_index
67
+ end
68
+ it "should only call :commit on rsolr connection once" do
69
+ indexer = Harvestdor::Indexer.new(@config_yml_path)
70
+ hdor_client = indexer.send(:harvestdor_client)
71
+ hdor_client.should_receive(:druids_via_oai).and_return(['1', '2', '3'])
72
+ indexer.solr_client.should_receive(:add).exactly(3).times
73
+ indexer.solr_client.should_receive(:commit).once
74
+ indexer.harvest_and_index
75
+ end
76
+ end
77
+
78
+ it "druids method should call druids_via_oai method on harvestdor_client" do
79
+ @hdor_client.should_receive(:druids_via_oai)
80
+ @indexer.druids
81
+ end
82
+
83
+ context "smods_rec method" do
84
+ before(:all) do
85
+ @fake_druid = 'oo000oo0000'
86
+ @ns_decl = "xmlns='#{Mods::MODS_NS}'"
87
+ @mods_xml = "<mods #{@ns_decl}><note>hi</note></mods>"
88
+ @ng_mods_xml = Nokogiri::XML(@mods_xml)
89
+ end
90
+ it "should call mods method on harvestdor_client" do
91
+ @hdor_client.should_receive(:mods).with(@fake_druid).and_return(@ng_mods_xml)
92
+ @indexer.smods_rec(@fake_druid)
93
+ end
94
+ it "should return Stanford::Mods::Record object" do
95
+ @hdor_client.should_receive(:mods).with(@fake_druid).and_return(@ng_mods_xml)
96
+ @indexer.smods_rec(@fake_druid).should be_an_instance_of(Stanford::Mods::Record)
97
+ end
98
+ it "should raise exception if MODS xml for the druid is empty" do
99
+ @hdor_client.stub(:mods).with(@fake_druid).and_return(Nokogiri::XML("<mods #{@ns_decl}/>"))
100
+ expect { @indexer.smods_rec(@fake_druid) }.to raise_error(RuntimeError, Regexp.new("^Empty MODS metadata for #{@fake_druid}: <"))
101
+ end
102
+ it "should raise exception if there is no MODS xml for the druid" do
103
+ expect { @indexer.smods_rec(@fake_druid) }.to raise_error(Harvestdor::Errors::MissingMods)
104
+ end
105
+ end
106
+
107
+ context "public_xml related methods" do
108
+ before(:all) do
109
+ @id_md_xml = "<identityMetadata><objectId>druid:#{@fake_druid}</objectId></identityMetadata>"
110
+ @cntnt_md_xml = "<contentMetadata type='image' objectId='#{@fake_druid}'>foo</contentMetadata>"
111
+ @rights_md_xml = "<rightsMetadata><access type=\"discover\"><machine><world>bar</world></machine></access></rightsMetadata>"
112
+ @rdf_xml = "<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'><rdf:Description rdf:about=\"info:fedora/druid:#{@fake_druid}\">relationship!</rdf:Description></rdf:RDF>"
113
+ @pub_xml = "<publicObject id='druid:#{@fake_druid}'>#{@id_md_xml}#{@cntnt_md_xml}#{@rights_md_xml}#{@rdf_xml}</publicObject>"
114
+ @ng_pub_xml = Nokogiri::XML(@pub_xml)
115
+ end
116
+ context "#public_xml" do
117
+ it "should call public_xml method on harvestdor_client" do
118
+ @hdor_client.should_receive(:public_xml).with(@fake_druid).and_return(@ng_pub_xml)
119
+ @indexer.public_xml @fake_druid
120
+ end
121
+ it "retrieves entire public xml as a Nokogiri::XML::Document" do
122
+ @hdor_client.should_receive(:public_xml).with(@fake_druid).and_return(@ng_pub_xml)
123
+ px = @indexer.public_xml @fake_druid
124
+ px.should be_kind_of(Nokogiri::XML::Document)
125
+ px.root.name.should == 'publicObject'
126
+ px.root.attributes['id'].text.should == "druid:#{@fake_druid}"
127
+ end
128
+ it "raises exception if public xml for the druid is empty" do
129
+ @hdor_client.should_receive(:public_xml).with(@fake_druid).and_return(Nokogiri::XML("<publicObject/>"))
130
+ expect { @indexer.public_xml(@fake_druid) }.to raise_error(RuntimeError, Regexp.new("^Empty public xml for #{@fake_druid}: <"))
131
+ end
132
+ it "raises Harvestdor::Errors::MissingPurlPage if there is no purl page for the druid" do
133
+ expect { @indexer.public_xml(@fake_druid) }.to raise_error(Harvestdor::Errors::MissingPurlPage)
134
+ end
135
+ it "raises error if there is no public_xml page for the druid" do
136
+ @hdor_client.should_receive(:public_xml).with(@fake_druid).and_return(nil)
137
+ expect { @indexer.public_xml(@fake_druid) }.to raise_error(RuntimeError, "No public xml for #{@fake_druid}")
138
+ end
139
+ end
140
+ context "#content_metadata" do
141
+ it "returns a Nokogiri::XML::Document derived from the public xml" do
142
+ Harvestdor.stub(:public_xml).with(@fake_druid, @indexer.config.purl).and_return(@ng_pub_xml)
143
+ cm = @indexer.content_metadata(@fake_druid)
144
+ cm.should be_kind_of(Nokogiri::XML::Document)
145
+ cm.root.should_not == nil
146
+ cm.root.name.should == 'contentMetadata'
147
+ cm.root.attributes['objectId'].text.should == @fake_druid
148
+ cm.root.text.strip.should == 'foo'
149
+ end
150
+ it "raises Harvestdor::Errors::MissingPurlPage if there is no purl page for the druid" do
151
+ expect { @indexer.content_metadata(@fake_druid) }.to raise_error(Harvestdor::Errors::MissingPurlPage)
152
+ end
153
+ it "should raise exception if there is no contentMetadata in the public xml" do
154
+ pub_xml = "<publicObject id='druid:#{@fake_druid}'>#{@id_md_xml}</publicObject>"
155
+ Harvestdor.stub(:public_xml).with(@fake_druid, @indexer.config.purl).and_return(Nokogiri::XML(pub_xml))
156
+ expect { @indexer.content_metadata(@fake_druid) }.to raise_error(RuntimeError, "No contentMetadata for #{@fake_druid}")
157
+ end
158
+ it "raises RuntimeError if nil is returned by Harvestdor::Client.contentMetadata for the druid" do
159
+ @hdor_client.should_receive(:content_metadata).with(@fake_druid).and_return(nil)
160
+ expect { @indexer.content_metadata(@fake_druid) }.to raise_error(RuntimeError, "No contentMetadata for #{@fake_druid}")
161
+ end
162
+ it "raises MissingContentMetadata error if there is no contentMetadata in the public_xml for the druid" do
163
+ URI::HTTP.any_instance.should_receive(:open)
164
+ expect { @indexer.content_metadata(@fake_druid) }.to raise_error(Harvestdor::Errors::MissingContentMetadata)
165
+ end
166
+ end
167
+ context "#identity_metadata" do
168
+ it "returns a Nokogiri::XML::Document derived from the public xml" do
169
+ Harvestdor.stub(:public_xml).with(@fake_druid, @indexer.config.purl).and_return(@ng_pub_xml)
170
+ im = @indexer.identity_metadata(@fake_druid)
171
+ im.should be_kind_of(Nokogiri::XML::Document)
172
+ im.root.should_not == nil
173
+ im.root.name.should == 'identityMetadata'
174
+ im.root.text.strip.should == "druid:#{@fake_druid}"
175
+ end
176
+ it "raises Harvestdor::Errors::MissingPurlPage if there is no purl page for the druid" do
177
+ expect { @indexer.identity_metadata(@fake_druid) }.to raise_error(Harvestdor::Errors::MissingPurlPage)
178
+ end
179
+ it "should raise exception if there is no identityMetadata in the public xml" do
180
+ pub_xml = "<publicObject id='druid:#{@fake_druid}'>#{@cntnt_md_xml}</publicObject>"
181
+ Harvestdor.stub(:public_xml).with(@fake_druid, @indexer.config.purl).and_return(Nokogiri::XML(pub_xml))
182
+ expect { @indexer.identity_metadata(@fake_druid) }.to raise_error(RuntimeError, "No identityMetadata for #{@fake_druid}")
183
+ end
184
+ it "raises RuntimeError if nil is returned by Harvestdor::Client.identityMetadata for the druid" do
185
+ @hdor_client.should_receive(:identity_metadata).with(@fake_druid).and_return(nil)
186
+ expect { @indexer.identity_metadata(@fake_druid) }.to raise_error(RuntimeError, "No identityMetadata for #{@fake_druid}")
187
+ end
188
+ it "raises MissingIdentityMetadata error if there is no identityMetadata in the public_xml for the druid" do
189
+ URI::HTTP.any_instance.should_receive(:open)
190
+ expect { @indexer.identity_metadata(@fake_druid) }.to raise_error(Harvestdor::Errors::MissingIdentityMetadata)
191
+ end
192
+ end
193
+ context "#rights_metadata" do
194
+ it "returns a Nokogiri::XML::Document derived from the public xml" do
195
+ Harvestdor.stub(:public_xml).with(@fake_druid, @indexer.config.purl).and_return(@ng_pub_xml)
196
+ rm = @indexer.rights_metadata(@fake_druid)
197
+ rm.should be_kind_of(Nokogiri::XML::Document)
198
+ rm.root.should_not == nil
199
+ rm.root.name.should == 'rightsMetadata'
200
+ rm.root.text.strip.should == "bar"
201
+ end
202
+ it "raises Harvestdor::Errors::MissingPurlPage if there is no purl page for the druid" do
203
+ expect { @indexer.rights_metadata(@fake_druid) }.to raise_error(Harvestdor::Errors::MissingPurlPage)
204
+ end
205
+ it "raises RuntimeError if there is no rightsMetadata in the public xml" do
206
+ pub_xml = "<publicObject id='druid:#{@fake_druid}'>#{@cntnt_md_xml}</publicObject>"
207
+ Harvestdor.stub(:public_xml).with(@fake_druid, @indexer.config.purl).and_return(Nokogiri::XML(pub_xml))
208
+ expect { @indexer.rights_metadata(@fake_druid) }.to raise_error(RuntimeError, "No rightsMetadata for #{@fake_druid}")
209
+ end
210
+ it "raises RuntimeError if nil is returned by Harvestdor::Client#rights_metadata for the druid" do
211
+ @hdor_client.should_receive(:rights_metadata).with(@fake_druid).and_return(nil)
212
+ expect { @indexer.rights_metadata(@fake_druid) }.to raise_error(RuntimeError, "No rightsMetadata for #{@fake_druid}")
213
+ end
214
+ it "raises MissingRightsMetadata error if there is no rightsMetadata in the public_xml for the druid" do
215
+ URI::HTTP.any_instance.should_receive(:open)
216
+ expect { @indexer.rights_metadata(@fake_druid) }.to raise_error(Harvestdor::Errors::MissingRightsMetadata)
217
+ end
218
+ end
219
+ context "#rdf" do
220
+ it "returns a Nokogiri::XML::Document derived from the public xml" do
221
+ Harvestdor.stub(:public_xml).with(@fake_druid, @indexer.config.purl).and_return(@ng_pub_xml)
222
+ rdf_doc = @indexer.rdf(@fake_druid)
223
+ rdf_doc.should be_kind_of(Nokogiri::XML::Document)
224
+ rdf_doc.root.should_not == nil
225
+ rdf_doc.root.name.should == 'RDF'
226
+ rdf_doc.root.text.strip.should == "relationship!"
227
+ end
228
+ it "raises Harvestdor::Errors::MissingPurlPage if there is no purl page for the druid" do
229
+ expect { @indexer.rdf(@fake_druid) }.to raise_error(Harvestdor::Errors::MissingPurlPage)
230
+ end
231
+ it "raises RuntimeError if there is no RDF in the public xml" do
232
+ pub_xml = "<publicObject id='druid:#{@fake_druid}'>#{@cntnt_md_xml}</publicObject>"
233
+ Harvestdor.stub(:public_xml).with(@fake_druid, @indexer.config.purl).and_return(Nokogiri::XML(pub_xml))
234
+ expect { @indexer.rdf(@fake_druid) }.to raise_error(RuntimeError, "No RDF for #{@fake_druid}")
235
+ end
236
+ it "raises RuntimeError if nil is returned by Harvestdor::Client#rdf for the druid" do
237
+ @hdor_client.should_receive(:rdf).with(@fake_druid).and_return(nil)
238
+ expect { @indexer.rdf(@fake_druid) }.to raise_error(RuntimeError, "No RDF for #{@fake_druid}")
239
+ end
240
+ it "raises MissingRDF error if there is no rdf in the public_xml for the druid" do
241
+ URI::HTTP.any_instance.should_receive(:open)
242
+ expect { @indexer.rdf(@fake_druid) }.to raise_error(Harvestdor::Errors::MissingRDF)
243
+ end
244
+ end
245
+ end
246
+
247
+ context "blacklist" do
248
+ it "should be an Array with an entry for each non-empty line in the file" do
249
+ @indexer.send(:load_blacklist, @blacklist_path)
250
+ @indexer.send(:blacklist).should be_an_instance_of(Array)
251
+ @indexer.send(:blacklist).size.should == 2
252
+ end
253
+ it "should be an empty Array if there was no blacklist config setting" do
254
+ indexer = Harvestdor::Indexer.new(@config_yml_path)
255
+ indexer.send(:blacklist).should == []
256
+ end
257
+ context "load_blacklist" do
258
+ it "should not be called if there was no blacklist config setting" do
259
+ indexer = Harvestdor::Indexer.new(@config_yml_path)
260
+
261
+ indexer.should_not_receive(:load_blacklist)
262
+
263
+ hdor_client = indexer.send(:harvestdor_client)
264
+ hdor_client.should_receive(:druids_via_oai).and_return([@fake_druid])
265
+ indexer.solr_client.should_receive(:add)
266
+ indexer.solr_client.should_receive(:commit)
267
+ indexer.harvest_and_index
268
+ end
269
+ it "should only try to load a blacklist once" do
270
+ indexer = Harvestdor::Indexer.new(@config_yml_path, {:blacklist => @blacklist_path})
271
+ indexer.send(:blacklist)
272
+ File.any_instance.should_not_receive(:open)
273
+ indexer.send(:blacklist)
274
+ end
275
+ it "should log an error message and raise RuntimeError if it can't find the indicated blacklist file" do
276
+ exp_msg = 'Unable to find list of druids at bad_path'
277
+ indexer = Harvestdor::Indexer.new(@config_yml_path, {:blacklist => 'bad_path'})
278
+ indexer.logger.should_receive(:fatal).with(exp_msg)
279
+ expect { indexer.send(:load_blacklist, 'bad_path') }.to raise_error(exp_msg)
280
+ end
281
+ end
282
+ end # blacklist
283
+
284
+ context "whitelist" do
285
+ it "should be an Array with an entry for each non-empty line in the file" do
286
+ @indexer.send(:load_whitelist, @whitelist_path)
287
+ @indexer.send(:whitelist).should be_an_instance_of(Array)
288
+ @indexer.send(:whitelist).size.should == 2
289
+ end
290
+ it "should be an empty Array if there was no whitelist config setting" do
291
+ indexer = Harvestdor::Indexer.new(@config_yml_path)
292
+ indexer.send(:whitelist).should == []
293
+ end
294
+ context "load_whitelist" do
295
+ it "should not be called if there was no whitelist config setting" do
296
+ indexer = Harvestdor::Indexer.new(@config_yml_path)
297
+
298
+ indexer.should_not_receive(:load_whitelist)
299
+
300
+ hdor_client = indexer.send(:harvestdor_client)
301
+ hdor_client.should_receive(:druids_via_oai).and_return([@fake_druid])
302
+ indexer.solr_client.should_receive(:add)
303
+ indexer.solr_client.should_receive(:commit)
304
+ indexer.harvest_and_index
305
+ end
306
+ it "should only try to load a whitelist once" do
307
+ indexer = Harvestdor::Indexer.new(@config_yml_path, {:whitelist => @whitelist_path})
308
+ indexer.send(:whitelist)
309
+ File.any_instance.should_not_receive(:open)
310
+ indexer.send(:whitelist)
311
+ end
312
+ it "should log an error message and raise RuntimeError if it can't find the indicated whitelist file" do
313
+ exp_msg = 'Unable to find list of druids at bad_path'
314
+ indexer = Harvestdor::Indexer.new(@config_yml_path, {:whitelist => 'bad_path'})
315
+ indexer.logger.should_receive(:fatal).with(exp_msg)
316
+ expect { indexer.send(:load_whitelist, 'bad_path') }.to raise_error(exp_msg)
317
+ end
318
+ end
319
+ end # whitelist
320
+
321
+ it "solr_client should initialize the RSolr client using the options from the config" do
322
+ indexer = Harvestdor::Indexer.new(nil, Confstruct::Configuration.new(:solr => { :url => 'http://localhost:2345', :a => 1 }) )
323
+ RSolr.should_receive(:connect).with(hash_including(:a => 1, :url => 'http://localhost:2345')).and_return('foo')
324
+ indexer.solr_client
325
+ end
326
+
327
+ end
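The blacklist and whitelist specs above pin down a small contract for a druid list: each non-empty line of the file becomes one entry, a missing config setting yields an empty Array, the file is read at most once, and a bad path logs a fatal message and raises a RuntimeError reading "Unable to find list of druids at <path>". The sketch below is one way to satisfy that contract. The `DruidList` class and its method names are hypothetical, and reading the file with `File.readlines` is an assumption; the gem's own `load_blacklist`/`load_whitelist` code is not shown here.

```ruby
require 'logger'

# Hypothetical helper illustrating the behavior the specs describe;
# not the gem's actual implementation.
class DruidList
  attr_reader :logger

  def initialize(path, logger = Logger.new($stderr))
    @path = path
    @logger = logger
  end

  # Memoized: the file is read on the first call only.
  # With no path configured, the list is an empty Array.
  def druids
    @druids ||= @path ? load_list(@path) : []
  end

  private

  # One entry per non-empty line, with surrounding whitespace stripped.
  def load_list(path)
    File.readlines(path).map(&:strip).reject(&:empty?)
  rescue Errno::ENOENT
    msg = "Unable to find list of druids at #{path}"
    logger.fatal(msg)
    raise msg # a bare String message raises RuntimeError
  end
end
```

Memoizing with `||=` works even for an empty file, because `[]` is truthy in Ruby, so the file is still read only once.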
metadata ADDED
@@ -0,0 +1,233 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: harvestdor-indexer
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.0.3
5
+ prerelease:
6
+ platform: ruby
7
+ authors:
8
+ - Naomi Dushay
9
+ autorequire:
10
+ bindir: bin
11
+ cert_chain: []
12
+ date: 2013-03-08 00:00:00.000000000 Z
13
+ dependencies:
14
+ - !ruby/object:Gem::Dependency
15
+ name: rsolr
16
+ requirement: !ruby/object:Gem::Requirement
17
+ none: false
18
+ requirements:
19
+ - - ! '>='
20
+ - !ruby/object:Gem::Version
21
+ version: '0'
22
+ type: :runtime
23
+ prerelease: false
24
+ version_requirements: !ruby/object:Gem::Requirement
25
+ none: false
26
+ requirements:
27
+ - - ! '>='
28
+ - !ruby/object:Gem::Version
29
+ version: '0'
30
+ - !ruby/object:Gem::Dependency
31
+ name: harvestdor
32
+ requirement: !ruby/object:Gem::Requirement
33
+ none: false
34
+ requirements:
35
+ - - ! '>='
36
+ - !ruby/object:Gem::Version
37
+ version: '0'
38
+ type: :runtime
39
+ prerelease: false
40
+ version_requirements: !ruby/object:Gem::Requirement
41
+ none: false
42
+ requirements:
43
+ - - ! '>='
44
+ - !ruby/object:Gem::Version
45
+ version: '0'
46
+ - !ruby/object:Gem::Dependency
47
+ name: stanford-mods
48
+ requirement: !ruby/object:Gem::Requirement
49
+ none: false
50
+ requirements:
51
+ - - ! '>='
52
+ - !ruby/object:Gem::Version
53
+ version: '0'
54
+ type: :runtime
55
+ prerelease: false
56
+ version_requirements: !ruby/object:Gem::Requirement
57
+ none: false
58
+ requirements:
59
+ - - ! '>='
60
+ - !ruby/object:Gem::Version
61
+ version: '0'
62
+ - !ruby/object:Gem::Dependency
63
+ name: lyberteam-gems-devel
64
+ requirement: !ruby/object:Gem::Requirement
65
+ none: false
66
+ requirements:
67
+ - - ! '>='
68
+ - !ruby/object:Gem::Version
69
+ version: '1.0'
70
+ type: :development
71
+ prerelease: false
72
+ version_requirements: !ruby/object:Gem::Requirement
73
+ none: false
74
+ requirements:
75
+ - - ! '>='
76
+ - !ruby/object:Gem::Version
77
+ version: '1.0'
78
+ - !ruby/object:Gem::Dependency
79
+ name: rake
80
+ requirement: !ruby/object:Gem::Requirement
81
+ none: false
82
+ requirements:
83
+ - - ! '>='
84
+ - !ruby/object:Gem::Version
85
+ version: '0'
86
+ type: :development
87
+ prerelease: false
88
+ version_requirements: !ruby/object:Gem::Requirement
89
+ none: false
90
+ requirements:
91
+ - - ! '>='
92
+ - !ruby/object:Gem::Version
93
+ version: '0'
94
+ - !ruby/object:Gem::Dependency
95
+ name: rdoc
96
+ requirement: !ruby/object:Gem::Requirement
97
+ none: false
98
+ requirements:
99
+ - - ! '>='
100
+ - !ruby/object:Gem::Version
101
+ version: '0'
102
+ type: :development
103
+ prerelease: false
104
+ version_requirements: !ruby/object:Gem::Requirement
105
+ none: false
106
+ requirements:
107
+ - - ! '>='
108
+ - !ruby/object:Gem::Version
109
+ version: '0'
110
+ - !ruby/object:Gem::Dependency
111
+ name: yard
112
+ requirement: !ruby/object:Gem::Requirement
113
+ none: false
114
+ requirements:
115
+ - - ! '>='
116
+ - !ruby/object:Gem::Version
117
+ version: '0'
118
+ type: :development
119
+ prerelease: false
120
+ version_requirements: !ruby/object:Gem::Requirement
121
+ none: false
122
+ requirements:
123
+ - - ! '>='
124
+ - !ruby/object:Gem::Version
125
+ version: '0'
126
+ - !ruby/object:Gem::Dependency
127
+ name: rspec
128
+ requirement: !ruby/object:Gem::Requirement
129
+ none: false
130
+ requirements:
131
+ - - ! '>='
132
+ - !ruby/object:Gem::Version
133
+ version: '0'
134
+ type: :development
135
+ prerelease: false
136
+ version_requirements: !ruby/object:Gem::Requirement
137
+ none: false
138
+ requirements:
139
+ - - ! '>='
140
+ - !ruby/object:Gem::Version
141
+ version: '0'
142
+ - !ruby/object:Gem::Dependency
143
+ name: simplecov
144
+ requirement: !ruby/object:Gem::Requirement
145
+ none: false
146
+ requirements:
147
+ - - ! '>='
148
+ - !ruby/object:Gem::Version
149
+ version: '0'
150
+ type: :development
151
+ prerelease: false
152
+ version_requirements: !ruby/object:Gem::Requirement
153
+ none: false
154
+ requirements:
155
+ - - ! '>='
156
+ - !ruby/object:Gem::Version
157
+ version: '0'
158
+ - !ruby/object:Gem::Dependency
159
+ name: simplecov-rcov
160
+ requirement: !ruby/object:Gem::Requirement
161
+ none: false
162
+ requirements:
163
+ - - ! '>='
164
+ - !ruby/object:Gem::Version
165
+ version: '0'
166
+ type: :development
167
+ prerelease: false
168
+ version_requirements: !ruby/object:Gem::Requirement
169
+ none: false
170
+ requirements:
171
+ - - ! '>='
172
+ - !ruby/object:Gem::Version
173
+ version: '0'
174
+ description: Harvest DOR object metadata via a relationship (e.g. hydra:isGovernedBy
175
+ rdf:resource="info:fedora/druid:hy787xj5878") and dates, plus code framework to
176
+ write Solr docs to index
177
+ email:
178
+ - ndushay@stanford.edu
179
+ executables: []
180
+ extensions: []
181
+ extra_rdoc_files: []
182
+ files:
183
+ - .gitignore
184
+ - .yardopts
185
+ - Gemfile
186
+ - LICENSE.txt
187
+ - README.rdoc
188
+ - Rakefile
189
+ - harvestdor-indexer.gemspec
190
+ - lib/harvestdor-indexer.rb
191
+ - lib/harvestdor-indexer/version.rb
192
+ - spec/config/ap.yml
193
+ - spec/config/ap_blacklist.txt
194
+ - spec/config/ap_whitelist.txt
195
+ - spec/spec_helper.rb
196
+ - spec/unit/harvestdor-indexer_spec.rb
197
+ homepage: https://consul.stanford.edu/display/chimera/Chimera+project
198
+ licenses: []
199
+ post_install_message:
200
+ rdoc_options: []
201
+ require_paths:
202
+ - lib
203
+ required_ruby_version: !ruby/object:Gem::Requirement
204
+ none: false
205
+ requirements:
206
+ - - ! '>='
207
+ - !ruby/object:Gem::Version
208
+ version: '0'
209
+ segments:
210
+ - 0
211
+ hash: -2920299245033359379
212
+ required_rubygems_version: !ruby/object:Gem::Requirement
213
+ none: false
214
+ requirements:
215
+ - - ! '>='
216
+ - !ruby/object:Gem::Version
217
+ version: '0'
218
+ segments:
219
+ - 0
220
+ hash: -2920299245033359379
221
+ requirements: []
222
+ rubyforge_project:
223
+ rubygems_version: 1.8.24
224
+ signing_key:
225
+ specification_version: 3
226
+ summary: Harvest DOR object metadata and index it to Solr
227
+ test_files:
228
+ - spec/config/ap.yml
229
+ - spec/config/ap_blacklist.txt
230
+ - spec/config/ap_whitelist.txt
231
+ - spec/spec_helper.rb
232
+ - spec/unit/harvestdor-indexer_spec.rb
233
+ has_rdoc:
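The metadata above is what RubyGems serialized from the gem's specification. A gemspec that would declare the same name, version, and dependency list might look roughly like the sketch below. It is a reconstruction from the serialized fields, not the gem's actual `harvestdor-indexer.gemspec`, whose contents are not shown here. Dependencies listed with `'>= 0'` in the metadata correspond to calls with no version constraint.

```ruby
require 'rubygems'

# Hypothetical reconstruction consistent with the serialized metadata;
# the real harvestdor-indexer.gemspec may differ.
SPEC = Gem::Specification.new do |s|
  s.name          = 'harvestdor-indexer'
  s.version       = '0.0.3'
  s.authors       = ['Naomi Dushay']
  s.email         = ['ndushay@stanford.edu']
  s.summary       = 'Harvest DOR object metadata and index it to Solr'
  s.homepage      = 'https://consul.stanford.edu/display/chimera/Chimera+project'
  s.require_paths = ['lib']

  # Runtime dependencies: unconstrained ('>= 0' in the metadata)
  s.add_dependency 'rsolr'
  s.add_dependency 'harvestdor'
  s.add_dependency 'stanford-mods'

  # Development dependencies: only lyberteam-gems-devel carries a version floor
  s.add_development_dependency 'lyberteam-gems-devel', '>= 1.0'
  %w[rake rdoc yard rspec simplecov simplecov-rcov].each do |name|
    s.add_development_dependency name
  end
end
```

Leaving the runtime dependencies unconstrained matches the metadata, but it also means any future release of rsolr, harvestdor, or stanford-mods satisfies them, including releases with incompatible APIs.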