lyber-core 1.3.0 → 3.2.2

Sign up to get free protection for your applications and to get access to all the features.
data/README.rdoc DELETED
@@ -1,76 +0,0 @@
1
- = lyber_core
2
-
3
- Require the following:
4
- require 'lyber_core'
5
-
6
- Constants that need to be defined with sample values:
7
-
8
- If using WorkflowService:
9
- Dor::WF_URI = 'http://lyberservices-dev.stanford.edu/workflow'
10
- Dor::CREATE_WORKFLOW = true
11
-
12
- If using SuriService:
13
- Dor::MINT_SURI_IDS = true
14
- Dor::SURI_URL = 'http://some.suri.host:8080'
15
- Dor::ID_NAMESPACE = 'druid'
16
- Dor::SURI_USER = 'suriuser'
17
- Dor::SURI_PASSWORD = 'suripword'
18
-
19
- If connecting to https servers:
20
- LyberCore::CERT_FILE = File.dirname(__FILE__) + '/../certs/dummy.crt'
21
- LyberCore::KEY_FILE = File.dirname(__FILE__) + '/../certs/dummy.key'
22
- LyberCore::KEY_PASS = 'dummy'
23
-
24
- == lyber_core/utils
25
-
26
- If using Utilities, require the following:
27
- require 'lyber_core/utils'
28
-
29
- This will give you:
30
- LyberCore::Utils::BagitBag
31
- LyberCore::Utils::ChecksumValidate
32
- LyberCore::Utils::FileUtilities
33
-
34
- If you do not want all 3, you can require the individual classes. I.E. if you only want the bagit utils, then require:
35
- require 'lyber_core/utils/bagit_bat'
36
-
37
- The BagitBag class requires the bagit gem
38
- http://github.com/flazz/bagit
39
-
40
- == Build and release procedure
41
- Modify the version number in lyber-core.gemspec, then push your commits to AFS. DO NOT TAG!
42
- Run: 'rake dlss_release' to tag, build, and publish the lyber-core gem
43
- See the Rakefile and the LyberCore::DlssRelease task in lib/lyber_core/rake/dlss_release.rb for more details
44
-
45
- == Releases
46
- - <b>1.3</b> Started to use Dor::Config for workspace configuration
47
- - <b>1.2.1</b> Clean up logging of exceptions in LyberCore::Log
48
- - <b>1.2</b> Robots can now run as daemons via the LyberCore::Robots::ServiceController
49
- - <b>1.1.2</b> Can pass an array of "command line" arguments to the Robot constructor
50
- - <b>1.1.1</b> Robot#start now returns LyberCore::Robots::CONTINUE if it did work without error, LyberCore::Robots::SLEEP if it did no work,
51
- and LyberCore::Robots::HALT if it reached its error limit while working on its queue
52
- - <b>1.1.0</b> Allow Robots::WorkQueue to resolve an arbitrary number of prerequisites
53
- - <b>1.0.0</b> Factored all Dor::* classes and object models out of lyber-core and into a separate dor-services gem. WARNING: MAY BREAK COMPATIBILITY WITH PREVIOUS DOR-ENABLED CODE.
54
- - <b>0.9.8</b> Created branch for legacy work "0.9-legacy". Robots can now be configured with fully qualified workflows for prerequisites
55
- eg <i>dor:googleScannedBookWF:register-object</i>
56
- - <b>0.9.7.4</b> Untangled a couple development dependencies; fixed issue where "include REXML" was polluting the Object namespace
57
- - <b>0.9.7.3</b> Logging enhancements
58
- - <b>0.9.7.2</b> IdentityMetadata bugfixes
59
- - <b>0.9.7.1</b> Enhanced exception handling
60
- - <b>0.9.7</b> ActiveMQ message-based robot parallelization as described here: https://consul.stanford.edu/x/tQjdBw . Removal of ROXML models.
61
- - <b>0.9.6.3</b> Better error reporting for LyberCore::Utils::FileUtilities.execute, which means when a system command fails we have a better idea of why.
62
- - <b>0.9.6.2</b> Handles new response from workflow service when there are no objects in the queue: <objects count="0"\>
63
- - <b>0.9.6</b> DorService.get_objects_for_workstep can handle one or two completed steps. Trimmed-down gem dependencies now defined in lyber-core.gemspec. 'rake dlss_release' will tag, build and publish gem
64
- - <b>0.9.5.5</b> Robots now log to ROBOT_ROOT/log/robot_name.log unless specified in constructor
65
- - <b>0.9.5.4</b> Custom exception classes, more checking of error conditions
66
- - <b>0.9.5.3</b> More robust testing, minor bug fixes, compatible with active_fedora 1.2.6
67
- - <b>0.9.5</b> Significantly refactored to provide central logging and many more debugging statements.
68
- - <b>0.9.4</b> First version that requires Ruby 1.8.7. Built with bundler and rvm
69
- - <b>0.9.3.9</b> Last version compatible with Ruby 1.8.6. Stored in source control as the 'facets-282' branch.
70
- - <b>0.9.3</b> Compatibility with bagit 1.0.0. Bump to active-fedora 1.1.13
71
- - <b>0.9.2</b> Workflow bug fixes. Last version that supports active-fedora 1.0.7
72
- - We recommend that you <b>DO NOT USE</b> any version older than these
73
-
74
- == Copyright
75
-
76
- Copyright (c) 2010 Stanford University Library. See LICENSE for details.
data/lib/dlss_service.rb DELETED
@@ -1,81 +0,0 @@
1
- require 'net/http'
2
- require 'net/https'
3
- require 'uri'
4
- require 'cgi'
5
- require 'active_fedora'
6
- require 'lyber_core'
7
- require 'nokogiri'
8
-
9
- class DlssService
10
-
11
- # the fedora object to operate on
12
- attr_reader :fedora_url
13
-
14
- def initialize(fedora_url)
15
- @fedora_url = fedora_url
16
- solr_url = "http://localhost:8983/solr"
17
- Fedora::Repository.register(@fedora_url)
18
- ActiveFedora::SolrService.register(solr_url)
19
- end
20
-
21
- # Get an https connection to the given url
22
- def get_https_connection(url)
23
- https = Net::HTTP.new(url.host, url.port)
24
- if(url.scheme == 'https')
25
- https.use_ssl = true
26
- https.cert = OpenSSL::X509::Certificate.new( File.read(CERT_FILE) )
27
- https.key = OpenSSL::PKey::RSA.new( File.read(KEY_FILE), KEY_PASS )
28
- https.verify_mode = OpenSSL::SSL::VERIFY_NONE
29
- end
30
- https
31
- end
32
-
33
- # Retrieve the metadata of a datastream of a DOR object
34
- # e.g. FEDORA_URI + /objects/ + druid + /datastreams/dor gets "dor" datastream metadata
35
- def get_datastream_md(druid, ds_id)
36
- begin
37
- url = URI.parse(@fedora_url + '/objects/' + druid + '/datastreams/' + ds_id)
38
- req = Net::HTTP::Get.new(url.request_uri)
39
- req.basic_auth FEDORA_USER, FEDORA_PASS
40
- res = DorService.get_https_connection(url).start {|http| http.request(req) }
41
- case res
42
- when Net::HTTPSuccess
43
- return res.body
44
- else
45
- LyberCore::Log.error("Datastream " + ds_id + " not found for " + druid)
46
- return nil
47
- end
48
- end
49
- end
50
-
51
- # This is maintained for backward compatibility, but
52
- # get_all_druids_from_object_list or get_some_druids_from_object_list
53
- # are preferred.
54
- def DlssService.get_druids_from_object_list(objectListXml)
55
- DlssService.get_all_druids_from_object_list(objectListXml)
56
- end
57
-
58
- # Transforms the XML from getObjectsForWorkStep into a list of druids
59
- def DlssService.get_all_druids_from_object_list(objectListXml)
60
- DlssService.get_some_druids_from_object_list(objectListXml, nil)
61
- end
62
-
63
- # Takes XML of the form
64
- # <objects><object id='druid:hx066mp6063' url='https://lyberservices-test.stanford.edu/workflow/objects/druid:hx066mp6063'/></objects>
65
- # if count is an integer, return at most that number of druids
66
- # otherwise, return all druids in the queue
67
- def DlssService.get_some_druids_from_object_list(objectListXml, count)
68
- druids = []
69
-
70
- # parse the xml into a document object
71
- xmldoc = Nokogiri::XML::Reader(objectListXml)
72
-
73
- xmldoc.each do |node|
74
- druids << node.attribute("id") unless node.attribute("id").nil?
75
- break if druids.length == count
76
- end
77
- return druids
78
- end
79
-
80
-
81
- end
data/lib/dor_service.rb DELETED
@@ -1,588 +0,0 @@
1
- require 'net/http'
2
- require 'net/https'
3
- require 'uri'
4
- require 'cgi'
5
- require 'rexml/document'
6
-
7
- class DorService
8
-
9
- include REXML
10
-
11
- def DorService.get_https_connection(url)
12
- https = Net::HTTP.new(url.host, url.port)
13
- if(url.scheme == 'https')
14
- https.use_ssl = true
15
- https.cert = OpenSSL::X509::Certificate.new( File.read(CERT_FILE) )
16
- https.key = OpenSSL::PKey::RSA.new( File.read(KEY_FILE), KEY_PASS )
17
- https.verify_mode = OpenSSL::SSL::VERIFY_NONE
18
- end
19
- https
20
- end
21
-
22
- # This should check to see if an object with the given PID already
23
- # exists in the repository
24
- def DorService.create_object(form_data)
25
- begin
26
- url = DOR_URI + '/objects'
27
- body = DorService.encodeParams(form_data)
28
- content_type = 'application/x-www-form-urlencoded'
29
- res = LyberCore::Connection.post(url, body, :content_type => content_type)
30
- res =~ /\/objects\/(.*)/
31
- druid = $1
32
- return druid
33
- rescue Exception => e
34
- LyberCore::Log.debug("Unable to create object #{e.backtrace}")
35
- raise e
36
- end
37
- end
38
-
39
- #objects/dr:123/resources
40
- #parms: model, id
41
- #will create object of type dor:GoogleScannedPage
42
- def DorService.create_child_object(parent_druid, child_id)
43
- begin
44
- #See if page exists before creating new fedora object
45
- # raise "Object exists with id: " + child_id if(DorService.get_druid_by_id(child_id))
46
- form_data = {'model' => 'dor:googleScannedPage', 'id' => child_id}
47
- url = DOR_URI + '/objects/' + parent_druid + '/resources'
48
- body = DorService.encodeParams(form_data)
49
- content_type = 'application/x-www-form-urlencoded'
50
- res = LyberCore::Connection.post(url, body, :content_type => content_type)
51
- res=~ /\/resources\/(.*)/
52
- druid = $1
53
- LyberCore::Log.info("Child googleScannedPage object created for parent #{parent_druid}")
54
- LyberCore::Log.debug("child_id = #{child_id}")
55
- LyberCore::Log.debug("new druid = #{druid}")
56
- return druid
57
- rescue Exception => e
58
- LyberCore::Log.debug("Unable to create object")
59
- raise e, "Unable to create object "
60
- end
61
- end
62
-
63
-
64
- # Takes a hash of arrays and builds a x-www-form-urlencoded string for POSTing form parameters
65
- #
66
- # == Parameters
67
- # - <b>form_data</b> - a hash of arrays that contains the form data, ie. {'param1' => ['val1', 'val2'], 'param2' => ['val3']}
68
- def DorService.encodeParams(form_data)
69
- body = ""
70
- form_data.each_pair do |param, array|
71
- array.each do |value|
72
- encoded = CGI.escape value
73
- body += '&' unless (body == "")
74
- body += param + '=' + encoded
75
- end
76
- end
77
- body
78
- end
79
-
80
-
81
- # Depricated. Use Dor::WorkflowService#create_workflow in lyber_core gem
82
- # def DorService.create_workflow(workflow, druid)
83
- # begin
84
- # url = URI.parse(DOR_URI + '/objects/' + druid + '/workflows/' + workflow.workflow_id)
85
- # req = Net::HTTP::Put.new(url.path)
86
- # #req.basic_auth 'fedoraUser', 'pass'
87
- # req.body = workflow.workflow_process_xml
88
- # req.content_type = 'application/xml'
89
- # res = DorService.get_https_connection(url).start {|http| http.request(req) }
90
- #
91
- # WorkflowService.create_workflow()
92
- #
93
- # case res
94
- # when Net::HTTPSuccess
95
- # puts workflow.workflow_id + " created for " + druid
96
- # else
97
- # $stderr.print res.body
98
- # raise res.error!
99
- # end
100
- # rescue Exception => e
101
- # $stderr.print "Unable to create workflow " + e
102
- # raise
103
- # end
104
- # end
105
-
106
-
107
- # See if an object exists with this dor_id (not druid, but sub-identifier)
108
- # Caller will have to handle any exception thrown
109
- def DorService.get_druid_by_id(dor_id)
110
- url_string = "#{DOR_URI}/query_by_id?id=#{dor_id}"
111
- LyberCore::Log.debug("Fetching druid for dor_id #{dor_id} at url #{url_string}")
112
- url = URI.parse(url_string)
113
- req = Net::HTTP::Get.new(url.request_uri)
114
- # res = DorService.get_https_connection(url).start {|http| http.request(req) }
115
- res = LyberCore::Connection.send_request(url,req)
116
-
117
- case res
118
- when Net::HTTPSuccess
119
- res.body =~ /druid="([^"\r\n]*)"/
120
- return $1
121
- when Net::HTTPClientError
122
- LyberCore::Log.debug("Barcode does not yet exist in DOR: #{dor_id}")
123
- return nil
124
- when Net::HTTPServerError
125
- LyberCore::Log.debug("Encountered HTTPServerError error when requesting #{url}: #{res.inspect}")
126
- raise "Encountered 500 error when requesting #{url}: #{res.inspect}"
127
- else
128
- LyberCore::Log.debug("Encountered unknown error when requesting #{url}: #{res.inspect}")
129
- raise "Encountered unknown error when requesting #{url}: #{res.inspect}"
130
- end
131
- end
132
-
133
- ############################################# Start of Datastream methods
134
- # Until ActiveFedora supports client-side certificate configuration, we are stuck with our own methods to access datastreams
135
-
136
- #/objects/pid/datastreams/dsID ? [controlGroup] [dsLocation] [altIDs] [dsLabel] [versionable] [dsState] [formatURI] [checksumType] [checksum] [logMessage]
137
- def DorService.add_datastream(druid, ds_id, ds_label, xml, content_type='application/xml', versionable = false )
138
- DorService.add_datastream_managed(druid, ds_id, ds_label, xml, content_type, versionable)
139
- end
140
-
141
- def DorService.add_datastream_external_url(druid, ds_id, ds_label, ext_url, content_type, versionable = false)
142
- parms = '?controlGroup=E'
143
- parms += '&dsLabel=' + CGI.escape(ds_label)
144
- parms += '&versionable=false' unless(versionable)
145
- parms += '&dsLocation=' + ext_url
146
- DorService.set_datastream(druid, ds_id, parms, :post, {:type => content_type})
147
- end
148
-
149
- def DorService.update_datastream(druid, ds_id, xml, content_type='application/xml', versionable = false)
150
- parms = '?controlGroup=M'
151
- parms += '&versionable=false' unless(versionable)
152
- DorService.set_datastream(druid, ds_id, parms, :put, {:type => content_type, :xml => xml})
153
- end
154
-
155
- def DorService.add_datastream_managed(druid, ds_id, ds_label, xml, content_type='application/xml', versionable = false )
156
- parms = '?controlGroup=M'
157
- parms += '&dsLabel=' + CGI.escape(ds_label)
158
- parms += '&versionable=false' unless(versionable)
159
- DorService.set_datastream(druid, ds_id, parms, :post, {:type => content_type, :xml => xml})
160
- end
161
-
162
- # Retrieve the content of a datastream of a DOR object
163
- # e.g. FEDORA_URI + /objects/ + druid + /datastreams/dor/content gets "dor" datastream content
164
- def DorService.get_datastream(druid, ds_id)
165
- begin
166
- LyberCore::Log.debug("Connecting to #{FEDORA_URI}...")
167
- url_string = "#{FEDORA_URI}/objects/#{druid}/datastreams/#{ds_id}/content"
168
- url = URI.parse(url_string)
169
- LyberCore::Log.debug("Connecting to #{url_string}...")
170
- req = Net::HTTP::Get.new(url.request_uri)
171
- LyberCore::Log.debug("request object: #{req.inspect}")
172
- # res = DorService.get_https_connection(url).start {|http| http.request(req) }
173
- res = LyberCore::Connection.send_request(url,req)
174
-
175
- case res
176
- when Net::HTTPSuccess
177
- return res.body
178
- when Net::HTTPClientError
179
- LyberCore::Log.debug("Datastream not found at url #{url_string}")
180
- return nil
181
- when Net::HTTPServerError
182
- LyberCore::Log.debug("Attempted to reach #{url_string} but failed")
183
- raise "Encountered 500 error when requesting #{url_string}: #{res.inspect}"
184
- else
185
- LyberCore::Log.debug("Encountered unknown error when requesting #{url}: #{res.inspect}")
186
- raise "Encountered unknown error when requesting #{url}: #{res.inspect}"
187
- end
188
- rescue Exception => e
189
- raise e
190
- end
191
- end
192
-
193
- # Depricated - use Dor::WorkflowService#get_workflow_xml
194
- def DorService.get_workflow_xml(druid, workflow)
195
- raise Exception.new("This method is deprecated. Please use Dor::WorkflowService#get_workflow_xml")
196
- end
197
-
198
- # Retrieve the metadata of a datastream of a DOR object
199
- # e.g. FEDORA_URI + /objects/ + druid + /datastreams/dor gets "dor" datastream metadata
200
- def DorService.get_datastream_md(druid, ds_id)
201
- begin
202
- LyberCore::Log.debug("Connecting to #{FEDORA_URI}...")
203
- url = URI.parse(FEDORA_URI + '/objects/' + druid + '/datastreams/' + ds_id)
204
- LyberCore::Log.debug("Connecting to #{url}...")
205
- req = Net::HTTP::Get.new(url.request_uri)
206
- req.basic_auth FEDORA_USER, FEDORA_PASS
207
- LyberCore::Log.debug("request object: #{req.inspect}")
208
- # res = DorService.get_https_connection(url).start {|http| http.request(req) }
209
- res = LyberCore::Connection.send_request(url,req)
210
- case res
211
- when Net::HTTPSuccess
212
- return res.body
213
- else
214
- LyberCore::Log.error("Attempted to reach #{url} but failed")
215
- LyberCore::Log.error("Datastream #{dsid} not found for #{druid}")
216
- end
217
- rescue Exception => e
218
- raise e, "Couldn't get datastream from #{url}"
219
- end
220
- end
221
-
222
- # Add a new datastream, but only if it does not yet exist
223
- def DorService.add_datastream_unless_exists(druid, ds_id, ds_label, xml)
224
- # make sure xml is not empty
225
- unless xml
226
- raise "No data supplied for datastream " + ds + "of " + druid
227
- end
228
- # check to make sure datastream does not yet exist
229
- unless DorService.get_datastream(druid, ds_id)
230
- DorService.add_datastream(druid, ds_id, ds_label, xml)
231
- end
232
- end
233
-
234
- ############################################# End of Datastream methods
235
-
236
-
237
- # Deprecated. Use Dor::WorkflowService#update_workflow_status
238
- #PUT "objects/pid:123/workflows/GoogleScannedWF/convert"
239
- #<process name=\"convert\" status=\"waiting\" elapsed="0.11" lifecycle="released" "/>"
240
- #TODO increment attempts
241
- # def DorService.updateWorkflowStatus(repository, druid, workflow, process, status, elapsed = 0, lifecycle = nil)
242
- # begin
243
- # url = URI.parse(WORKFLOW_URI + '/' + repository + '/objects/' + druid + '/workflows/' + workflow + '/' + process)
244
- # req = Net::HTTP::Put.new(url.path)
245
- # process_xml = '<process name="'+ process + '" status="' + status + '" '
246
- # process_xml << 'elapsed="' + elapsed.to_s + '" '
247
- # process_xml << 'lifecycle="' + lifecycle + '" ' if(lifecycle)
248
- # process_xml << '/>'
249
- # req.body = process_xml
250
- # req.content_type = 'application/xml'
251
- # res = DorService.get_https_connection(url).start {|http| http.request(req) }
252
- # case res
253
- # when Net::HTTPSuccess
254
- # puts "#{workflow} process updated for " + druid
255
- # else
256
- # $stderr.print res.body
257
- # raise res.error!
258
- # end
259
- # rescue Exception => e
260
- # $stderr.print "Unable to update workflow " + e
261
- # raise
262
- # end
263
- #
264
- # end
265
-
266
- # Returns string containing object list XML from a workflow DOR query
267
- #
268
- # @param [String] repository name of the repository you are querying. Right now, <tt>dor</tt> and <tt>sdr</tt> are supported
269
- # @param [String] workflow name of the workflow being queried, eg <tt>googleScannedBookWF</tt>
270
- # @param [String, Array] completed if only querying for one completed step, pass in a String.
271
- # If querying for two completed steps, pass in an Array of the two completed steps
272
- # @param [String] waiting the name of the waiting step
273
- # @raise [LyberCore::Exceptions::EmptyQueue] When the query is successful, but no objects are found in that queue
274
- # @raise [Exception] For other problems like connection failures
275
- # @return [String] XML containing all the objects that match the specific query. It looks like:
276
- # <objects>
277
- # <object druid="dr:123" url="http://localhost:9999/jersey-spring/objects/dr:123%5c" />
278
- # <object druid="dr:abc" url="http://localhost:9999/jersey-spring/objects/dr:abc%5c" />
279
- # </objects>
280
- def DorService.get_objects_for_workstep(repository, workflow, completed, waiting)
281
- LyberCore::Log.debug("DorService.get_objects_for_workstep(#{repository}, #{workflow}, #{completed}, #{waiting})")
282
-
283
- if repository.nil? or workflow.nil? or completed.nil? or waiting.nil?
284
- LyberCore::Log.fatal("Can't execute DorService.get_objects_for_workstep: missing info")
285
- end
286
-
287
- unless defined?(WORKFLOW_URI) and WORKFLOW_URI != nil
288
- LyberCore::Log.fatal("WORKFLOW_URI is not set. ROBOT_ROOT = #{ROBOT_ROOT}")
289
- raise "WORKFLOW_URI is not set"
290
- end
291
-
292
- uri_string = "#{WORKFLOW_URI}/workflow_queue?repository=#{repository}&workflow=#{workflow}&waiting=#{waiting}"
293
- if(completed.class == Array)
294
- raise "The workflow service can only handle queries with no more than 2 completed steps" if completed.size > 2
295
- completed.each {|step| uri_string << "&completed=#{step}"}
296
- else
297
- uri_string << "&completed=#{completed}"
298
- end
299
-
300
- return DorService.execute_workflow_xml_query(uri_string)
301
- end
302
-
303
- # Returns string containing object list XML from a workflow DOR query using fully qualified workflow step names
304
- # eg <tt>dor:googleScannedBookWF:register-object</tt>
305
- #
306
- # @param [String, Array] completed if only querying for one completed step, pass in a String of a fully qualified workflow step.
307
- # If querying for two completed steps, pass in an Array of the two completed steps
308
- # @param [String] waiting the fully qualified name of the waiting step
309
- # @raise [LyberCore::Exceptions::EmptyQueue] When the query is successful, but no objects are found in that queue
310
- # @raise [Exception] For other problems like connection failures or passing in non-qualified workflow names
311
- # @return [String] XML containing all the objects that match the specific query. It looks like:
312
- # <objects>
313
- # <object druid="dr:123" url="http://localhost:9999/jersey-spring/objects/dr:123%5c" />
314
- # <object druid="dr:abc" url="http://localhost:9999/jersey-spring/objects/dr:abc%5c" />
315
- # </objects>
316
- def DorService.get_objects_for_qualified_workstep(completed, waiting)
317
- LyberCore::Log.debug("DorService.get_objects_for_qualified_workstep(#{completed}, #{waiting})")
318
-
319
- if completed.nil? or waiting.nil?
320
- LyberCore::Log.fatal("Can't execute DorService.get_objects_for_qualified_workstep: missing info")
321
- end
322
-
323
- unless defined?(WORKFLOW_URI) and WORKFLOW_URI != nil
324
- LyberCore::Log.fatal("WORKFLOW_URI is not set. ROBOT_ROOT = #{ROBOT_ROOT}")
325
- raise "WORKFLOW_URI is not set"
326
- end
327
-
328
- unless(waiting =~ /.+:.+:.+/)
329
- raise "The waiting step was not fully qualified or of the form: <repository>:<workflow>:<stepname>. Received #{waiting}"
330
- end
331
- uri_string = "#{WORKFLOW_URI}/workflow_queue?waiting=#{waiting}"
332
-
333
- completed_steps = Array(completed)
334
- raise "The workflow service can only handle queries with no more than 2 completed steps" if completed_steps.size > 2
335
- completed_steps.each do |step|
336
- raise "A completed step was not fully qualified or of the form: <repository>:<workflow>:<stepname>. Received #{step}" unless(step =~ /.+:.+:.+/)
337
- uri_string << "&completed=#{step}"
338
- end
339
-
340
- return DorService.execute_workflow_xml_query(uri_string)
341
- end
342
-
343
-
344
-
345
- def DorService.log_and_raise_workflow_connection_problem(repository, workflow, completed, waiting, response)
346
-
347
- end
348
-
349
- # Transforms the XML from getObjectsForWorkStep into a list of druids
350
- # TODO figure out how to return a partial list
351
- # This method is here for backward compatibility, but it has
352
- # been superceded by DlssService.get_druids_from_object_list(objectListXml)
353
- def DorService.get_druids_from_object_list(objectListXml)
354
- DlssService.get_all_druids_from_object_list(objectListXml)
355
- end
356
-
357
- # Retrieves the identityMetadata datastream for a DOR object,
358
- # extracts the otherId values, and returns them in a hash
359
- def DorService.get_object_identifiers(druid)
360
- begin
361
- identifiers = {}
362
- identityMetadata = get_datastream(druid, 'identityMetadata')
363
- raise "Unable to get identityMetadata datastream for #{druid}" if identityMetadata.nil?
364
- dorXml = Document.new(identityMetadata)
365
-
366
- dorXml.elements.each("identityMetadata/otherId") do |element|
367
- identifiers[element.attributes["name"]] = case element.text
368
- when nil then nil
369
- else element.text.strip
370
- end
371
- end
372
- return identifiers
373
- rescue Exception => e
374
- raise e, "Couldn't get object identifiers for #{druid}"
375
- end
376
- end
377
-
378
- def DorService.transfer_object(objectid, sourceDir, destinationDir)
379
- rsync='rsync -a -e ssh '
380
- rsync_cmd = rsync + "'" + sourceDir + objectid + "' " + destinationDir
381
- LyberCore::Log.debug(rsync_cmd + "\n")
382
- system(rsync_cmd)
383
- return File.exists?(File.join(destinationDir, objectid))
384
- end
385
-
386
- def DorService.verify_checksums(directory, checksumFile)
387
- dirSave = Dir.pwd
388
- Dir.chdir(directory)
389
- checksumCmd = 'md5sum -c ' + checksumFile + ' | grep -v OK | wc -l'
390
- badcount = `#{checksumCmd}`.to_i
391
- Dir.chdir(dirSave)
392
- return (badcount==0)
393
- end
394
-
395
- # Given a process and an error message, constuct an xml fragment that can be
396
- # posted to the workflow service to record the error generated for a given druid
397
- def DorService.construct_error_update_request(process, error_msg, error_txt)
398
- clean_error_msg = error_msg.gsub(/\s+/," ").gsub(/[`'#<>]/,'').gsub(/"/,"'")
399
- clean_error_txt = error_txt.gsub(/\s+/," ").gsub(/[`'#<>]/,'').gsub(/"/,"'") unless error_txt.nil?
400
- body = '<process name="'+ process + '" status="error" errorMessage="' + clean_error_msg + '" '
401
- body += 'errorText="' + clean_error_txt + '" ' unless error_txt.nil?
402
- body += '/>'
403
- return body
404
- end
405
-
406
- # If an object encounters an error during processing, set its status to "error"
407
- def DorService.update_workflow_error_status(repository, druid, workflow, process, error_msg, error_txt = nil)
408
- begin
409
- LyberCore::Log.debug("Updating workflow error status for druid #{druid}")
410
- LyberCore::Log.debug("Error message is: #{error_msg}")
411
- LyberCore::Log.debug("Error text is: #{error_txt}")
412
- url_string = "#{WORKFLOW_URI}/#{repository}/objects/#{druid}/workflows/#{workflow}/#{process}"
413
- url = URI.parse(url_string)
414
- LyberCore::Log.debug("Using url #{url_string}")
415
- req = Net::HTTP::Put.new(url.path)
416
- req.body = DorService.construct_error_update_request(process, error_msg, error_txt)
417
- req.content_type = 'application/xml'
418
- LyberCore::Log::debug("Putting request: #{req.inspect}")
419
- # res = DorService.get_https_connection(url).start {|http| http.request(req) }
420
- res = LyberCore::Connection.send_request(url,req)
421
- LyberCore::Log::debug("Got response: #{res.inspect}")
422
- case res
423
- when Net::HTTPSuccess
424
- LyberCore::Log.info("#{workflow} - #{process} set to error for " + druid)
425
- else
426
- LyberCore::Log.debug(res.body)
427
- raise res.error!, "Received error from the workflow service"
428
- end
429
- rescue Exception => e
430
- msg = "Unable to update workflow service at url #{url_string}"
431
- LyberCore::Log.debug(msg)
432
- raise e, msg
433
- end
434
- end
435
-
436
- # This method sends a GET request to jenson and returns MARC XML
437
- def DorService.query_symphony(flexkey)
438
- begin
439
- symphony_url = 'http://zaph.stanford.edu'
440
- path_info = '/cgi-bin/holding.pl?'
441
- parm_list = URI.escape('search=location&flexkey=' + flexkey)
442
- url_string = symphony_url + path_info + parm_list
443
-
444
- url = URI.parse(url_string)
445
- LyberCore::Log.debug("Attempting to query symphony: #{url_string}")
446
- res = Net::HTTP.start(url.host, url.port) {|http|
447
- http.get( path_info + parm_list )
448
- }
449
- case res
450
- when Net::HTTPSuccess
451
- LyberCore::Log.debug("Successfully queried symphony for #{flexkey}")
452
- return res.body
453
- else
454
- LyberCore::Log.debug("Encountered an error from symphony: #{res.body}")
455
- raise res.error!
456
- end
457
- rescue Exception => e
458
- raise e, "Encountered an error from symphony"
459
- end
460
-
461
- end #query_symphony
462
-
463
-
464
- private
465
-
466
- def DorService.execute_workflow_xml_query(uri_string)
467
- LyberCore::Log.info("Attempting to connect to #{uri_string}")
468
- url = URI.parse(uri_string)
469
- req = Net::HTTP::Get.new(url.request_uri)
470
- # res = DorService.get_https_connection(url).start {|http| http.request(req) }
471
- res = LyberCore::Connection.send_request(url,req)
472
- case res
473
- when Net::HTTPSuccess
474
- begin
475
- doc = Nokogiri::XML(res.body)
476
- count = doc.root.at_xpath("//objects/@count").content.to_i
477
- rescue Exception => e
478
- msg = "Could not parse response from Workflow Service"
479
- LyberCore::Log.debug(msg + "\n#{res.body}")
480
- raise e, msg
481
- end
482
-
483
- if(count == 0)
484
- raise LyberCore::Exceptions::EmptyQueue.new, "empty queue"
485
- else
486
- return res.body
487
- end
488
- else
489
- LyberCore::Log.fatal("Workflow queue not found for #{workflow} : #{waiting}")
490
- LyberCore::Log.debug("I am attempting to connect to WORKFLOW_URI #{WORKFLOW_URI}")
491
- LyberCore::Log.debug("repository: #{repository}")
492
- LyberCore::Log.debug("workflow: #{workflow}")
493
- LyberCore::Log.debug("completed: #{completed}")
494
- LyberCore::Log.debug("waiting: #{waiting}")
495
- LyberCore::Log.debug(res.inspect)
496
- raise "Could not connect to url #{uri_string}"
497
- end
498
- end
499
- # druid, ds, url, content_type, method, parms
500
- def DorService.set_datastream(druid, ds_id, parms, method, content = {})
501
- begin
502
- url = URI.parse(FEDORA_URI + '/objects/' + druid + '/datastreams/' + ds_id + parms)
503
- case method
504
- when :post
505
- req = Net::HTTP::Post.new(url.request_uri)
506
- when :put
507
- req = Net::HTTP::Put.new(url.request_uri)
508
- end
509
- req.basic_auth FEDORA_USER, FEDORA_PASS
510
- req.body = content[:xml] if(content[:xml])
511
- req.content_type = content[:type]
512
- # res = DorService.get_https_connection(url).start {|http| http.request(req) }
513
- res = LyberCore::Connection.send_request(url,req)
514
- case res
515
- when Net::HTTPSuccess
516
- return true
517
- when Net::HTTPServerError
518
- LyberCore::Log.debug("Attempted to set datastream #{url} but failed")
519
- raise "Encountered 500 error setting datastream #{url}: #{res.inspect}"
520
- else
521
- LyberCore::Log.debug("Encountered unknown error when setting datastream #{url}: #{res.inspect}")
522
- raise "Encountered unknown error when setting datastream #{url}: #{res.inspect}"
523
- end
524
- rescue Exception => e
525
- raise
526
- end
527
- end
528
-
529
- def DorService.get_object_metadata(druid)
530
- dor = DorService.get_datastream(druid, 'identityMetadata')
531
- mods = DorService.get_datastream(druid, 'descMetadata')
532
- googlemets = DorService.get_datastream(druid, 'googlemets')
533
- contentMetadata = DorService.get_datastream(druid, 'contentMetadata')
534
- adminMetadata = DorService.get_datastream(druid, 'adminMetadata')
535
- xml = "<objectMD druid='" + druid + "' >\n" +
536
- dor + mods + googlemets + contentMetadata + adminMetadata +
537
- "</objectMD>\n"
538
- return xml
539
- end
540
-
541
- end
542
-
543
- # Given an array of strings, construct valid xml in which each
544
- # member of the array becomes a <tag> element
545
- def DorService.construct_xml_for_tag_array(tag_array)
546
- xml = "<tags>"
547
- tag_array.each do |tag|
548
- tag = tag.gsub(/\s+/," ").gsub(/[<>!]/,'')
549
- xml << "<tag>#{tag}</tag>"
550
- end
551
- xml << "</tags>"
552
- end
553
-
554
-
555
- def DorService.add_identity_tags(druid, tags)
556
- begin
557
- url = URI.parse(DOR_URI + '/objects/' + druid + '/datastreams/identityMetadata/tags' )
558
- req = Net::HTTP::Put.new(url.path)
559
- req.body = DorService.construct_xml_for_tag_array(tags)
560
- req.content_type = 'application/xml'
561
- # res = DorService.get_https_connection(url).start {|http| http.request(req) }
562
- res = LyberCore::Connection.send_request(url,req)
563
- case res
564
- when Net::HTTPSuccess
565
- return true
566
- when Net::HTTPServerError
567
- LyberCore::Log.debug("Attempted to add identity tags #{url} but failed")
568
- raise "Encountered 500 error when adding identity tags #{url}: #{res.inspect}"
569
- else
570
- LyberCore::Log.debug("Encountered unknown error when adding identity tags #{url}: #{res.inspect}")
571
- raise "Encountered unknown error when adding identity tags #{url}: #{res.inspect}"
572
- end
573
- rescue Exception => e
574
- raise e
575
- end
576
- end
577
-
578
- #DorService.updateWorkflowStatus('dr:rf624mb644', 'GoogleScannedWF', 'descriptive-metadata', 'completed')
579
- ####Testing
580
- #line = 'id="catkey:1990757"||id="barcode:36105045033136"||model="GoogleScannedBook"||label="The poacher"'
581
- #form_data = {}
582
- #DorService.parse_line_return_hashlist(line, form_data)
583
- #form_data.each_pair{|k,v| puts "key: #{k} value: #{v}"}
584
- #
585
- #puts DorService.encodeParams(form_data)
586
-
587
- #DorService.create_object('id="catkey:454545454545454"||id="barcode:434343434343434343434343434"||model="GoogleScannedBook"||label="Ruby multiple Id parms 3"')
588
-