openurl 0.3.1 → 0.4.0

data/README.md CHANGED
@@ -38,22 +38,32 @@ response format, so parsing the returned value is up to you.
38
38
 
39
39
  ## Ruby 1.9 and encodings
40
40
 
41
- Gem does run and all tests pass under ruby 1.9. There is very limited
42
- support for handling character encodings in the proper 1.9 way.
41
+ Gem is currently developed under 1.9.3, although it should work under 1.8.7.
42
+ There is some limited support for handling character encodings in the
43
+ proper ruby 1.9 way.
43
44
 
44
- CTX or XML context objects will be assumed utf-8 even if the ruby string
45
- they are held in has an ascii-8bit encoding. They will be forced into a utf-8 encoding.
46
- This seems to be a side effect of the REXML and CGI libraries we use to parse,
47
- but there are runnable tests that assert it is true. (see test/encoding_test.rb)
45
+ ### new_from_kev, new_from_form_vars
48
46
 
49
- Incoming context objects with a non-utf8 ctx_enc value will *not* be handled
50
- properly, they'll still be forced to utf8.
47
+ When using ContextObject.new_from_kev or .new_from_form_vars, input
48
+ will be assumed to be UTF8, unless a ctx_enc value is present specifying
49
+ ISO-8859-1. The actual ruby #encoding of the input string/stream is ignored;
50
+ data will be force_encoded on read. If input is specified as ISO-8859-1 with
51
+ ctx_enc, data _will_ be _transcoded_ to UTF8 on read.
52
+
53
+ Any illegal bytes for the input character encoding _will_ be replaced by
54
+ the unicode replacement symbol ("\uFFFD") on read.
55
+
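The rules above can be sketched with core String methods alone. This is a minimal illustration, not the gem's implementation: `to_clean_utf8` is a hypothetical helper, and `String#scrub` (Ruby 2.1+) is assumed here as a stand-in for the `ensure_valid_encoding` gem that the library itself requires.

```ruby
# Hypothetical stand-in for the behavior described above; not the gem's code.
# String#scrub (Ruby 2.1+) replaces the ensure_valid_encoding gem here.
def to_clean_utf8(value, ctx_enc = nil)
  str = value.dup # work on a copy so the caller's string is left alone
  if ctx_enc == "info:ofi/enc:ISO-8859-1"
    # Tag the bytes as Latin-1, then transcode them to UTF-8.
    str.force_encoding("ISO-8859-1").encode("UTF-8", invalid: :replace, undef: :replace)
  else
    # Tag the bytes as UTF-8 and swap illegal sequences for U+FFFD.
    str.force_encoding("UTF-8").scrub("\uFFFD")
  end
end

raw = "M\xE9xico".b # binary-tagged bytes; 0xE9 alone is not legal UTF-8

puts to_clean_utf8(raw)                            # bad byte becomes U+FFFD
puts to_clean_utf8(raw, "info:ofi/enc:ISO-8859-1") # byte is read as Latin-1 "é"
```

The same input bytes give two different results depending only on the declared `ctx_enc`, which is why the Ruby-level encoding tag of the incoming string does not matter.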
56
+ ### new_from_xml
57
+
58
+ Input will be assumed UTF8, and force_encoded to UTF8. Illegal bytes in input
59
+ for UTF8 will be replaced by unicode replacement char ("\uFFFD").
60
+
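A small sketch of what "assumed UTF8, and force_encoded to UTF8" means for an XML string. It uses only core String methods; `String#scrub!` is an assumption standing in for the `ensure_valid_encoding` gem, and the XML fragment is made up for illustration.

```ruby
# Hypothetical illustration of "assume UTF-8, replace illegal bytes" for XML
# input. String#scrub! stands in for the ensure_valid_encoding gem.
xml = "<rft:btitle>D\xE9pendances</rft:btitle>".b # tagged ASCII-8BIT (binary)

xml.force_encoding("UTF-8") # retags the same bytes; mutates the receiver
puts xml.valid_encoding?    # false: 0xE9 followed by "p" is not legal UTF-8

xml.scrub!("\uFFFD")        # replaces the illegal byte in place
puts xml.valid_encoding?    # true
puts xml.encoding           # UTF-8
```

Because `force_encoding` retags the string in place, a caller who needs the original string untouched may prefer to pass in a `dup`.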
61
+ ### Programmatic creation of context objects
51
62
 
52
63
  Programmatically created context objects, you must ensure all strings are
53
- represented as utf8 encoded yourself.
64
+ represented and tagged as utf8 encoded yourself; no detection or transcoding
65
+ or correction will be done for you.
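Since nothing is corrected for programmatically built objects, one option is to normalize strings before handing them over. The helper below is a hypothetical pre-flight check, not part of the gem. It assumes the caller knows the real source encoding when the data is not already UTF-8.

```ruby
# Hypothetical pre-flight helper; not part of the openurl gem.
# Returns a valid UTF-8 copy of str, or raises if that cannot be done safely.
def utf8_for_openurl(str, source_encoding = nil)
  # When the real source encoding is known, transcode from it.
  return str.dup.force_encoding(source_encoding).encode("UTF-8") if source_encoding

  # Otherwise only accept bytes that already form valid UTF-8.
  copy = str.dup.force_encoding("UTF-8")
  raise ArgumentError, "bytes are not valid UTF-8" unless copy.valid_encoding?
  copy
end

latin1_bytes = "Caf\xE9".b
title = utf8_for_openurl(latin1_bytes, "ISO-8859-1")
puts title.encoding        # UTF-8
puts title.valid_encoding? # true
# The result could then be passed on, e.g. ctx.referent.set_metadata('btitle', title)
```

Raising on invalid bytes, instead of silently replacing them, is a design choice for this sketch: with programmatic input the caller is usually in a position to fix the data at its source.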
54
66
 
55
- More sophisticated encoding handling can theoretically be added, but it's
56
- somewhat non-trivial, and it's not clear anyone needs it.
57
67
 
58
68
  ## INSTALLATION
59
69
 
@@ -0,0 +1,663 @@
1
+ # encoding: UTF-8
2
+
3
+ require 'ensure_valid_encoding'
4
+
5
+ module OpenURL
6
+
7
+ if RUBY_VERSION < '1.9'
8
+ require 'jcode'
9
+ $KCODE='UTF-8'
10
+ end
11
+
12
+ ##
13
+ # The ContextObject class is intended to both create new OpenURL 1.0 context
14
+ # objects or parse existing ones, either from Key-Encoded Values (KEVs) or
15
+ # XML.
16
+ # == Create a new ContextObject programmatically
17
+ # require 'openurl/context_object'
18
+ # include OpenURL
19
+ #
20
+ # ctx = ContextObject.new
21
+ # ctx.referent.set_format('journal') # important to do this FIRST.
22
+ #
23
+ # ctx.referent.add_identifier('info:doi/10.1016/j.ipm.2005.03.024')
24
+ # ctx.referent.set_metadata('issn', '0306-4573')
25
+ # ctx.referent.set_metadata('aulast', 'Bollen')
26
+ # ctx.referrer.add_identifier('info:sid/google')
27
+ # puts ctx.kev
28
+ # # url_ver=Z39.88-2004&ctx_tim=2007-10-29T12%3A18%3A53-0400&ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&ctx_id=&rft.issn=0306-4573&rft.aulast=Bollen&rft_val_fmt=info%3Aofi%2Ffmt%3Axml%3Axsd%3Ajournal&rft_id=info%3Adoi%2F10.1016%2Fj.ipm.2005.03.024&rfr_id=info%3Asid%2Fgoogle
29
+ #
30
+ # == Create a new ContextObject from an existing kev or XML serialization:
31
+ #
32
+ # ContextObject.new_from_kev( kev_context_object )
33
+ # ContextObject.new_from_xml( xml_context_object ) # Can be String or REXML::Document
34
+ #
35
+ # == Serialize a ContextObject to kev or XML :
36
+ # ctx.kev
37
+ # ctx.xml
38
+ class ContextObject
39
+ include EnsureValidEncoding
40
+
41
+ attr_reader :admin, :referent, :referringEntity, :requestor, :referrer,
42
+ :serviceType, :resolver
43
+ attr_accessor :foreign_keys, :openurl_ver
44
+
45
+ @@defined_entities = {"rft"=>"referent", "rfr"=>"referrer", "rfe"=>"referring-entity", "req"=>"requestor", "svc"=>"service-type", "res"=>"resolver"}
46
+
47
+ # Creates a new ContextObject object and initializes the ContextObjectEntities.
48
+
49
+ def initialize()
50
+ @referent = ContextObjectEntity.new
51
+ @referrer = ContextObjectEntity.new
52
+ @referringEntity = ContextObjectEntity.new
53
+ @requestor = ContextObjectEntity.new
54
+ @serviceType = []
55
+ @resolver = []
56
+ @foreign_keys = {}
57
+ @openurl_ver = "Z39.88-2004"
58
+ @admin = {"ctx_ver"=>{"label"=>"version", "value"=>@openurl_ver}, "ctx_tim"=>{"label"=>"timestamp", "value"=>DateTime.now().to_s}, "ctx_id"=>{"label"=>"identifier", "value"=>""}, "ctx_enc"=>{"label"=>"encoding", "value"=>"info:ofi/enc:UTF-8"}}
59
+ end
60
+
61
+ # Any legal OpenURL 1.0 sends url_ver=Z39.88-2004, and usually
62
+ # ctx_ver=Z39.88-2004 too. However, sometimes we need to send
63
+ # an illegal OpenURL with a different openurl ver string, to deal
64
+ # with weird agents, for instance to trick SFX into doing the right thing.
65
+ def openurl_ver=(ver)
66
+ @openurl_ver = ver
67
+ @admin["ctx_ver"]["value"] = ver
68
+ end
69
+
70
+ def deep_copy
71
+ cloned = ContextObject.new
72
+ cloned.import_context_object( self )
73
+ return cloned
74
+ end
75
+
76
+ # Serialize the ContextObject to XML.
77
+
78
+ def xml
79
+ doc = REXML::Document.new()
80
+ coContainer = doc.add_element "ctx:context-objects"
81
+ coContainer.add_namespace("ctx","info:ofi/fmt:xml:xsd:ctx")
82
+ coContainer.add_namespace("xsi", "http://www.w3.org/2001/XMLSchema-instance")
83
+ coContainer.add_attribute("xsi:schemaLocation", "info:ofi/fmt:xml:xsd:ctx http://www.openurl.info/registry/docs/info:ofi/fmt:xml:xsd:ctx")
84
+ co = coContainer.add_element "ctx:context-object"
85
+ @admin.each_key do |k|
86
+ next if k == "ctx_enc"
87
+ co.add_attribute(@admin[k]["label"], @admin[k]["value"])
88
+ end
89
+
90
+ [{@referent=>"rft"},
91
+ {@referringEntity=>"rfe"}, {@requestor=>"req"},
92
+ {@referrer=>"rfr"}].each do | entity |
93
+
94
+ entity.each do | ent, label |
95
+ ent.xml(co, label) unless ent.empty?
96
+ end
97
+ end
98
+
99
+ [{@serviceType=>"svc"}, {@resolver=>"res"}].each do |entity|
100
+ entity.each do | entCont, label |
101
+ entCont.each do |ent|
102
+ ent.xml(co, label) unless ent.empty?
103
+ end
104
+ end
105
+ end
106
+
107
+ return doc.to_s
108
+ end
109
+
110
+
111
+ # Output the ContextObject as a Key-encoded value string. Pass a boolean
112
+ # true argument if you do not want the ctx_tim key included.
113
+
114
+ def kev(no_date=false)
115
+ kevs = ["url_ver=#{self.openurl_ver}", "url_ctx_fmt=#{CGI.escape("info:ofi/fmt:kev:mtx:ctx")}"]
116
+
117
+ # Loop through the administrative metadata
118
+ @admin.each_key do |k|
119
+ next if k == "ctx_tim" && no_date
120
+ kevs.push(k+"="+CGI.escape(@admin[k]["value"].to_s)) if @admin[k]["value"]
121
+ end
122
+
123
+ {@referent=>"rft", @referringEntity=>"rfe", @requestor=>"req", @referrer=>"rfr"}.each do | ent, abbr |
124
+ kevs.push(ent.kev(abbr)) unless ent.empty?
125
+ end
126
+
127
+ {@serviceType=>"svc", @resolver=>"res"}.each do |entCont, abbr|
128
+ entCont.each do |ent|
129
+ next if ent.empty?
130
+ kevs.push(ent.kev(abbr))
131
+ end
132
+ end
133
+ return kevs.join("&")
134
+ end
135
+
136
+ # Outputs the ContextObject as a ruby hash---hash version of the kev format.
137
+ # Outputting a context object as a hash
138
+ # is imperfect, because context objects can have multiple elements
139
+ # with the same key--and because some keys depend on SAP1 vs SAP2.
140
+ # So this function is really deprecated, but here because we have so much
141
+ # code dependent on it.
142
+ def to_hash
143
+ co_hash = {"url_ver"=>self.openurl_ver, "url_ctx_fmt"=>"info:ofi/fmt:kev:mtx:ctx"}
144
+
145
+ @admin.each_key do |k|
146
+ co_hash[k]=@admin[k]["value"] if @admin[k]["value"]
147
+ end
148
+
149
+ {@referent=>"rft", @referringEntity=>"rfe", @requestor=>"req", @referrer=>"rfr"}.each do | ent, abbr |
150
+ co_hash.merge!(ent.to_hash(abbr)) unless ent.empty?
151
+ end
152
+
153
+ # svc and res are arrays of ContextObjectEntity
154
+ {@serviceType=>"svc", @resolver=>"res"}.each do |ent_list, abbr|
155
+ ent_list.each do |ent|
156
+ co_hash.merge!(ent.to_hash(abbr)) unless ent.empty?
157
+ end
158
+ end
159
+ return co_hash
160
+ end
161
+
162
+
163
+ # Outputs a COinS (ContextObject in SPANS) span tag for the ContextObject.
164
+ # Arguments are any other CSS classes you want included and the innerHTML
165
+ # content.
166
+
167
+ def coins (classnames=nil, innerHTML=nil)
168
+ return "<span class='Z3988 #{classnames}' title='"+CGI.escapeHTML(self.kev(true))+"'>#{innerHTML}</span>"
169
+ end
170
+
171
+
172
+ # Sets a ContextObject administration field.
173
+
174
+ def set_administration_key(key, val)
175
+ raise ArgumentError, "#{key} is not a valid admin key!" unless @admin.keys.index(key)
176
+ @admin[key]["value"] = val
177
+ end
178
+
179
+ # Imports an existing Key-encoded value string and sets the appropriate
180
+ # entities.
181
+
182
+ def import_kev(kev)
183
+ co = CGI::parse(kev)
184
+ co2 = {}
185
+ co.each do |key, val|
186
+ if val.is_a?(Array)
187
+ if val.length == 1
188
+ co2[key] = val[0]
189
+ else
190
+ co2[key] = val
191
+ end
192
+ end
193
+ end
194
+ self.import_hash(co2)
195
+ end
196
+
197
+ # Initialize a new ContextObject object from an existing KEV
198
+
199
+ def self.new_from_kev(kev)
200
+ co = self.new
201
+ co.import_kev(kev)
202
+ return co
203
+ end
204
+
205
+ # Initialize a new ContextObject object from a CGI.params style hash
206
+ # Expects a hash with default value being nil though, not [] as CGI.params
207
+ # actually returns, beware. Can also accept a Rails-style params hash
208
+ # (single string values, not array values), although this may lose
209
+ # some context object information.
210
+ def self.new_from_form_vars(params)
211
+ co = self.new
212
+ if ctx_val = (params[:url_ctx_val]||params["url_ctx_val"]) and not ctx_val.empty? # this is where the context object stuff will be
213
+ co.admin.keys.each do | adm |
214
+ if params[adm.to_s]
215
+ if params[adm.to_s].is_a?(Array)
216
+ co.set_administration_key(adm, params[adm.to_s].first)
217
+ else
218
+ co.set_administration_key(adm, params[adm.to_s])
219
+ end
220
+ end
221
+ end
222
+
223
+ if ctx_format = (params["url_ctx_fmt"]||params[:url_ctx_fmt])
224
+ ctx_format = ctx_format.first if ctx_format.is_a?(Array)
225
+ ctx_val = ctx_val.first if ctx_val.is_a?(Array)
226
+ if ctx_format == "info:ofi/fmt:xml:xsd:ctx"
227
+ co.import_xml(ctx_val)
228
+ elsif ctx_format == "info:ofi/fmt:kev:mtx:ctx"
229
+ co.import_kev(ctx_val)
230
+ end
231
+ end
232
+ else # we'll assume this is standard inline kev
233
+ co.import_hash(params)
234
+ end
235
+ return co
236
+ end
237
+
238
+ # Imports an existing XML encoded context object and sets the appropriate
239
+ # entities
240
+
241
+ def import_xml(xml)
242
+ if xml.is_a?(String)
243
+ xml.force_encoding("UTF-8")
244
+ ensure_valid_encoding!(xml, :invalid => :replace)
245
+ doc = REXML::Document.new xml.gsub(/>[\s\t]*\n*[\s\t]*</, '><').strip
246
+ elsif xml.is_a?(REXML::Document)
247
+ doc = xml
248
+ else
249
+ raise ArgumentError, "Argument must be an REXML::Document or well-formed XML string"
250
+ end
251
+
252
+ # Cut to the context object
253
+ ctx = REXML::XPath.first(doc, ".//ctx:context-object", {"ctx"=>"info:ofi/fmt:xml:xsd:ctx"})
254
+
255
+
256
+
257
+
258
+
259
+
260
+ ctx.attributes.each do |attr, val|
261
+ @admin.each do |adm, vals|
262
+ self.set_administration_key(adm, val) if vals["label"] == attr
263
+ end
264
+ end
265
+ ctx.to_a.each do | ent |
266
+ if @@defined_entities.value?(ent.name())
267
+ self.import_entity(ent)
268
+ else
269
+ self.import_custom_node(ent)
270
+ end
271
+ end
272
+ end
273
+
274
+ # Initialize a new ContextObject object from an existing XML ContextObject
275
+
276
+ def self.new_from_xml(xml)
277
+ co = self.new
278
+ co.import_xml(xml)
279
+ return co
280
+ end
281
+
282
+
283
+ # Takes a hash of openurl key/values, as output by CGI.parse
284
+ # from a query string for example (values can be strings or arrays
285
+ # of string). Mutates hash in place.
286
+ #
287
+ # Force encodes to UTF8 or 8859-1, depending on ctx_enc
288
+ # presence and value.
289
+ #
290
+ # Replaces any illegal bytes with replacement chars,
291
+ # transcodes to UTF-8 if needed to ensure UTF8 on way out.
292
+ def clean_char_encoding!(hash)
293
+ source_encoding = if hash["ctx_enc"] == "info:ofi/enc:ISO-8859-1"
294
+ "ISO-8859-1"
295
+ else
296
+ "UTF-8"
297
+ end
298
+
299
+ hash.each_pair do |key, values|
300
+ # get a list of all terminal values, whether wrapped
301
+ # in arrays or not. We're going to mutate them.
302
+ [values].flatten.each do | v |
303
+ v.force_encoding(source_encoding)
304
+ if source_encoding == "UTF-8"
305
+ ensure_valid_encoding!(v, :invalid => :replace)
306
+ else
307
+ # transcode, replacing any bad chars.
308
+ v.encode!("UTF-8", :invalid => :replace)
309
+ end
310
+ end
311
+ end
312
+
313
+ end
314
+
315
+ # Imports an existing hash of ContextObject values and sets the appropriate
316
+ # entities.
317
+ def import_hash(hash)
318
+ clean_char_encoding!(hash)
319
+
320
+ ref = {}
321
+ {"@referent"=>"rft", "@referrer"=>"rfr", "@referringEntity"=>"rfe",
322
+ "@requestor"=>"req"}.each do | ent, abbr |
323
+ next unless hash["#{abbr}_val_fmt"]
324
+ val = hash["#{abbr}_val_fmt"]
325
+ val = val[0] if val.is_a?(Array)
326
+ self.instance_variable_set(ent.to_sym, ContextObjectEntityFactory.format(val))
327
+ end
328
+
329
+ {"@serviceType"=>"svc","@resolver"=>"res"}.each do | ent, abbr |
330
+ next unless hash["#{abbr}_val_fmt"]
331
+ val = hash["#{abbr}_val_fmt"]
332
+ val = val[0] if val.is_a?(Array)
333
+ self.instance_variable_set(ent.to_sym, [ContextObjectEntityFactory.format(val)])
334
+ end
335
+
336
+ openurl_keys = ["url_ver", "url_tim", "url_ctx_fmt"]
337
+ hash.each do |key, value|
338
+ val = value
339
+ val = value[0] if value.is_a?(Array)
340
+
341
+ next if value.nil? || value.empty?
342
+
343
+ if openurl_keys.include?(key)
344
+ next # None of these matter much for our purposes
345
+ elsif @admin.has_key?(key)
346
+ self.set_administration_key(key, val)
347
+ elsif key.match(/^[a-z]{3}_val_fmt/)
348
+ next
349
+ elsif key.match(/^[a-z]{3}_ref/)
350
+ # determines if we have a by-reference context object
351
+ (entity, v, fmt) = key.split("_")
352
+ ent = self.translate_abbr(entity)
353
+ unless ent
354
+ self.foreign_keys[key] = val
355
+ next
356
+ end
357
+ # by-reference requires two fields, format and location, if this is
358
+ # the first field we've run across, set a place holder until we get
359
+ # the other value
360
+ unless ref[entity]
361
+ if fmt
362
+ ref_key = "format"
363
+ else
364
+ ref_key = "location"
365
+ end
366
+ ref[entity] = [ref_key, val]
367
+ else
368
+ if ref[entity][0] == "format"
369
+ eval("@"+ent).set_reference(val, ref[entity][1])
370
+ else
371
+ eval("@"+ent).set_reference(ref[entity][1], val)
372
+ end
373
+ end
374
+ elsif key.match(/^[a-z]{3}_id$/)
375
+ # Get the entity identifier
376
+ (entity, v) = key.split("_")
377
+ ent = self.translate_abbr(entity)
378
+ unless ent
379
+ self.foreign_keys[key] = val
380
+ next
381
+ end
382
+ # May or may not be an array, turn it into one.
383
+ [value].flatten.each do | id |
384
+ eval("@"+ent).add_identifier(id)
385
+ end
386
+
387
+ elsif key.match(/^[a-z]{3}_dat$/)
388
+ # Get any private data
389
+ (entity, v) = key.split("_")
390
+ ent = self.translate_abbr(entity)
391
+ unless ent
392
+ self.foreign_keys[key] = val
393
+ next
394
+ end
395
+ eval("@"+ent).set_private_data(val)
396
+ else
397
+ # collect the entity metadata
398
+ keyparts = key.split(".")
399
+ if keyparts.length > 1
400
+ # This is 1.0 OpenURL
401
+ ent = self.translate_abbr(keyparts[0])
402
+ unless ent
403
+ self.foreign_keys[key] = val
404
+ next
405
+ end
406
+ eval("@"+ent).set_metadata(keyparts[1], val)
407
+ else
408
+ # This is a 0.1 OpenURL. Your mileage may vary on how accurately
409
+ # this maps.
410
+ if key == 'id'
411
+ if value.is_a?(Array)
412
+ value.each do | id |
413
+ @referent.add_identifier(id)
414
+ end
415
+ else
416
+ @referent.add_identifier(val)
417
+ end
418
+ elsif key == 'sid'
419
+ @referrer.set_identifier("info:sid/"+val.to_s)
420
+ elsif key == 'pid'
421
+ @referent.set_private_data(val.to_s)
422
+ else
423
+ @referent.set_metadata(key, val)
424
+ end
425
+ end
426
+ end
427
+ end
428
+
429
+
430
+
431
+ # Initialize a new ContextObject object from an existing key/value hash
+ def self.new_from_hash(hash)
432
+ co = self.new
433
+ co.import_hash(hash)
434
+ return co
435
+ end
436
+
437
+ # if we don't have a referent format (most likely because we have a 0.1
438
+ # OpenURL), try to determine something from the genre. If that doesn't
439
+ # exist, just call it a journal since most 0.1 OpenURLs would be one,
440
+ # anyway.
441
+ unless @referent.format
442
+ fmt = case @referent.metadata['genre']
443
+ when /article|journal|issue|proceeding|conference|preprint/ then 'journal'
444
+ when /book|bookitem|report|document/ then 'book'
445
+ else 'journal'
446
+ end
447
+ @referent.set_format(fmt)
448
+ end
449
+ end
450
+
451
+ # Translates the abbreviated entity (rft, rfr, etc.) to the associated class
452
+ # name. For repeatable entities, uses the first object in the array. Returns
453
+ # a string of the object name which would then be eval'ed to call a method
454
+ # upon.
455
+
456
+ def translate_abbr(abbr)
457
+ if @@defined_entities.has_key?(abbr)
458
+ ent = @@defined_entities[abbr]
459
+ if ent == "service-type"
460
+ ent = "serviceType[0]"
461
+ elsif ent == "resolver"
462
+ ent = "resolver[0]"
463
+ elsif ent == "referring-entity"
464
+ ent = "referringEntity"
465
+ end
466
+ else
467
+ return nil
468
+ end
469
+ return ent
470
+ end
471
+
472
+ def self.entities(term)
473
+ return @@defined_entities[term] if @@defined_entities.keys.index(term)
474
+ return @@defined_entities.keys[@@defined_entities.values.index(term)] if @@defined_entities.values.index(term)
475
+ return nil
476
+
477
+ end
478
+
479
+ # Imports an existing OpenURL::ContextObject object and sets the appropriate
480
+ # entity values.
481
+
482
+ def import_context_object(context_object)
483
+ @admin.each_key { |k|
484
+ self.set_administration_key(k, context_object.admin[k]["value"])
485
+ }
486
+ ["@referent", "@referringEntity", "@requestor", "@referrer"].each do | ent |
487
+ self.instance_variable_set(ent.to_sym, Marshal::load(Marshal.dump(context_object.instance_variable_get(ent.to_sym))))
488
+ end
489
+ context_object.serviceType.each { |svc|
490
+ @serviceType << Marshal::load(Marshal.dump(svc))
491
+ }
492
+ context_object.resolver.each { |res|
493
+ @resolver << Marshal::load(Marshal.dump(res))
494
+ }
495
+ context_object.foreign_keys.each do | key, val |
496
+ self.foreign_keys[key] = val
497
+ end
498
+ end
499
+
500
+ # Initialize a new ContextObject object from an existing
501
+ # OpenURL::ContextObject
502
+
503
+ def self.new_from_context_object(context_object)
504
+ co = self.new
505
+ co.import_context_object(context_object)
506
+ return co
507
+ end
508
+
509
+ def referent=(entity)
510
+ raise ArgumentError, "Referent must be an OpenURL::ContextObjectEntity" unless entity.is_a?(OpenURL::ContextObjectEntity)
511
+ @referent=entity
512
+ end
513
+
514
+ def referrer=(entity)
515
+ raise ArgumentError, "Referrer must be an OpenURL::ContextObjectEntity" unless entity.is_a?(OpenURL::ContextObjectEntity)
516
+ @referrer=entity
517
+ end
518
+
519
+ def referringEntity=(entity)
520
+ raise ArgumentError, "Referring-Entity must be an OpenURL::ContextObjectEntity" unless entity.is_a?(OpenURL::ContextObjectEntity)
521
+ @referringEntity=entity
522
+ end
523
+
524
+ def requestor=(entity)
525
+ raise ArgumentError, "Requestor must be an OpenURL::ContextObjectEntity" unless entity.is_a?(OpenURL::ContextObjectEntity)
526
+ @requestor=entity
527
+ end
528
+
529
+ protected
530
+
531
+ def import_entity(node)
532
+ entities = {"rft"=>:@referent, "rfr"=>:@referrer, "rfe"=>:@referringEntity,"req"=>:@requestor,
533
+ "svc"=>:@serviceType,"res"=>:@resolver}
534
+
535
+ ent = @@defined_entities.keys[@@defined_entities.values.index(node.name())]
536
+
537
+
538
+ metalib_workaround(node)
539
+
540
+ unless ["svc","res"].index(ent)
541
+ self.instance_variable_set(entities[ent], self.set_typed_entity(node))
542
+ entity = self.instance_variable_get(entities[ent])
543
+
544
+
545
+
546
+ self.import_xml_common(entity, node)
547
+ entity.import_xml_metadata(node)
548
+ end
549
+ end
550
+
551
+ def import_svc_node(node)
552
+ if @serviceType[0].empty?
553
+ key = 0
554
+ else
555
+ key = self.add_service_type_entity
556
+ end
557
+ self.import_xml_common(@serviceType[key], node)
558
+ self.import_xml_mbv(@serviceType[key], node)
559
+ end
560
+
561
+ def import_res_node(node)
562
+ if @resolver[0].empty?
563
+ key = 0
564
+ else
565
+ key = self.add_resolver_entity
566
+ end
567
+ self.import_xml_common(@resolver[key], node)
568
+ self.import_xml_mbv(@resolver[key], node)
569
+ end
570
+
571
+ # Determines the proper subclass of ContextObjectEntity to use
572
+ # for given format. Input is an REXML node representing a ctx:referent.
573
+ # Returns ContextObjectEntity.
574
+ def set_typed_entity(node)
575
+ fmt = REXML::XPath.first(node, "./ctx:metadata-by-val/ctx:format", {"ctx"=>"info:ofi/fmt:xml:xsd:ctx"})
576
+
577
+ fmt_val = fmt.get_text.value if fmt && fmt.has_text?
578
+
579
+ # Special weird workaround for info sent from metalib.
580
+ # "info:ofi/fmt:xml:xsd" is not actually a legal format
581
+ # identifier, it should have more on the end.
582
+ # XPath should really end in "rft:*" for maximal generality, but
583
+ # REXML doesn't like that.
584
+ if (false && fmt_val && fmt_val == "info:ofi/fmt:xml:xsd")
585
+ metalib_evidence = REXML::XPath.first( node, "./ctx:metadata-by-val/ctx:metadata/rft:journal", {"ctx"=>"info:ofi/fmt:xml:xsd:ctx", "rft"=>"info:ofi/fmt:xml:xsd:journal"})
586
+
587
+ # Okay, even if we don't have that one, do we have a REALLY bad one
588
+ # where Metalib puts an illegal namespace identifier in too?
589
+ metalib_evidence = REXML::XPath.first( node, "./ctx:metadata-by-val/ctx:metadata/rft:journal", {"ctx"=>"info:ofi/fmt:xml:xsd:ctx", "rft"=>"info:ofi/fmt:xml:xsd"}) unless metalib_evidence
590
+
591
+ # metalib didn't advertise it properly, but it's really
592
+ # journal format.
593
+ fmt_val = "info:ofi/fmt:xml:xsd:journal" if metalib_evidence
594
+ end
595
+
596
+ if fmt_val
597
+ return OpenURL::ContextObjectEntityFactory.format(fmt_val)
598
+ else
599
+ return OpenURL::ContextObjectEntity.new
600
+ end
601
+ end
602
+
603
+ # Parses the data that should apply to all XML context objects
604
+ def import_xml_common(ent, node)
605
+
606
+ REXML::XPath.each(node, "./ctx:identifier", {"ctx"=>"info:ofi/fmt:xml:xsd:ctx"}) do | id |
607
+ ent.add_identifier(id.get_text.value) if id and id.has_text?
608
+ end
609
+
610
+ priv = REXML::XPath.first(node, "./ctx:private-data", {"ctx"=>"info:ofi/fmt:xml:xsd:ctx"})
611
+ ent.set_private_data(priv.get_text.value) if priv and priv.has_text?
612
+
613
+ ref = REXML::XPath.first(node, "./ctx:metadata-by-ref", {"ctx"=>"info:ofi/fmt:xml:xsd:ctx"})
614
+ if ref
615
+ reference = {}
616
+ ref.to_a.each do |r|
617
+ if r.name() == "format"
618
+ reference[:format] = r.get_text.value if r.get_text
619
+ else
620
+ reference[:location] = r.get_text.value
621
+ end
622
+ end
623
+ ent.set_reference(reference[:location], reference[:format])
624
+ end
625
+ end
626
+
627
+ # Pass in a REXML element representing an entity.
628
+ # Special weird workaround for info sent from metalib.
629
+ # Metalib uses "info:ofi/fmt:xml:xsd" as a format identifier, and
630
+ # sometimes even as a namespace identifier for a <journal> element.
631
+ # It's not legal for either. It messes up our parsing. The identifier
632
+ # should have something else on the end ":journal", ":book", etc.
633
+ # We tack ":journal" on the end if we find this unspecified
634
+ # but it contains a <journal> element.
635
+ # XPath should really end in "rft:*" for maximal generality, but
636
+ # REXML doesn't like that.
637
+ def metalib_workaround(node)
638
+ # Metalib fix
639
+ # Fix awful illegal Metalib XML
640
+ fmt = REXML::XPath.first(node, "./ctx:metadata-by-val/ctx:format", {"ctx"=>"info:ofi/fmt:xml:xsd:ctx"})
641
+ if ( fmt && fmt.text == "info:ofi/fmt:xml:xsd")
642
+ metadata_by_val = node.children.find {|e| e.respond_to?(:name) && e.name == 'metadata-by-val' }
643
+
644
+ # Find a "journal" element to make sure forcing to ":journal" is a good
645
+ # idea, and to later
646
+ # fix the journal namespace if needed
647
+ metadata = metadata_by_val.children.find {|e| e.respond_to?(:name) && e.name == 'metadata' } if metadata_by_val
648
+ journal = metadata.find {|e| e.respond_to?(:name) && e.name == 'journal' } if metadata
649
+
650
+ # Fix the format only if there's a <journal> element in there.
651
+ fmt = metadata_by_val.children.find {|e| e.respond_to?(:name) && e.name == 'format' } if metadata_by_val && journal
652
+ fmt.text = "info:ofi/fmt:xml:xsd:journal" if fmt
653
+
654
+ if (journal && journal.namespace == "info:ofi/fmt:xml:xsd")
655
+ journal.add_namespace("xmlns:rft", "info:ofi/fmt:xml:xsd:journal")
656
+ end
657
+ end
658
+ end
659
+
660
+ end
661
+
662
+
663
+ end
@@ -1,5 +1,7 @@
1
1
  # encoding: UTF-8
2
2
 
3
+ require 'ensure_valid_encoding'
4
+
3
5
  module OpenURL
4
6
 
5
7
  if RUBY_VERSION < '1.9'
@@ -34,7 +36,8 @@ module OpenURL
34
36
  # ctx.kev
35
37
  # ctx.xml
36
38
  class ContextObject
37
-
39
+ include EnsureValidEncoding
40
+
38
41
  attr_reader :admin, :referent, :referringEntity, :requestor, :referrer,
39
42
  :serviceType, :resolver
40
43
  attr_accessor :foreign_keys, :openurl_ver
@@ -236,7 +239,9 @@ module OpenURL
236
239
  # entities
237
240
 
238
241
  def import_xml(xml)
239
- if xml.is_a?(String)
242
+ if xml.is_a?(String)
243
+ xml.force_encoding("UTF-8") if xml.respond_to? :force_encoding
244
+ ensure_valid_encoding!(xml, :invalid => :replace)
240
245
  doc = REXML::Document.new xml.gsub(/>[\s\t]*\n*[\s\t]*</, '><').strip
241
246
  elsif xml.is_a?(REXML::Document)
242
247
  doc = xml
@@ -274,10 +279,47 @@ module OpenURL
274
279
  return co
275
280
  end
276
281
 
277
- # Imports an existing hash of ContextObject values and sets the appropriate
278
- # entities.
279
282
 
280
- def import_hash(hash)
283
+ # Takes a hash of openurl key/values, as output by CGI.parse
284
+ # from a query string for example (values can be strings or arrays
285
+ # of string). Mutates hash in place.
286
+ #
287
+ # Force encodes to UTF8 or 8859-1, depending on ctx_enc
288
+ # presence and value.
289
+ #
290
+ # Replaces any illegal bytes with replacement chars,
291
+ # transcodes to UTF-8 if needed to ensure UTF8 on way out.
292
+ def clean_char_encoding!(hash)
293
+ # Bail if we're not in ruby 1.9
294
+ return unless "".respond_to? :encoding
295
+
296
+ source_encoding = "UTF-8"
297
+ if hash["ctx_enc"] == "info:ofi/enc:ISO-8859-1"
298
+ hash.delete("ctx_enc")
299
+ source_encoding = "ISO-8859-1"
300
+ end
301
+
302
+ hash.each_pair do |key, values|
303
+ # get a list of all terminal values, whether wrapped
304
+ # in arrays or not. We're going to mutate them.
305
+ [values].flatten.each do | v |
306
+ v.force_encoding(source_encoding)
307
+ if source_encoding == "UTF-8"
308
+ ensure_valid_encoding!(v, :invalid => :replace, :undef => :replace )
309
+ else
310
+ # transcode, replacing any bad chars.
311
+ v.encode!("UTF-8", :invalid => :replace)
312
+ end
313
+ end
314
+ end
315
+
316
+ end
317
+
318
+ # Imports an existing hash of ContextObject values and sets the appropriate
319
+ # entities.
320
+ def import_hash(hash)
321
+ clean_char_encoding!(hash)
322
+
281
323
  ref = {}
282
324
  {"@referent"=>"rft", "@referrer"=>"rfr", "@referringEntity"=>"rfe",
283
325
  "@requestor"=>"req"}.each do | ent, abbr |
@@ -387,9 +429,10 @@ module OpenURL
387
429
  end
388
430
  end
389
431
 
390
- # Initialize a new ContextObject object from an existing key/value hash
432
+
391
433
 
392
- def self.new_from_hash(hash)
434
+ # Initialize a new ContextObject object from an existing key/value hash
435
+ def self.new_from_hash(hash)
393
436
  co = self.new
394
437
  co.import_hash(hash)
395
438
  return co
@@ -563,8 +606,7 @@ module OpenURL
563
606
 
564
607
  # Parses the data that should apply to all XML context objects
565
608
  def import_xml_common(ent, node)
566
-
567
-
609
+
568
610
  REXML::XPath.each(node, "./ctx:identifier", {"ctx"=>"info:ofi/fmt:xml:xsd:ctx"}) do | id |
569
611
  ent.add_identifier(id.get_text.value) if id and id.has_text?
570
612
  end
@@ -1,6 +1,5 @@
1
1
  # encoding: UTF-8
2
2
 
3
-
4
3
  require 'yaml'
5
4
 
6
5
  unless "".respond_to?(:encoding)
@@ -46,6 +45,67 @@ else
46
45
  assert_equal("UTF-8", ctx.xml.encoding.name)
47
46
  end
48
47
 
48
+ # includes bytes that are bad for UTF8; OpenURL auto-replaces them.
49
+ def test_bad_kev
50
+ raw_kev = "&rft.pub=M\xE9xico".force_encoding("ascii-8bit")
51
+
52
+ ctx = OpenURL::ContextObject.new_from_kev(raw_kev)
53
+
54
+
55
+ assert_equal("UTF-8", ctx.referent.metadata['pub'].encoding.name)
56
+ # replacement char
57
+ assert_equal "M\uFFFDxico", ctx.referent.metadata['pub']
58
+
59
+ # serialized as utf-8
60
+ assert_equal("UTF-8", ctx.kev.encoding.name)
61
+
62
+ end
63
+
64
+ def test_bad_xml
65
+ ctx = OpenURL::ContextObject.new_from_xml(@@xml_with_bad_utf8)
66
+
67
+ assert_equal("UTF-8", ctx.referent.metadata['btitle'].encoding.name)
68
+ assert_equal("D\uFFFDpendances et niveaux de représentation en syntaxe", ctx.referent.metadata["btitle"])
69
+
70
+ # serialized as utf-8
71
+ assert_equal("UTF-8", ctx.xml.encoding.name)
72
+
73
+ end
74
+
75
+ def test_bad_from_form_vars
76
+ ctx = OpenURL::ContextObject.new_from_form_vars("btitle" => "M\xE9xico".force_encoding("binary"))
77
+
78
+ assert_equal("UTF-8", ctx.referent.metadata['btitle'].encoding.name)
79
+ assert_equal("M\uFFFDxico", ctx.referent.metadata["btitle"])
80
+
81
+ assert_equal("UTF-8", ctx.kev.encoding.name)
82
+ end
83
+
84
+ def test_8859_kev
85
+ # specify 8859-1 encoding.
86
+ raw_kev = "ctx_enc=info%3Aofi%2Fenc%3AISO-8859-1"
87
+ raw_kev += "&url_ver=Z39.88-2004&rft.btitle=M%E9xico".force_encoding("ascii-8bit")
88
+
89
+ ctx = OpenURL::ContextObject.new_from_kev(raw_kev)
90
+
91
+ # properly transcoded to UTF8
92
+ assert_equal("UTF-8", ctx.referent.metadata['btitle'].encoding.name)
93
+ assert_equal("México".force_encoding("UTF-8"), ctx.referent.metadata["btitle"])
94
+
95
+ # serialized as utf-8
96
+ assert_equal("UTF-8", ctx.kev.encoding.name)
97
+ # with proper ctx_enc, not the one previously specifying ISO-8859-1!
98
+
99
+ assert_not_equal "info:ofi/enc:ISO-8859-1", CGI.parse(ctx.kev)["ctx_enc"].first
100
+ end
101
+
102
+ def test_8859_form_vars
103
+ ctx = OpenURL::ContextObject.new_from_form_vars("btitle" => "M\xE9xico", "ctx_enc" => "info:ofi/enc:ISO-8859-1")
104
+
105
+ assert_equal("UTF-8", ctx.referent.metadata['btitle'].encoding.name)
106
+ assert_equal("México".force_encoding("UTF-8"), ctx.referent.metadata["btitle"])
107
+ end
108
+
49
109
 
50
110
 
51
111
  @@xml_with_utf8 = <<eos
@@ -53,6 +113,12 @@ else
53
113
  eos
54
114
  # Make sure it's got a raw encoding, so we can test it winds up utf-8 anyhow
55
115
  @@xml_with_utf8.force_encoding("ascii-8bit")
116
+
117
+ @@xml_with_bad_utf8 = <<eos
118
+ <ctx:context-objects xmlns:ctx='info:ofi/fmt:xml:xsd:ctx' xsi:schemaLocation='info:ofi/fmt:xml:xsd:ctx http://www.openurl.info/registry/docs/info:ofi/fmt:xml:xsd:ctx' xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'><ctx:context-object identifier='10_8' timestamp='2003-04-11T10:08:30TZD' version='Z39.88-2004'><ctx:referent><ctx:metadata-by-val><ctx:format>info:ofi/fmt:xml:xsd:book</ctx:format><ctx:metadata><rft:book xmlns:rft='info:ofi/fmt:xml:xsd:book' xsi:schemaLocation='info:ofi/fmt:xml:xsd:book http://www.openurl.info/registry/docs/info:ofi/fmt:xml:xsd:book'><rft:genre>book</rft:genre><rft:btitle>D\xE9pendances et niveaux de représentation en syntaxe</rft:btitle><rft:date>1985</rft:date><rft:pub>Benjamins</rft:pub><rft:place>Amsterdam, Philadelphia</rft:place><rft:authors><rft:author><rft:aulast>Vergnaud</rft:aulast><rft:auinit>J.-R</rft:auinit></rft:author></rft:authors></rft:book></ctx:metadata></ctx:metadata-by-val></ctx:referent><ctx:referring-entity><ctx:metadata-by-val><ctx:format>info:ofi/fmt:xml:xsd:book</ctx:format><ctx:metadata><rfe:book xmlns:rfe='info:ofi/fmt:xml:xsd:book' xsi:schemaLocation='info:ofi/fmt:xml:xsd:book http://www.openurl.info/registry/docs/info:ofi/fmt:xml:xsd:book'><rfe:genre>book</rfe:genre><rfe:btitle>Minimalist Program</rfe:btitle><rfe:isbn>0262531283</rfe:isbn><rfe:date>1995</rfe:date><rfe:pub>The MIT Press</rfe:pub><rfe:place>Cambridge, Mass</rfe:place><rfe:authors><rfe:author><rfe:aulast>Chomsky</rfe:aulast><rfe:auinit>N</rfe:auinit></rfe:author></rfe:authors></rfe:book></ctx:metadata></ctx:metadata-by-val><ctx:identifier>urn:isbn:0262531283</ctx:identifier></ctx:referring-entity><ctx:referrer><ctx:identifier>info:sid/ebookco.com:bookreader</ctx:identifier></ctx:referrer><ctx:service-type><ctx:metadata-by-val><ctx:format>info:ofi/fmt:xml:xsd:sch_svc</ctx:format><ctx:metadata><svc:abstract 
xmlns:svc='info:ofi/fmt:xml:xsd:sch_svc'>yes</svc:abstract></ctx:metadata></ctx:metadata-by-val></ctx:service-type></ctx:context-object></ctx:context-objects>
119
+ eos
120
+ @@xml_with_bad_utf8.force_encoding("ascii-8bit")
121
+
56
122
 
57
123
  end
58
124
  end
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: openurl
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.3.1
4
+ version: 0.4.0
5
5
  prerelease:
6
6
  platform: ruby
7
7
  authors:
@@ -10,7 +10,7 @@ authors:
10
10
  autorequire:
11
11
  bindir: bin
12
12
  cert_chain: []
13
- date: 2011-10-10 00:00:00.000000000 Z
13
+ date: 2012-07-10 00:00:00.000000000 Z
14
14
  dependencies:
15
15
  - !ruby/object:Gem::Dependency
16
16
  name: marc
@@ -28,6 +28,22 @@ dependencies:
28
28
  - - ! '>='
29
29
  - !ruby/object:Gem::Version
30
30
  version: '0'
31
+ - !ruby/object:Gem::Dependency
32
+ name: ensure_valid_encoding
33
+ requirement: !ruby/object:Gem::Requirement
34
+ none: false
35
+ requirements:
36
+ - - ! '>='
37
+ - !ruby/object:Gem::Version
38
+ version: '0'
39
+ type: :runtime
40
+ prerelease: false
41
+ version_requirements: !ruby/object:Gem::Requirement
42
+ none: false
43
+ requirements:
44
+ - - ! '>='
45
+ - !ruby/object:Gem::Version
46
+ version: '0'
31
47
  description:
32
48
  email:
33
49
  - rochkind@jhu.edu
@@ -47,6 +63,7 @@ files:
47
63
  - lib/openurl/metadata_formats/marc.rb
48
64
  - lib/openurl/metadata_formats/patent.rb
49
65
  - lib/openurl/metadata_formats/dissertation.rb
66
+ - lib/openurl/context_object-SAVED
50
67
  - lib/openurl/transport.rb
51
68
  - lib/openurl/context_object.rb
52
69
  - test/data/metalib_sap2_post_params.yml
@@ -84,4 +101,14 @@ rubygems_version: 1.8.24
84
101
  signing_key:
85
102
  specification_version: 3
86
103
  summary: a Ruby library to create, parse and use NISO Z39.88 OpenURLs
87
- test_files: []
104
+ test_files:
105
+ - test/data/metalib_sap2_post_params.yml
106
+ - test/data/dc_ctx.xml
107
+ - test/data/yu.xml
108
+ - test/data/scholarly_au_ctx.xml
109
+ - test/data/marc_ctx.xml
110
+ - test/context_object_entity_test.rb
111
+ - test/test.yml
112
+ - test/context_object_test.rb
113
+ - test/encoding_test.rb
114
+ - test/scholarly_common_test.rb