onix 0.8.0 → 0.8.1

data/CHANGELOG CHANGED
@@ -1,3 +1,16 @@
1
+ v0.8.1 (5th January 2010)
2
+ - Use libxml2's support for transparent entity conversion when reading an ONIX file
3
+ - Removed entity replacement from ONIX::Normaliser
4
+ - the external dependency on sed made me uncomfortable, and it wasn't really
5
+ necessary now that nokogiri can do it for us
6
+ - Removed utf-8 normalisation from ONIX::Normaliser
7
+ - nokogiri also handles this really cleanly and transparently. Regardless of
8
+ the source file encoding, Nokogiri::Reader returns utf-8 encoded data
9
+ - Add the release attribute to files we generate
10
+ - it's optional in 2.1, but mandatory in 3.0. As we start to see 3.0 files in the
11
+ wild it will help to have a rapid way to distinguish between them
12
+ - Add ONIX::Reader#release - to detect the release version of files we read in
13
+
1
14
  v0.8.0 (31st October 2009)
2
15
  - Replace LibXML dependency with Nokogiri. Nokogiri is under active development, has
3
16
  a responsive maintainer and is significantly more stable
@@ -10,6 +10,29 @@ and writing ONIX files in your ruby applications.
10
10
  This replaces the obsolete rbook-onix gem that was spectacular in its crapness.
11
11
  Let us never speak of it again.
12
12
 
13
+ ## Feature Support
14
+
15
+ This library currently only handles ONIX 2.1 files (all revisions). At some
16
+ point I'll need to work out what to do about supporting ONIX 3.0 files. I
17
+ suspect a separate library will be the simplest solution.
18
+
19
+ ONIX::Reader only handles the reference tag versions of ONIX 2.1. Use
20
+ ONIX::Normaliser to convert any short tag files to reference tags.
21
+
22
+ ONIX::Writer only generates reference tag ONIX files.
23
+
24
+ ## DTD Loading
25
+
26
+ To correctly handle named entities when reading an ONIX file, this gem attempts
27
+ to load the DTD describing the ONIX format into memory. By default, this means
28
+ each file you read will require several hundred KB of data to be downloaded
29
+ over the net.
30
+
31
+ This is obviously not desirable in most cases. To avoid it, you need to add copies
32
+ of the ONIX DTDs into your system XML catalog. On Debian and Ubuntu systems,
33
+ the quickest way to do that is to build and install the package available @
34
+ http://github.com/yob/onix-dtd
35
+
13
36
  ## Installation
14
37
 
15
38
  gem install onix
@@ -36,5 +59,4 @@ To be honest, I'm not really expecting any, this is a niche library.
36
59
  ## Further Reading
37
60
 
38
61
  - The source: [http://github.com/yob/onix/tree/master](http://github.com/yob/onix/tree/master)
39
- - Rubyforge project: [http://rubyforge.org/projects/rbook/](http://rubyforge.org/projects/rbook/)
40
- - The official specs [http://www.editeur.org/onix.html](http://www.editeur.org/onix.html)
62
+ - The official specs [http://www.editeur.org/8/ONIX/](http://www.editeur.org/8/ONIX/)
@@ -9,7 +9,7 @@ module ONIX
9
9
  module Version #:nodoc:
10
10
  Major = 0
11
11
  Minor = 8
12
- Tiny = 0
12
+ Tiny = 1
13
13
 
14
14
  String = [Major, Minor, Tiny].join('.')
15
15
  end
@@ -0,0 +1,99 @@
1
+ # coding: utf-8
2
+
3
+ module ONIX
4
+ module Lists
5
+ # Code list 17
6
+ CONTRIBUTOR_ROLE = {
7
+ "A01" => "By (author)",
8
+ "A02" => "With",
9
+ "A03" => "Screenplay by",
10
+ "A04" => "Libretto by",
11
+ "A05" => "Lyrics by",
12
+ "A06" => "By (composer)",
13
+ "A07" => "By (artist)",
14
+ "A08" => "By (photographer)",
15
+ "A09" => "Created by",
16
+ "A10" => "From an idea by",
17
+ "A11" => "Designed by",
18
+ "A12" => "Illustrated by",
19
+ "A13" => "Photographs by",
20
+ "A14" => "Text by",
21
+ "A15" => "Preface by",
22
+ "A16" => "Prologue by",
23
+ "A17" => "Summary by",
24
+ "A18" => "Supplement by",
25
+ "A19" => "Afterword by",
26
+ "A20" => "Notes by",
27
+ "A21" => "Commentaries by",
28
+ "A22" => "Epilogue by",
29
+ "A23" => "Foreword by",
30
+ "A24" => "Introduction by",
31
+ "A25" => "Footnotes by",
32
+ "A26" => "Memoir by",
33
+ "A27" => "Experiments by",
34
+ "A29" => "Introduction and notes by",
35
+ "A30" => "Software written by",
36
+ "A31" => "Book and lyrics by",
37
+ "A32" => "Contributions by",
38
+ "A33" => "Appendix by",
39
+ "A34" => "Index by",
40
+ "A35" => "Drawings by",
41
+ "A36" => "Cover design or artwork by",
42
+ "A37" => "Preliminary work by",
43
+ "A38" => "Original author",
44
+ "A39" => "Maps by",
45
+ "A40" => "Inked or colored by",
46
+ "A41" => "Pop-ups by",
47
+ "A42" => "Continued by",
48
+ "A43" => "Interviewer",
49
+ "A44" => "Interviewee",
50
+ "A99" => "Other primary creator",
51
+ "B01" => "Edited by",
52
+ "B02" => "Revised by",
53
+ "B03" => "Retold by",
54
+ "B04" => "Abridged by",
55
+ "B05" => "Adapted by",
56
+ "B06" => "Translated by",
57
+ "B07" => "As told by",
58
+ "B08" => "Translated with commentary by",
59
+ "B09" => "Series edited by",
60
+ "B10" => "Edited and translated by",
61
+ "B11" => "Editor-in-chief",
62
+ "B12" => "Guest editor",
63
+ "B13" => "Volume editor",
64
+ "B14" => "Editorial board member",
65
+ "B15" => "Editorial coordination by",
66
+ "B16" => "Managing editor",
67
+ "B17" => "Founded by",
68
+ "B18" => "Prepared for publication by",
69
+ "B19" => "Associate editor",
70
+ "B20" => "Consultant editor",
71
+ "B21" => "General editor",
72
+ "B22" => "Dramatized by",
73
+ "B23" => "General rapporteur",
74
+ "B24" => "Literary editor",
75
+ "B25" => "Arranged by (music)",
76
+ "B99" => "Other adaptation by",
77
+ "C01" => "Compiled by",
78
+ "C02" => "Selected by",
79
+ "C99" => "Other compilation by",
80
+ "D01" => "Producer",
81
+ "D02" => "Director",
82
+ "D03" => "Conductor",
83
+ "D99" => "Other direction by",
84
+ "E01" => "Actor",
85
+ "E02" => "Dancer",
86
+ "E03" => "Narrator",
87
+ "E04" => "Commentator",
88
+ "E05" => "Vocal soloist",
89
+ "E06" => "Instrumental soloist",
90
+ "E07" => 'Read by',
91
+ "E08" => "Performed by (orchestra, band, ensemble)",
92
+ "E99" => "Performed by",
93
+ "F01" => "Filmed/photographed by",
94
+ "F99" => "Other recording by",
95
+ "Z01" => "Assisted by",
96
+ "Z99" => "Other",
97
+ }
98
+ end
99
+ end
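The new list is a plain Hash keyed by ONIX code, so resolving a contributor role is a single lookup. A minimal sketch, assuming a three-entry excerpt of the table above; the `role_name` helper and its "Unknown role" fallback are hypothetical and not part of the gem:

```ruby
# Excerpt of ONIX code list 17, copied from the CONTRIBUTOR_ROLE hash in this diff.
CONTRIBUTOR_ROLE = {
  "A01" => "By (author)",
  "B06" => "Translated by",
  "E07" => "Read by",
}.freeze

# Hypothetical helper (not part of the gem): resolve a role code to its label,
# falling back to a placeholder for codes missing from the table.
def role_name(code)
  CONTRIBUTOR_ROLE.fetch(code.to_s.upcase, "Unknown role")
end

puts role_name("A01")
puts role_name("b06")
puts role_name("A28")
```

Using `fetch` with a default keeps unknown codes from surfacing as `nil`. That matters for codes the list as shipped here does not define, such as A28, which is skipped between A27 and A29.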
@@ -40,9 +40,6 @@ module ONIX
40
40
  raise ArgumentError, "#{oldfile} does not exist" unless File.file?(oldfile)
41
41
  raise ArgumentError, "#{newfile} already exists" if File.file?(newfile)
42
42
  raise "xsltproc app not found" unless app_available?("xsltproc")
43
- raise "isutf8 app not found" unless app_available?("isutf8")
44
- raise "iconv app not found" unless app_available?("iconv")
45
- raise "sed app not found" unless app_available?("sed")
46
43
  raise "tr app not found" unless app_available?("tr")
47
44
 
48
45
  @oldfile = oldfile
@@ -60,21 +57,11 @@ module ONIX
60
57
  @curfile = dest
61
58
  end
62
59
 
63
- # convert to utf8
64
- dest = next_tempfile
65
- to_utf8(@curfile, dest)
66
- @curfile = dest
67
-
68
60
  # remove control chars
69
61
  dest = next_tempfile
70
62
  remove_control_chars(@curfile, dest)
71
63
  @curfile = dest
72
64
 
73
- # remove entities
74
- dest = next_tempfile
75
- replace_named_entities(@curfile, dest)
76
- @curfile = dest
77
-
78
65
  FileUtils.cp(@curfile, @newfile)
79
66
  end
80
67
 
@@ -110,41 +97,6 @@ module ONIX
110
97
  `xsltproc -o #{outpath} #{xsltpath} #{inpath}`
111
98
  end
112
99
 
113
- # ensure the file is valid utf8, then make sure it's declared as such.
114
- #
115
- # The following behaviour is expected:
116
- #
117
- # file is valid utf8, is marked correctly
118
- # - copied untouched
119
- # file is valid utf8, is marked incorrectly or has no marked encoding
120
- # - copied and encoding mark fixed or added
121
- # file is not utf8, encoding is marked
121
- # - file is converted to utf8 and encoding mark is updated
123
- # file is not utf8, encoding is not marked
124
- # - file is copied untouched
125
- #
126
- def to_utf8(src, dest)
127
- inpath = File.expand_path(src)
128
- outpath = File.expand_path(dest)
129
-
130
- m, src_enc = *@head.match(/encoding=.([a-zA-Z0-9\-]+)./i)
131
-
132
- # ensure the file is actually utf8
133
- if `isutf8 #{inpath}`.strip == ""
134
- if src_enc.to_s.downcase == "utf-8"
135
- FileUtils.cp(inpath, outpath)
136
- else
137
- FileUtils.cp(inpath, outpath)
138
- `sed -i 's/<?xml.*?>/<?xml version=\"1.0\" encoding=\"UTF-8\"?>/g' #{outpath}`
139
- end
140
- elsif src_enc
141
- `iconv --from-code=#{src_enc} --to-code=UTF-8 #{inpath} > #{outpath}`
142
- `sed -i 's/#{src_enc}/UTF-8/' #{outpath}`
143
- else
144
- FileUtils.cp(inpath, outpath)
145
- end
146
- end
147
-
148
100
  # XML files shouldn't contain low ASCII control chars. Strip them.
149
101
  #
150
102
  def remove_control_chars(src, dest)
@@ -153,35 +105,6 @@ module ONIX
153
105
  `cat #{inpath} | tr -d "\\000-\\010\\013\\014\\016-\\037" > #{outpath}`
154
106
  end
155
107
 
156
- # replace all named entities in the specified file with
157
- # numeric entities.
158
- #
159
- def replace_named_entities(src, dest)
160
- inpath = File.expand_path(src)
161
- outpath = File.expand_path(dest)
162
-
163
- cmd = "sed " + entity_map.map do |named, numeric|
164
- "-e 's/\\&#{named};/\\&#{numeric};/g'"
165
- end.join(" ") + " #{inpath} > #{outpath}"
166
- #raise cmd
167
- `#{cmd}`
168
- end
169
-
170
- # return a named entity to numeric entity mapping, built by extracting
171
- # data from the ONIX DTD
172
- #
173
- def entity_map
174
- return @map if @map
175
-
176
- path = File.dirname(__FILE__) + "/../../support/entities.txt"
177
- @map = {}
178
- File.read(path).split.each do |line|
179
- elements = line.split(":")
180
- @map[elements.first] = elements.last
181
- end
182
- @map
183
- end
184
-
185
108
  end
186
109
 
187
110
  end
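`remove_control_chars` still shells out to `cat` and `tr` after this release. For comparison, the same character set can be stripped in-process. This is only a sketch of an alternative, not something this diff introduces; `CONTROL_CHARS` and `strip_control_chars` are hypothetical names:

```ruby
# Characters that `tr -d "\000-\010\013\014\016-\037"` deletes:
# every C0 control character except TAB (0x09), LF (0x0A) and CR (0x0D).
CONTROL_CHARS = /[\x00-\x08\x0B\x0C\x0E-\x1F]/

# Hypothetical in-process replacement for the shell pipeline.
def strip_control_chars(text)
  text.gsub(CONTROL_CHARS, "")
end

dirty = "Title\x00 with\x1F junk\tand\r\nline breaks"
puts strip_control_chars(dirty).inspect
```

One difference to keep in mind: `tr` works on raw bytes, whereas `gsub` expects a string that is valid in its encoding, so a file with broken byte sequences would need to be read as binary or scrubbed first.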
@@ -53,20 +53,21 @@ module ONIX
53
53
  class Reader
54
54
  include Enumerable
55
55
 
56
- attr_reader :header , :version, :xml_lang, :xml_version
56
+ attr_reader :header, :release
57
57
 
58
58
  def initialize(input, product_klass = ::ONIX::Product)
59
59
  if input.kind_of?(String)
60
60
  @file = File.open(input, "r")
61
- @reader = Nokogiri::XML::Reader.from_io(@file)
61
+ @reader = Nokogiri::XML::Reader(@file) { |cfg| cfg.dtdload.noent }
62
62
  elsif input.kind_of?(IO)
63
- @reader = Nokogiri::XML::Reader.from_io(input)
63
+ @reader = Nokogiri::XML::Reader(input) { |cfg| cfg.dtdload.noent }
64
64
  else
65
65
  raise ArgumentError, "Unable to read from file or IO stream"
66
66
  end
67
67
 
68
68
  @product_klass = product_klass
69
69
 
70
+ @release = find_release
70
71
  @header = find_header
71
72
 
72
73
  @xml_lang ||= @reader.lang
@@ -94,6 +95,23 @@ module ONIX
94
95
 
95
96
  private
96
97
 
98
+ def find_release
99
+ 2.times do
100
+ @reader.read
101
+ if @reader.node_type == 1 && @reader.name == "ONIXMessage"
102
+ value = @reader.attributes["release"]
103
+ if value
104
+ return BigDecimal.new(value)
105
+ else
106
+ return nil
107
+ end
108
+ elsif @reader.node_type == 14
109
+ return nil
110
+ end
111
+ end
112
+ return nil
113
+ end
114
+
97
115
  def find_header
98
116
  100.times do
99
117
  @reader.read
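`find_release` reads the `release` attribute from the root `ONIXMessage` element and returns it as a BigDecimal, or nil when the attribute is absent. The same contract can be sketched without Nokogiri. The version below swaps in a regular expression over the document text purely so that it runs with the standard library; `release_of` is a hypothetical name. It also calls `BigDecimal(value)` rather than `BigDecimal.new(value)`, because `BigDecimal.new` was removed from later versions of the bigdecimal library:

```ruby
require "bigdecimal"

# Hypothetical stand-in for ONIX::Reader#release. It scans the root
# start tag with a regexp instead of walking nodes with Nokogiri.
def release_of(xml)
  root = xml[/<ONIXMessage\b[^>]*>/] # first ONIXMessage start tag, if any
  return nil unless root

  value = root[/\brelease\s*=\s*["']([^"']+)["']/, 1]
  value ? BigDecimal(value) : nil
end

with_attr    = %(<?xml version="1.0"?>\n<ONIXMessage release="2.1"><Header/></ONIXMessage>)
without_attr = %(<?xml version="1.0"?>\n<ONIXMessage><Header/></ONIXMessage>)

puts release_of(with_attr) == BigDecimal("2.1")
puts release_of(without_attr).inspect
```

Returning a numeric type rather than a string makes checks such as `release < 3` straightforward once 3.0 files start to appear, which is the use case the changelog describes.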
@@ -77,7 +77,7 @@ module ONIX
77
77
  def start_document
78
78
  @output.write("<?xml version=\"1.0\" encoding=\"utf-8\"?>\n")
79
79
  @output.write("<!DOCTYPE ONIXMessage SYSTEM \"#{DOCTYPE}\">\n")
80
- @output.write("<ONIXMessage>\n")
80
+ @output.write("<ONIXMessage release=\"2.1\">\n")
81
81
  @output.write(@header.to_xml.to_s)
82
82
  @output.write("\n")
83
83
  end
@@ -25,79 +25,6 @@ context "ONIX::Normaliser", "with a simple short tag file" do
25
25
 
26
26
  end
27
27
 
28
- context "ONIX::Normaliser", "with an ISO-8859-1 file" do
29
-
30
- before(:each) do
31
- @data_path = File.join(File.dirname(__FILE__),"..","data")
32
- @filename = File.join(@data_path, "iso_8859_1.xml")
33
- @outfile = @filename + ".new"
34
- end
35
-
36
- after(:each) do
37
- File.unlink(@outfile) if File.file?(@outfile)
38
- end
39
-
40
- specify "should correctly convert an iso-8859-1 file to UTF-8" do
41
- ONIX::Normaliser.process(@filename, @outfile)
42
-
43
- File.file?(@outfile).should be_true
44
- content = File.read(@outfile)
45
-
46
- content.include?("ISO-8859-1").should be_false
47
- content.include?("UTF-8").should be_true
48
-
49
- `isutf8 #{File.expand_path(@outfile)}`.strip.should eql("")
50
- end
51
-
52
- end
53
-
54
- context "ONIX::Normaliser", "with a file using entities" do
55
-
56
- before(:each) do
57
- @data_path = File.join(File.dirname(__FILE__),"..","data")
58
- @filename = File.join(@data_path, "entities.xml")
59
- @outfile = @filename + ".new"
60
- end
61
-
62
- after(:each) do
63
- File.unlink(@outfile) if File.file?(@outfile)
64
- end
65
-
66
- specify "should correctly convert named entities to numeric entities" do
67
- ONIX::Normaliser.process(@filename, @outfile)
68
-
69
- File.file?(@outfile).should be_true
70
- content = File.read(@outfile)
71
-
72
- content.include?("&ndash;").should be_false
73
- content.include?("&#x02013;").should be_true
74
- end
75
- end
76
-
77
- context "ONIX::Normaliser", "with a utf8 file that has no declared encoding" do
78
-
79
- before(:each) do
80
- @data_path = File.join(File.dirname(__FILE__),"..","data")
81
- @filename = File.join(@data_path, "no_encoding.xml")
82
- @outfile = @filename + ".new"
83
- end
84
-
85
- after(:each) do
86
- File.unlink(@outfile) if File.file?(@outfile)
87
- end
88
-
89
- # this is to test for a bug where an exception was raised on files that
90
- # had no declared encoding
91
- specify "should add a utf-8 marker to the file" do
92
- ONIX::Normaliser.process(@filename, @outfile)
93
-
94
- File.file?(@outfile).should be_true
95
- content = File.read(@outfile)
96
-
97
- content.include?("encoding=\"UTF-8\"").should be_true
98
- end
99
- end
100
-
101
28
  context "ONIX::Normaliser", "with a utf8 file that has illegal control chars" do
102
29
 
103
30
  before(:each) do
@@ -110,8 +37,6 @@ context "ONIX::Normaliser", "with a utf8 file that has illegal control chars" do
110
37
  File.unlink(@outfile) if File.file?(@outfile)
111
38
  end
112
39
 
113
- # this is to test for a bug where an exception was raised on files that
114
- # had no declared encoding
115
40
  specify "should remove all control chars except LF, CR and TAB" do
116
41
  ONIX::Normaliser.process(@filename, @outfile)
117
42
 
@@ -5,13 +5,13 @@ require File.dirname(__FILE__) + '/spec_helper.rb'
5
5
  context "ONIX::Reader" do
6
6
 
7
7
  before(:each) do
8
- data_path = File.join(File.dirname(__FILE__),"..","data")
9
- @file1 = File.join(data_path, "9780194351898.xml")
10
- @file2 = File.join(data_path, "two_products.xml")
11
- @long_file = File.join(data_path, "Bookwise_July_2008.xml")
12
- @entity_file = File.join(data_path, "entities.xml")
13
- @utf_16_file = File.join(data_path, "utf_16.xml")
14
- @iso_8859_1_file = File.join(data_path, "iso_8859_1.xml")
8
+ @data_path = File.join(File.dirname(__FILE__),"..","data")
9
+ @file1 = File.join(@data_path, "9780194351898.xml")
10
+ @file2 = File.join(@data_path, "two_products.xml")
11
+ @long_file = File.join(@data_path, "Bookwise_July_2008.xml")
12
+ @entity_file = File.join(@data_path, "entities.xml")
13
+ @utf_16_file = File.join(@data_path, "utf_16.xml")
14
+ @iso_8859_1_file = File.join(@data_path, "iso_8859_1.xml")
15
15
  end
16
16
 
17
17
  specify "should initialize with a filename" do
@@ -26,16 +26,11 @@ context "ONIX::Reader" do
26
26
  end
27
27
  end
28
28
 
29
- # This is commented out as the code that I was using to read the ONIX version from the
30
- # input was causing segfaults and other stability issues
31
- specify "should provide access to various XML metadata from file"
32
- #do
33
- # reader = ONIX::Reader.new(@file1)
34
- # reader.encoding.should eql("utf-8")
35
- # reader.xml_lang.should eql(nil)
36
- # reader.xml_version.should eql(1.0)
37
- # reader.version.should eql([2,1,0])
38
- #end
29
+ specify "should provide access to various XML metadata from file" do
30
+ filename = File.join(@data_path, "reference_with_release_attrib.xml")
31
+ reader = ONIX::Reader.new(filename)
32
+ reader.release.should eql(BigDecimal.new("2.1"))
33
+ end
39
34
 
40
35
  specify "should provide access to the header in an ONIX file" do
41
36
  reader = ONIX::Reader.new(@file1)
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: onix
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.8.0
4
+ version: 0.8.1
5
5
  platform: ruby
6
6
  authors:
7
7
  - James Healy
@@ -9,7 +9,7 @@ autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
11
 
12
- date: 2009-10-31 00:00:00 +11:00
12
+ date: 2010-01-05 00:00:00 +11:00
13
13
  default_executable:
14
14
  dependencies:
15
15
  - !ruby/object:Gem::Dependency
@@ -73,6 +73,7 @@ files:
73
73
  - lib/onix/lists/product_availability.rb
74
74
  - lib/onix/lists/audience_code.rb
75
75
  - lib/onix/lists/language_code.rb
76
+ - lib/onix/lists/contributor_role.rb
76
77
  - lib/onix/market_representation.rb
77
78
  - lib/onix/audience_range.rb
78
79
  - lib/onix/product_identifier.rb