onix 0.8.0 → 0.8.1

data/CHANGELOG CHANGED
@@ -1,3 +1,16 @@
+ v0.8.1 (5th January 2010)
+ - Use libxml2's support for transparent entity conversion when reading an ONIX file
+ - Removed entity replacement from ONIX::Normaliser
+   - the external dependency on sed made me uncomfortable, and it wasn't really
+     necessary now that nokogiri can do it for us
+ - Removed utf-8 normalisation from ONIX::Normaliser
+   - nokogiri also handles this really cleanly and transparently. Regardless of
+     the source file encoding, Nokogiri::Reader returns utf-8 encoded data
+ - Add the release attribute to files we generate
+   - it's optional in 2.1, but mandatory in 3.0. As we start to see 3.0 files in the
+     wild it will help to have a rapid way to distinguish between them
+ - Add ONIX::Reader#release - to detect the release version of files we read in
+
  v0.8.0 (31st October 2009)
  - Replace LibXML dependency with Nokogiri. Nokogiri is under active development, has
    a responsive maintainer and is significantly more stable
@@ -10,6 +10,29 @@ and writing ONIX files in your ruby applications.
  This replaces the obsolete rbook-onix gem that was spectacular in its crapness.
  Let us never speak of it again.

+ ## Feature Support
+
+ This library currently only handles ONIX 2.1 files (all revisions). At some
+ point I'll need to work out what to do about supporting ONIX 3.0 files. I
+ suspect a separate library will be the simplest solution.
+
+ ONIX::Reader only handles the reference tag versions of ONIX 2.1. Use
+ ONIX::Normaliser to convert any short tag files to reference tags.
+
+ ONIX::Writer only generates reference tag ONIX files.
+
+ ## DTD Loading
+
+ To correctly handle named entities when reading an ONIX file, this gem attempts
+ to load the DTD describing the ONIX format into memory. By default, this means
+ each file you read will require several hundred Kb of data to be downloaded
+ over the net.
+
+ This is obviously not desirable in most cases. To avoid it, you need to add copies
+ of the ONIX DTDs into your system XML catalog. On Debian and Ubuntu systems,
+ the quickest way to do that is to build and install the package available @
+ http://github.com/yob/onix-dtd
+
  ## Installation

  gem install onix
@@ -36,5 +59,4 @@ To be honest, I'm not really expecting any, this is a niche library.
  ## Further Reading

  - The source: [http://github.com/yob/onix/tree/master](http://github.com/yob/onix/tree/master)
- - Rubyforge project: [http://rubyforge.org/projects/rbook/](http://rubyforge.org/projects/rbook/)
- - The official specs [http://www.editeur.org/onix.html](http://www.editeur.org/onix.html)
+ - The official specs [http://www.editeur.org/8/ONIX/](http://www.editeur.org/8/ONIX/)
@@ -9,7 +9,7 @@ module ONIX
    module Version #:nodoc:
      Major = 0
      Minor = 8
-     Tiny = 0
+     Tiny = 1

      String = [Major, Minor, Tiny].join('.')
    end
@@ -0,0 +1,99 @@
+ # coding: utf-8
+
+ module ONIX
+   module Lists
+     # Code list 17
+     CONTRIBUTOR_ROLE = {
+       "A01" => "By (author)",
+       "A02" => "With",
+       "A03" => "Screenplay by",
+       "A04" => "Libretto by",
+       "A05" => "Lyrics by",
+       "A06" => "By (composer)",
+       "A07" => "By (artist)",
+       "A08" => "By (photographer)",
+       "A09" => "Created by",
+       "A10" => "From an idea by",
+       "A11" => "Designed by",
+       "A12" => "Illustrated by",
+       "A13" => "Photographs by",
+       "A14" => "Text by",
+       "A15" => "Preface by",
+       "A16" => "Prologue by",
+       "A17" => "Summary by",
+       "A18" => "Supplement by",
+       "A19" => "Afterword by",
+       "A20" => "Notes by",
+       "A21" => "Commentaries by",
+       "A22" => "Epilogue by",
+       "A23" => "Foreword by",
+       "A24" => "Introduction by",
+       "A25" => "Footnotes by",
+       "A26" => "Memoir by",
+       "A27" => "Experiments by",
+       "A29" => "Introduction and notes by",
+       "A30" => "Software written by",
+       "A31" => "Book and lyrics by",
+       "A32" => "Contributions by",
+       "A33" => "Appendix by",
+       "A34" => "Index by",
+       "A35" => "Drawings by",
+       "A36" => "Cover design or artwork by",
+       "A37" => "Preliminary work by",
+       "A38" => "Original author",
+       "A39" => "Maps by",
+       "A40" => "Inked or colored by",
+       "A41" => "Pop-ups by",
+       "A42" => "Continued by",
+       "A43" => "Interviewer",
+       "A44" => "Interviewee",
+       "A99" => "Other primary creator",
+       "B01" => "Edited by",
+       "B02" => "Revised by",
+       "B03" => "Retold by",
+       "B04" => "Abridged by",
+       "B05" => "Adapted by",
+       "B06" => "Translated by",
+       "B07" => "As told by",
+       "B08" => "Translated with commentary by",
+       "B09" => "Series edited by",
+       "B10" => "Edited and translated by",
+       "B11" => "Editor-in-chief",
+       "B12" => "Guest editor",
+       "B13" => "Volume editor",
+       "B14" => "Editorial board member",
+       "B15" => "Editorial coordination by",
+       "B16" => "Managing editor",
+       "B17" => "Founded by",
+       "B18" => "Prepared for publication by",
+       "B19" => "Associate editor",
+       "B20" => "Consultant editor",
+       "B21" => "General editor",
+       "B22" => "Dramatized by",
+       "B23" => "General rapporteur",
+       "B24" => "Literary editor",
+       "B25" => "Arranged by (music)",
+       "B99" => "Other adaptation by",
+       "C01" => "Compiled by",
+       "C02" => "Selected by",
+       "C99" => "Other compilation by",
+       "D01" => "Producer",
+       "D02" => "Director",
+       "D03" => "Conductor",
+       "D99" => "Other direction by",
+       "E01" => "Actor",
+       "E02" => "Dancer",
+       "E03" => "Narrator",
+       "E04" => "Commentator",
+       "E05" => "Vocal soloist",
+       "E06" => "Instrumental soloist",
+       "E07" => 'Read by',
+       "E08" => "Performed by (orchestra, band, ensemble)",
+       "E99" => "Performed by",
+       "F01" => "Filmed/photographed by",
+       "F99" => "Other recording by",
+       "Z01" => "Assisted by",
+       "Z99" => "Other",
+     }
+   end
+ end
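A minimal sketch of how the new code list might be consulted, assuming a plain lookup keyed on the ONIX code. The `role_label` helper and the three-entry sample hash are illustrative only and are not part of the gem, which exposes the full table as `ONIX::Lists::CONTRIBUTOR_ROLE`.

```ruby
# Illustrative subset of code list 17; the gem ships the full table
# as ONIX::Lists::CONTRIBUTOR_ROLE.
CONTRIBUTOR_ROLE_SAMPLE = {
  "A01" => "By (author)",
  "B06" => "Translated by",
  "E07" => "Read by"
}.freeze

# Hypothetical helper (not part of the gem): map a role code to its
# label, falling back to the raw code when the code is unknown.
def role_label(code, table = CONTRIBUTOR_ROLE_SAMPLE)
  table.fetch(code, code)
end

puts role_label("B06")
puts role_label("X42")
```

Falling back to the raw code is one possible choice for unknown values; returning nil or raising would be equally reasonable depending on how strict the caller wants to be.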
@@ -40,9 +40,6 @@ module ONIX
        raise ArgumentError, "#{oldfile} does not exist" unless File.file?(oldfile)
        raise ArgumentError, "#{newfile} already exists" if File.file?(newfile)
        raise "xsltproc app not found" unless app_available?("xsltproc")
-       raise "isutf8 app not found" unless app_available?("isutf8")
-       raise "iconv app not found" unless app_available?("iconv")
-       raise "sed app not found" unless app_available?("sed")
        raise "tr app not found" unless app_available?("tr")

        @oldfile = oldfile
@@ -60,21 +57,11 @@ module ONIX
          @curfile = dest
        end

-       # convert to utf8
-       dest = next_tempfile
-       to_utf8(@curfile, dest)
-       @curfile = dest
-
        # remove control chars
        dest = next_tempfile
        remove_control_chars(@curfile, dest)
        @curfile = dest

-       # remove entities
-       dest = next_tempfile
-       replace_named_entities(@curfile, dest)
-       @curfile = dest
-
        FileUtils.cp(@curfile, @newfile)
      end

@@ -110,41 +97,6 @@ module ONIX
        `xsltproc -o #{outpath} #{xsltpath} #{inpath}`
      end

-     # ensure the file is valid utf8, then make sure it's declared as such.
-     #
-     # The following behaviour is expected:
-     #
-     # file is valid utf8, is marked correctly
-     # - copied untouched
-     # file is valid utf8, is marked incorrectly or has no marked encoding
-     # - copied and encoding mark fixed or added
-     # file is no utf8, encoding is marked
-     # - file is converted to utf8 and enecoding mark is updated
-     # file is not utf8, encoding is not marked
-     # - file is copied untouched
-     #
-     def to_utf8(src, dest)
-       inpath = File.expand_path(src)
-       outpath = File.expand_path(dest)
-
-       m, src_enc = *@head.match(/encoding=.([a-zA-Z0-9\-]+)./i)
-
-       # ensure the file is actually utf8
-       if `isutf8 #{inpath}`.strip == ""
-         if src_enc.to_s.downcase == "utf-8"
-           FileUtils.cp(inpath, outpath)
-         else
-           FileUtils.cp(inpath, outpath)
-           `sed -i 's/<?xml.*?>/<?xml version=\"1.0\" encoding=\"UTF-8\"?>/g' #{outpath}`
-         end
-       elsif src_enc
-         `iconv --from-code=#{src_enc} --to-code=UTF-8 #{inpath} > #{outpath}`
-         `sed -i 's/#{src_enc}/UTF-8/' #{outpath}`
-       else
-         FileUtils.cp(inpath, outpath)
-       end
-     end
-
      # XML files shouldn't contain low ASCII control chars. Strip them.
      #
      def remove_control_chars(src, dest)
@@ -153,35 +105,6 @@ module ONIX
        `cat #{inpath} | tr -d "\\000-\\010\\013\\014\\016-\\037" > #{outpath}`
      end

-     # replace all named entities in the specified file with
-     # numeric entities.
-     #
-     def replace_named_entities(src, dest)
-       inpath = File.expand_path(src)
-       outpath = File.expand_path(dest)
-
-       cmd = "sed " + entity_map.map do |named, numeric|
-         "-e 's/\\&#{named};/\\&#{numeric};/g'"
-       end.join(" ") + " #{inpath} > #{outpath}"
-       #raise cmd
-       `#{cmd}`
-     end
-
-     # return a named entity to numeric entity mapping, build by extracting
-     # data from the ONIX DTD
-     #
-     def entity_map
-       return @map if @map
-
-       path = File.dirname(__FILE__) + "/../../support/entities.txt"
-       @map = {}
-       File.read(path).split.each do |line|
-         elements = line.split(":")
-         @map[elements.first] = elements.last
-       end
-       @map
-     end
-
    end

  end
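The normaliser still shells out to `tr` for its one remaining clean-up step. As a point of comparison only, the same filter could be sketched in pure Ruby; `strip_control_chars` is a hypothetical helper and not something the gem provides. It assumes the text is already held in memory as a string rather than streamed between temp files.

```ruby
# Hypothetical pure-Ruby counterpart to the `tr -d` call: delete the same
# byte ranges (0x00-0x08, 0x0B, 0x0C, 0x0E-0x1F), which leaves TAB (0x09),
# LF (0x0A) and CR (0x0D) in place.
def strip_control_chars(text)
  text.delete("\x00-\x08\x0B\x0C\x0E-\x1F")
end

puts strip_control_chars("a\x00b\tc\x1F\n").inspect
```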
@@ -53,20 +53,21 @@ module ONIX
    class Reader
      include Enumerable

-     attr_reader :header , :version, :xml_lang, :xml_version
+     attr_reader :header, :release

      def initialize(input, product_klass = ::ONIX::Product)
        if input.kind_of?(String)
          @file = File.open(input, "r")
-         @reader = Nokogiri::XML::Reader.from_io(@file)
+         @reader = Nokogiri::XML::Reader(@file) { |cfg| cfg.dtdload.noent }
        elsif input.kind_of?(IO)
-         @reader = Nokogiri::XML::Reader.from_io(input)
+         @reader = Nokogiri::XML::Reader(input) { |cfg| cfg.dtdload.noent }
        else
          raise ArgumentError, "Unable to read from file or IO stream"
        end

        @product_klass = product_klass

+       @release = find_release
        @header = find_header

        @xml_lang ||= @reader.lang
@@ -94,6 +95,23 @@ module ONIX
      private

+     def find_release
+       2.times do
+         @reader.read
+         if @reader.node_type == 1 && @reader.name == "ONIXMessage"
+           value = @reader.attributes["release"]
+           if value
+             return BigDecimal.new(value)
+           else
+             return nil
+           end
+         elsif @reader.node_type == 14
+           return nil
+         end
+       end
+       return nil
+     end
+
      def find_header
        100.times do
          @reader.read
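`find_release` hands the attribute value back as a BigDecimal, which keeps releases such as 2.1 and 3.0 exactly comparable. The conversion step on its own can be sketched as below; `parse_release` is a hypothetical helper, not part of the gem. It uses the `BigDecimal()` conversion method because `BigDecimal.new`, as written in the diff, has since been removed from the bigdecimal library.

```ruby
require "bigdecimal"

# Hypothetical helper: convert a release attribute value such as "2.1"
# into a BigDecimal, or return nil when the attribute is absent.
def parse_release(value)
  value.nil? ? nil : BigDecimal(value)
end

release = parse_release("2.1")
puts release.to_s("F")
puts release < BigDecimal("3.0")
```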
@@ -77,7 +77,7 @@ module ONIX
      def start_document
        @output.write("<?xml version=\"1.0\" encoding=\"utf-8\"?>\n")
        @output.write("<!DOCTYPE ONIXMessage SYSTEM \"#{DOCTYPE}\">\n")
-       @output.write("<ONIXMessage>\n")
+       @output.write("<ONIXMessage release=\"2.1\">\n")
        @output.write(@header.to_xml.to_s)
        @output.write("\n")
      end
@@ -25,79 +25,6 @@ context "ONIX::Normaliser", "with a simple short tag file" do
  end

- context "ONIX::Normaliser", "with an ISO-8859-1 file" do
-
-   before(:each) do
-     @data_path = File.join(File.dirname(__FILE__),"..","data")
-     @filename = File.join(@data_path, "iso_8859_1.xml")
-     @outfile = @filename + ".new"
-   end
-
-   after(:each) do
-     File.unlink(@outfile) if File.file?(@outfile)
-   end
-
-   specify "should correctly convert an iso-8859-1 file to UTF-8" do
-     ONIX::Normaliser.process(@filename, @outfile)
-
-     File.file?(@outfile).should be_true
-     content = File.read(@outfile)
-
-     content.include?("ISO-8859-1").should be_false
-     content.include?("UTF-8").should be_true
-
-     `isutf8 #{File.expand_path(@outfile)}`.strip.should eql("")
-   end
-
- end
-
- context "ONIX::Normaliser", "with an file using entities" do
-
-   before(:each) do
-     @data_path = File.join(File.dirname(__FILE__),"..","data")
-     @filename = File.join(@data_path, "entities.xml")
-     @outfile = @filename + ".new"
-   end
-
-   after(:each) do
-     File.unlink(@outfile) if File.file?(@outfile)
-   end
-
-   specify "should correctly convert named entities to numeric entities" do
-     ONIX::Normaliser.process(@filename, @outfile)
-
-     File.file?(@outfile).should be_true
-     content = File.read(@outfile)
-
-     content.include?("&ndash;").should be_false
-     content.include?("&#x02013;").should be_true
-   end
- end
-
- context "ONIX::Normaliser", "with a utf8 file that has no declared encoding" do
-
-   before(:each) do
-     @data_path = File.join(File.dirname(__FILE__),"..","data")
-     @filename = File.join(@data_path, "no_encoding.xml")
-     @outfile = @filename + ".new"
-   end
-
-   after(:each) do
-     File.unlink(@outfile) if File.file?(@outfile)
-   end
-
-   # this is to test for a bug where an exception was raised on files that
-   # had no declared encoding
-   specify "should add a utf-8 marker to the file" do
-     ONIX::Normaliser.process(@filename, @outfile)
-
-     File.file?(@outfile).should be_true
-     content = File.read(@outfile)
-
-     content.include?("encoding=\"UTF-8\"").should be_true
-   end
- end
-
  context "ONIX::Normaliser", "with a utf8 file that has illegal control chars" do

    before(:each) do
@@ -110,8 +37,6 @@ context "ONIX::Normaliser", "with a utf8 file that has illegal control chars" do
      File.unlink(@outfile) if File.file?(@outfile)
    end

-   # this is to test for a bug where an exception was raised on files that
-   # had no declared encoding
    specify "should remove all control chars except LF, CR and TAB" do
      ONIX::Normaliser.process(@filename, @outfile)

@@ -5,13 +5,13 @@ require File.dirname(__FILE__) + '/spec_helper.rb'
  context "ONIX::Reader" do

    before(:each) do
-     data_path = File.join(File.dirname(__FILE__),"..","data")
-     @file1 = File.join(data_path, "9780194351898.xml")
-     @file2 = File.join(data_path, "two_products.xml")
-     @long_file = File.join(data_path, "Bookwise_July_2008.xml")
-     @entity_file = File.join(data_path, "entities.xml")
-     @utf_16_file = File.join(data_path, "utf_16.xml")
-     @iso_8859_1_file = File.join(data_path, "iso_8859_1.xml")
+     @data_path = File.join(File.dirname(__FILE__),"..","data")
+     @file1 = File.join(@data_path, "9780194351898.xml")
+     @file2 = File.join(@data_path, "two_products.xml")
+     @long_file = File.join(@data_path, "Bookwise_July_2008.xml")
+     @entity_file = File.join(@data_path, "entities.xml")
+     @utf_16_file = File.join(@data_path, "utf_16.xml")
+     @iso_8859_1_file = File.join(@data_path, "iso_8859_1.xml")
    end

    specify "should initialize with a filename" do
@@ -26,16 +26,11 @@ context "ONIX::Reader" do
      end
    end

-   # This is commented out as the code that I was using to read the ONIX version from the
-   # input was causing segfaults and other stability issues
-   specify "should provide access to various XML metadata from file"
-   #do
-   # reader = ONIX::Reader.new(@file1)
-   # reader.encoding.should eql("utf-8")
-   # reader.xml_lang.should eql(nil)
-   # reader.xml_version.should eql(1.0)
-   # reader.version.should eql([2,1,0])
-   #end
+   specify "should provide access to various XML metadata from file" do
+     filename = File.join(@data_path, "reference_with_release_attrib.xml")
+     reader = ONIX::Reader.new(filename)
+     reader.release.should eql(BigDecimal.new("2.1"))
+   end

    specify "should provide access to the header in an ONIX file" do
      reader = ONIX::Reader.new(@file1)
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: onix
  version: !ruby/object:Gem::Version
-   version: 0.8.0
+   version: 0.8.1
  platform: ruby
  authors:
  - James Healy
@@ -9,7 +9,7 @@ autorequire:
  bindir: bin
  cert_chain: []

- date: 2009-10-31 00:00:00 +11:00
+ date: 2010-01-05 00:00:00 +11:00
  default_executable:
  dependencies:
  - !ruby/object:Gem::Dependency
@@ -73,6 +73,7 @@ files:
  - lib/onix/lists/product_availability.rb
  - lib/onix/lists/audience_code.rb
  - lib/onix/lists/language_code.rb
+ - lib/onix/lists/contributor_role.rb
  - lib/onix/market_representation.rb
  - lib/onix/audience_range.rb
  - lib/onix/product_identifier.rb