rexml 3.2.6 → 3.3.8


Potentially problematic release: this version of rexml might be problematic.

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 2583ae302aa5e698f0887a689c416e5debe0533ac472a9f96fce6a8912040fd8
-  data.tar.gz: b0ffa6301fd899969a78e060ccaeafebfc2169e3c63ff499ebc6170468866475
+  metadata.gz: 84b42219a4278ab15e7ee7627951d0b94dddc707cbf9563799b3266d02ed32db
+  data.tar.gz: 4895e6f04d100a2affc8d5c6af4c6dfec5ec4d0d863f8d22de1c66da1d253c61
 SHA512:
-  metadata.gz: f63fb0b84ef51e790cc6310244f2106d8c47ec9a00687c58c743afda82b60be9986d503c6f56f947db06f6758707facccd03405c4d1009376e856080aa26d0e4
-  data.tar.gz: db62bea7391837a7ab4cfc5cb5a412ed4deb8d232653ca66d93a323a5a76383eed520cd4ced5b20204f29b04e84678791cd6f807195868f5d4a5e519a73d2aaf
+  metadata.gz: 7729c31da310e2fb7c96cc3a5bd5b981fefdcdae6fe545bf2d113d91af5862fbb51789e9289b91e4247963169900b0cdccc373ffeea6ca3f935b2e32bab1e2e4
+  data.tar.gz: 542f689b7cd27b5c71aeb6845e5af2ac28186e31a98af8c45e984ce6ca563192b2a74e50b6acd95f1fde49ed6289bf9024bfd6612608455038a22e66c6b3a75b
data/NEWS.md CHANGED
@@ -1,5 +1,354 @@
 # News
 
+## 3.3.8 - 2024-09-29 {#version-3-3-8}
+
+### Improvements
+
+  * SAX2: Improve parse performance.
+    * GH-207
+    * Patch by NAITOH Jun.
+
+### Fixes
+
+  * Fixed a bug that unexpected attribute namespace conflict error for
+    the predefined "xml" namespace is reported.
+    * GH-208
+    * Patch by KITAITI Makoto
+
+### Thanks
+
+  * NAITOH Jun
+
+  * KITAITI Makoto
+
+## 3.3.7 - 2024-09-04 {#version-3-3-7}
+
+### Improvements
+
+  * Added local entity expansion limit methods
+    * GH-192
+    * GH-202
+    * Reported by takuya kodama.
+    * Patch by NAITOH Jun.
+
+  * Removed explicit strscan dependency
+    * GH-204
+    * Patch by Bo Anderson.
+
+### Thanks
+
+  * takuya kodama
+
+  * NAITOH Jun
+
+  * Bo Anderson
+
+## 3.3.6 - 2024-08-22 {#version-3-3-6}
+
+### Improvements
+
+  * Removed duplicated entity expansions for performance.
+    * GH-194
+    * Patch by Viktor Ivarsson.
+
+  * Improved namespace conflicted attribute check performance. It was
+    too slow for deep elements.
+    * Reported by l33thaxor.
+
+### Fixes
+
+  * Fixed a bug that default entity expansions are counted for
+    security check. Default entity expansions should not be counted
+    because they don't have a security risk.
+    * GH-198
+    * GH-199
+    * Patch Viktor Ivarsson
+
+  * Fixed a parser bug that parameter entity references in internal
+    subsets are expanded. It's not allowed in the XML specification.
+    * GH-191
+    * Patch by NAITOH Jun.
+
+  * Fixed a stream parser bug that user-defined entity references in
+    text aren't expanded.
+    * GH-200
+    * Patch by NAITOH Jun.
+
+### Thanks
+
+  * Viktor Ivarsson
+
+  * NAITOH Jun
+
+  * l33thaxor
+
+## 3.3.5 - 2024-08-12 {#version-3-3-5}
+
+### Fixes
+
+  * Fixed a bug that `REXML::Security.entity_expansion_text_limit`
+    check has wrong text size calculation in SAX and pull parsers.
+    * GH-193
+    * GH-195
+    * Reported by Viktor Ivarsson.
+    * Patch by NAITOH Jun.
+
+### Thanks
+
+  * Viktor Ivarsson
+
+  * NAITOH Jun
+
+## 3.3.4 - 2024-08-01 {#version-3-3-4}
+
+### Fixes
+
+  * Fixed a bug that `REXML::Security` isn't defined when
+    `REXML::Parsers::StreamParser` is used and
+    `rexml/parsers/streamparser` is only required.
+    * GH-189
+    * Patch by takuya kodama.
+
+### Thanks
+
+  * takuya kodama
+
+## 3.3.3 - 2024-08-01 {#version-3-3-3}
+
+### Improvements
+
+  * Added support for detecting invalid XML that has unsupported
+    content before root element
+    * GH-184
+    * Patch by NAITOH Jun.
+
+  * Added support for `REXML::Security.entity_expansion_limit=` and
+    `REXML::Security.entity_expansion_text_limit=` in SAX2 and pull
+    parsers
+    * GH-187
+    * Patch by NAITOH Jun.
+
+  * Added more tests for invalid XMLs.
+    * GH-183
+    * Patch by Watson.
+
+  * Added more performance tests.
+    * Patch by Watson.
+
+  * Improved parse performance.
+    * GH-186
+    * Patch by tomoya ishida.
+
+### Thanks
+
+  * NAITOH Jun
+
+  * Watson
+
+  * tomoya ishida
+
+## 3.3.2 - 2024-07-16 {#version-3-3-2}
+
+### Improvements
+
+  * Improved parse performance.
+    * GH-160
+    * Patch by NAITOH Jun.
+
+  * Improved parse performance.
+    * GH-169
+    * GH-170
+    * GH-171
+    * GH-172
+    * GH-173
+    * GH-174
+    * GH-175
+    * GH-176
+    * GH-177
+    * Patch by Watson.
+
+  * Added support for raising a parse exception when an XML has extra
+    content after the root element.
+    * GH-161
+    * Patch by NAITOH Jun.
+
+  * Added support for raising a parse exception when an XML
+    declaration exists in wrong position.
+    * GH-162
+    * Patch by NAITOH Jun.
+
+  * Removed needless a space after XML declaration in pretty print mode.
+    * GH-164
+    * Patch by NAITOH Jun.
+
+  * Stopped to emit `:text` event after the root element.
+    * GH-167
+    * Patch by NAITOH Jun.
+
+### Fixes
+
+  * Fixed a bug that SAX2 parser doesn't expand predefined entities for
+    `characters` callback.
+    * GH-168
+    * Patch by NAITOH Jun.
+
+### Thanks
+
+  * NAITOH Jun
+
+  * Watson
+
+## 3.3.1 - 2024-06-25 {#version-3-3-1}
+
+### Improvements
+
+  * Added support for detecting malformed top-level comments.
+    * GH-145
+    * Patch by Hiroya Fujinami.
+
+  * Improved `REXML::Element#attribute` performance.
+    * GH-146
+    * Patch by Hiroya Fujinami.
+
+  * Added support for detecting malformed `<!-->` comments.
+    * GH-147
+    * Patch by Hiroya Fujinami.
+
+  * Added support for detecting unclosed `DOCTYPE`.
+    * GH-152
+    * Patch by Hiroya Fujinami.
+
+  * Added `changlog_uri` metadata to gemspec.
+    * GH-156
+    * Patch by fynsta.
+
+  * Improved parse performance.
+    * GH-157
+    * GH-158
+    * Patch by NAITOH Jun.
+
+### Fixes
+
+  * Fixed a bug that large XML can't be parsed.
+    * GH-154
+    * Patch by NAITOH Jun.
+
+  * Fixed a bug that private constants are visible.
+    * GH-155
+    * Patch by NAITOH Jun.
+
+### Thanks
+
+  * Hiroya Fujinami
+
+  * NAITOH Jun
+
+  * fynsta
+
+## 3.3.0 - 2024-06-11 {#version-3-3-0}
+
+### Improvements
+
+  * Added support for strscan 0.7.0 installed with Ruby 2.6.
+    * GH-142
+    * Reported by Fernando Trigoso.
+
+### Thanks
+
+  * Fernando Trigoso
+
+## 3.2.9 - 2024-06-09 {#version-3-2-9}
+
+### Improvements
+
+  * Added support for old strscan.
+    * GH-132
+    * Reported by Adam.
+
+  * Improved attribute value parse performance.
+    * GH-135
+    * Patch by NAITOH Jun.
+
+  * Improved `REXML::Node#each_recursive` performance.
+    * GH-134
+    * GH-139
+    * Patch by Hiroya Fujinami.
+
+  * Improved text parse performance.
+    * Reported by mprogrammer.
+
+### Thanks
+
+  * Adam
+  * NAITOH Jun
+  * Hiroya Fujinami
+  * mprogrammer
+
+## 3.2.8 - 2024-05-16 {#version-3-2-8}
+
+### Fixes
+
+  * Suppressed a warning
+
+## 3.2.7 - 2024-05-16 {#version-3-2-7}
+
+### Improvements
+
+  * Improve parse performance by using `StringScanner`.
+
+    * GH-106
+    * GH-107
+    * GH-108
+    * GH-109
+    * GH-112
+    * GH-113
+    * GH-114
+    * GH-115
+    * GH-116
+    * GH-117
+    * GH-118
+    * GH-119
+    * GH-121
+
+    * Patch by NAITOH Jun.
+
+  * Improved parse performance when an attribute has many `<`s.
+
+    * GH-126
+
+### Fixes
+
+  * XPath: Fixed a bug of `normalize_space(array)`.
+
+    * GH-110
+    * GH-111
+
+    * Patch by flatisland.
+
+  * XPath: Fixed a bug that wrong position is used with nested path.
+
+    * GH-110
+    * GH-122
+
+    * Reported by jcavalieri.
+    * Patch by NAITOH Jun.
+
+  * Fixed a bug that an exception message can't be generated for
+    invalid encoding XML.
+
+    * GH-29
+    * GH-123
+
+    * Reported by DuKewu.
+    * Patch by NAITOH Jun.
+
+### Thanks
+
+  * NAITOH Jun
+  * flatisland
+  * jcavalieri
+  * DuKewu
+
 ## 3.2.6 - 2023-07-27 {#version-3-2-6}
 
 ### Improvements
data/lib/rexml/attribute.rb CHANGED
@@ -148,8 +148,9 @@ module REXML
     # have been expanded to their values
     def value
       return @unnormalized if @unnormalized
-      @unnormalized = Text::unnormalize( @normalized, doctype )
-      @unnormalized
+
+      @unnormalized = Text::unnormalize(@normalized, doctype,
+                                        entity_expansion_text_limit: @element&.document&.entity_expansion_text_limit)
     end
 
     # The normalized value of this attribute. That is, the attribute with
data/lib/rexml/document.rb CHANGED
@@ -91,6 +91,8 @@ module REXML
     #
     def initialize( source = nil, context = {} )
       @entity_expansion_count = 0
+      @entity_expansion_limit = Security.entity_expansion_limit
+      @entity_expansion_text_limit = Security.entity_expansion_text_limit
       super()
       @context = context
       return if source.nil?
@@ -431,10 +433,12 @@ module REXML
     end
 
     attr_reader :entity_expansion_count
+    attr_writer :entity_expansion_limit
+    attr_accessor :entity_expansion_text_limit
 
     def record_entity_expansion
       @entity_expansion_count += 1
-      if @entity_expansion_count > Security.entity_expansion_limit
+      if @entity_expansion_count > @entity_expansion_limit
         raise "number of entity expansions exceeded, processing aborted."
       end
     end
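
The two hunks above move the expansion limit from a global lookup to a value captured per document at construction time. A minimal sketch of that pattern follows. `SketchSecurity` and `SketchDocument` are hypothetical stand-ins, not REXML classes, and the default of 10_000 is only an illustrative value.

```ruby
# Hypothetical stand-ins for REXML::Security and REXML::Document.
# Only the limit bookkeeping shown in the hunks above is modelled here.
module SketchSecurity
  @entity_expansion_limit = 10_000 # illustrative default for this sketch

  class << self
    attr_accessor :entity_expansion_limit
  end
end

class SketchDocument
  attr_reader :entity_expansion_count
  attr_writer :entity_expansion_limit

  def initialize
    @entity_expansion_count = 0
    # Snapshot the global default so each document can be tuned on its own.
    @entity_expansion_limit = SketchSecurity.entity_expansion_limit
  end

  def record_entity_expansion
    @entity_expansion_count += 1
    if @entity_expansion_count > @entity_expansion_limit
      raise "number of entity expansions exceeded, processing aborted."
    end
  end
end

strict = SketchDocument.new
strict.entity_expansion_limit = 2 # tighten one document only
relaxed = SketchDocument.new      # keeps the global default

3.times { relaxed.record_entity_expansion } # well under 10_000
2.times { strict.record_entity_expansion }  # reaches the local limit

strict_error =
  begin
    strict.record_entity_expansion # third expansion exceeds the limit of 2
    nil
  rescue RuntimeError => e
    e.message
  end
```

Because the limit is read once in `initialize`, tightening one document does not affect others created from the same global default.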
data/lib/rexml/element.rb CHANGED
@@ -7,14 +7,6 @@ require_relative "xpath"
 require_relative "parseexception"
 
 module REXML
-  # An implementation note about namespaces:
-  # As we parse, when we find namespaces we put them in a hash and assign
-  # them a unique ID. We then convert the namespace prefix for the node
-  # to the unique ID. This makes namespace lookup much faster for the
-  # cost of extra memory use. We save the namespace prefix for the
-  # context node and convert it back when we write it.
-  @@namespaces = {}
-
   # An \REXML::Element object represents an XML element.
   #
   # An element:
@@ -449,9 +441,14 @@ module REXML
     # Related: #root_node, #document.
     #
     def root
-      return elements[1] if self.kind_of? Document
-      return self if parent.kind_of? Document or parent.nil?
-      return parent.root
+      target = self
+      while target
+        return target.elements[1] if target.kind_of? Document
+        parent = target.parent
+        return target if parent.kind_of? Document or parent.nil?
+        target = parent
+      end
+      nil
     end
 
     # :call-seq:
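
The `root` hunk above swaps a recursive call for a loop over the parent chain. A rough sketch of the same idea, using a hypothetical `ChainNode` class rather than REXML's `Element`, and leaving out the `Document` special case:

```ruby
# Hypothetical minimal node type. REXML's Element and Document carry far
# more state, and the Document branch of the real method is omitted here.
class ChainNode
  attr_reader :name, :parent

  def initialize(name, parent = nil)
    @name = name
    @parent = parent
  end

  # Walk up the parent chain with a loop, so depth does not consume call stack.
  def root
    target = self
    target = target.parent while target.parent
    target
  end
end

# Build a chain 200_000 nodes deep and find its top from the bottom.
top = ChainNode.new("n0")
leaf = top
200_000.times { |i| leaf = ChainNode.new("n#{i + 1}", leaf) }

found = leaf.root
```

A recursive version makes one method call per ancestor, which may exhaust the stack on very deep trees. The loop keeps stack use constant.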
@@ -627,8 +624,12 @@ module REXML
       else
         prefix = "xmlns:#{prefix}" unless prefix[0,5] == 'xmlns'
       end
-      ns = attributes[ prefix ]
-      ns = parent.namespace(prefix) if ns.nil? and parent
+      ns = nil
+      target = self
+      while ns.nil? and target
+        ns = target.attributes[prefix]
+        target = target.parent
+      end
       ns = '' if ns.nil? and prefix == 'xmlns'
       return ns
     end
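
The namespace hunk above applies the same loop-over-ancestors idea to prefix resolution: the nearest ancestor that declares the prefix wins. A small sketch, assuming a hypothetical `NsElement` class whose `attributes` is a plain Hash (REXML's real `Attributes` object is richer):

```ruby
# Hypothetical element with a Hash of attributes and an optional parent.
class NsElement
  attr_reader :attributes, :parent

  def initialize(attributes = {}, parent = nil)
    @attributes = attributes
    @parent = parent
  end

  # Resolve a prefix by checking this element first, then each ancestor.
  def namespace(prefix = "xmlns")
    key = prefix.start_with?("xmlns") ? prefix : "xmlns:#{prefix}"
    ns = nil
    target = self
    while ns.nil? && target
      ns = target.attributes[key]
      target = target.parent
    end
    ns = "" if ns.nil? && key == "xmlns" # no default namespace declared anywhere
    ns
  end
end

root = NsElement.new({ "xmlns" => "urn:default", "xmlns:a" => "urn:a" })
mid  = NsElement.new({ "xmlns:a" => "urn:override" }, root)
leaf = NsElement.new({}, mid)

leaf_a       = leaf.namespace("a")       # nearest declaration is on mid
leaf_default = leaf.namespace            # inherited from root
leaf_missing = leaf.namespace("missing") # declared nowhere
```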
@@ -1284,16 +1285,11 @@ module REXML
     # document.root.attribute("x", "a") # => a:x='a:x'
     #
     def attribute( name, namespace=nil )
-      prefix = nil
-      if namespaces.respond_to? :key
-        prefix = namespaces.key(namespace) if namespace
-      else
-        prefix = namespaces.index(namespace) if namespace
-      end
+      prefix = namespaces.key(namespace) if namespace
       prefix = nil if prefix == 'xmlns'
 
       ret_val =
-        attributes.get_attribute( "#{prefix ? prefix + ':' : ''}#{name}" )
+        attributes.get_attribute( prefix ? "#{prefix}:#{name}" : name )
 
       return ret_val unless ret_val.nil?
       return nil if prefix.nil?
@@ -2388,17 +2384,6 @@ module REXML
       elsif old_attr.kind_of? Hash
         old_attr[value.prefix] = value
       elsif old_attr.prefix != value.prefix
-        # Check for conflicting namespaces
-        if value.prefix != "xmlns" and old_attr.prefix != "xmlns"
-          old_namespace = old_attr.namespace
-          new_namespace = value.namespace
-          if old_namespace == new_namespace
-            raise ParseException.new(
-                    "Namespace conflict in adding attribute \"#{value.name}\": "+
-                    "Prefix \"#{old_attr.prefix}\" = \"#{old_namespace}\" and "+
-                    "prefix \"#{value.prefix}\" = \"#{new_namespace}\"")
-          end
-        end
         store value.name, {old_attr.prefix => old_attr,
                            value.prefix => value}
       else
data/lib/rexml/entity.rb CHANGED
@@ -12,6 +12,7 @@ module REXML
     EXTERNALID = "(?:(?:(SYSTEM)\\s+#{SYSTEMLITERAL})|(?:(PUBLIC)\\s+#{PUBIDLITERAL}\\s+#{SYSTEMLITERAL}))"
     NDATADECL = "\\s+NDATA\\s+#{NAME}"
     PEREFERENCE = "%#{NAME};"
+    PEREFERENCE_RE = /#{PEREFERENCE}/um
     ENTITYVALUE = %Q{((?:"(?:[^%&"]|#{PEREFERENCE}|#{REFERENCE})*")|(?:'([^%&']|#{PEREFERENCE}|#{REFERENCE})*'))}
     PEDEF = "(?:#{ENTITYVALUE}|#{EXTERNALID})"
     ENTITYDEF = "(?:#{ENTITYVALUE}|(?:#{EXTERNALID}(#{NDATADECL})?))"
@@ -19,7 +20,7 @@ module REXML
     GEDECL = "<!ENTITY\\s+#{NAME}\\s+#{ENTITYDEF}\\s*>"
     ENTITYDECL = /\s*(?:#{GEDECL})|(?:#{PEDECL})/um
 
-    attr_reader :name, :external, :ref, :ndata, :pubid
+    attr_reader :name, :external, :ref, :ndata, :pubid, :value
 
     # Create a new entity. Simple entities can be constructed by passing a
     # name, value to the constructor; this creates a generic, plain entity
@@ -68,14 +69,14 @@ module REXML
     end
 
     # Evaluates to the unnormalized value of this entity; that is, replacing
-    # all entities -- both %ent; and &ent; entities. This differs from
-    # +value()+ in that +value+ only replaces %ent; entities.
+    # &ent; entities.
     def unnormalized
-      document.record_entity_expansion unless document.nil?
-      v = value()
-      return nil if v.nil?
-      @unnormalized = Text::unnormalize(v, parent)
-      @unnormalized
+      document&.record_entity_expansion
+
+      return nil if @value.nil?
+
+      @unnormalized = Text::unnormalize(@value, parent,
+                                        entity_expansion_text_limit: document&.entity_expansion_text_limit)
     end
 
     #once :unnormalized
@@ -121,46 +122,6 @@ module REXML
       write rv
       rv
     end
-
-    PEREFERENCE_RE = /#{PEREFERENCE}/um
-    # Returns the value of this entity. At the moment, only internal entities
-    # are processed. If the value contains internal references (IE,
-    # %blah;), those are replaced with their values. IE, if the doctype
-    # contains:
-    # <!ENTITY % foo "bar">
-    # <!ENTITY yada "nanoo %foo; nanoo>
-    # then:
-    # doctype.entity('yada').value #-> "nanoo bar nanoo"
-    def value
-      @resolved_value ||= resolve_value
-    end
-
-    def parent=(other)
-      @resolved_value = nil
-      super
-    end
-
-    private
-    def resolve_value
-      return nil if @value.nil?
-      return @value unless @value.match?(PEREFERENCE_RE)
-
-      matches = @value.scan(PEREFERENCE_RE)
-      rv = @value.clone
-      if @parent
-        sum = 0
-        matches.each do |entity_reference|
-          entity_value = @parent.entity( entity_reference[0] )
-          if sum + entity_value.bytesize > Security.entity_expansion_text_limit
-            raise "entity expansion has grown too large"
-          else
-            sum += entity_value.bytesize
-          end
-          rv.gsub!( /%#{entity_reference.join};/um, entity_value )
-        end
-      end
-      rv
-    end
   end
 
   # This is a set of entity constants -- the ones defined in the XML
data/lib/rexml/formatters/pretty.rb CHANGED
@@ -111,7 +111,7 @@ module REXML
         # itself, then we don't need a carriage return... which makes this
         # logic more complex.
         node.children.each { |child|
-          next if child == node.children[-1] and child.instance_of?(Text)
+          next if child.instance_of?(Text)
           unless child == node.children[0] or child.instance_of?(Text) or
             (child == node.children[1] and !node.children[0].writethis)
             output << "\n"
data/lib/rexml/functions.rb CHANGED
@@ -262,11 +262,10 @@ module REXML
       string(string).length
     end
 
-    # UNTESTED
     def Functions::normalize_space( string=nil )
       string = string(@@context[:node]) if string.nil?
       if string.kind_of? Array
-        string.collect{|x| string.to_s.strip.gsub(/\s+/um, ' ') if string}
+        string.collect{|x| x.to_s.strip.gsub(/\s+/um, ' ') if x}
       else
         string.to_s.strip.gsub(/\s+/um, ' ')
       end
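
The one-line fix above changes what the block operates on: the old body stringified the whole array for every element, while the new body normalizes each element `x`. A standalone sketch of both behaviours follows. The two helper names are hypothetical and exist only for this comparison.

```ruby
# Shape of the old block: the block variable is ignored and the entire
# array is converted to a string for every element.
def normalize_space_buggy(values)
  values.collect { |_x| values.to_s.strip.gsub(/\s+/um, " ") if values }
end

# Shape of the fixed block: each element is normalized on its own,
# and nil elements are passed through as nil.
def normalize_space_fixed(values)
  values.collect { |x| x.to_s.strip.gsub(/\s+/um, " ") if x }
end

input = ["  a   b ", "c\n d", nil]

fixed = normalize_space_fixed(input) # => ["a b", "c d", nil]
buggy = normalize_space_buggy(input) # every entry is the same stringified array
```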
data/lib/rexml/node.rb CHANGED
@@ -52,10 +52,14 @@ module REXML
 
     # Visit all subnodes of +self+ recursively
     def each_recursive(&block) # :yields: node
-      self.elements.each {|node|
-        block.call(node)
-        node.each_recursive(&block)
-      }
+      stack = []
+      each { |child| stack.unshift child if child.node_type == :element }
+      until stack.empty?
+        child = stack.pop
+        yield child
+        n = stack.size
+        child.each { |grandchild| stack.insert n, grandchild if grandchild.node_type == :element }
+      end
     end
 
     # Find (and return) first subnode (recursively) for which the block
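
The `each_recursive` hunk above replaces recursion with an explicit stack while keeping pre-order traversal. The sketch below runs both shapes side by side on a hypothetical `TreeNode` struct. It is not REXML's node class, and the `node_type == :element` filter is omitted because every node here is treated as an element.

```ruby
# Hypothetical tree node with a name and an array of children.
TreeNode = Struct.new(:name, :children) do
  # Recursive pre-order walk, the shape of the old implementation.
  def each_recursive_old(&block)
    children.each do |node|
      block.call(node)
      node.each_recursive_old(&block)
    end
  end

  # Stack-based pre-order walk, the shape of the new implementation.
  def each_recursive_new
    stack = []
    # unshift reverses the children so that pop returns the first child first.
    children.each { |child| stack.unshift child }
    until stack.empty?
      child = stack.pop
      yield child
      # Inserting at a fixed index also stores the grandchildren reversed,
      # so they are popped in document order before the remaining siblings.
      n = stack.size
      child.children.each { |grandchild| stack.insert n, grandchild }
    end
  end
end

tree = TreeNode.new("root", [
  TreeNode.new("a", [TreeNode.new("a1", []), TreeNode.new("a2", [])]),
  TreeNode.new("b", [TreeNode.new("b1", [])]),
])

old_order = []
tree.each_recursive_old { |node| old_order << node.name }

new_order = []
tree.each_recursive_new { |node| new_order << node.name }
# Both walks visit: a, a1, a2, b, b1
```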
data/lib/rexml/parseexception.rb CHANGED
@@ -29,6 +29,7 @@ module REXML
         err << "\nLine: #{line}\n"
         err << "Position: #{position}\n"
         err << "Last 80 unconsumed characters:\n"
+        err.force_encoding("ASCII-8BIT")
         err << @source.buffer[0..80].force_encoding("ASCII-8BIT").gsub(/\n/, ' ')
       end
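
The added `force_encoding` call above retags the message as binary before raw buffer bytes are appended. One way the unpatched code could fail is sketched here, assuming the message already holds non-ASCII UTF-8 text. The `build_message` helper and its inputs are hypothetical.

```ruby
# Build an error message and append raw bytes to it, optionally retagging
# the message as binary first (the step the hunk above adds).
def build_message(force_binary)
  err = +"R\u00e9sum\u00e9 of the error\nLast 80 unconsumed characters:\n" # UTF-8, non-ASCII
  err.force_encoding("ASCII-8BIT") if force_binary
  raw = "\xE3\x81".b # truncated multibyte sequence, tagged as binary
  err << raw
end

# Without the retag, Ruby refuses to join two non-ASCII strings whose
# encodings differ.
without_fix =
  begin
    build_message(false)
  rescue Encoding::CompatibilityError => e
    e.class
  end

# With the retag, both sides are binary and the append succeeds.
with_fix = build_message(true)
```

`force_encoding` only changes the encoding tag and leaves the bytes untouched, so the resulting message keeps its content but is reported as binary.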