rexml 3.3.1 → 3.3.8

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.


checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: afaa8e7d5241253a1c36a218f94eeff525cc19378d2ed104f738abfc01693889
4
- data.tar.gz: 665e18c0db75cce5e3db16c674c02e986ff9141df54fd7ff3da704b4403a928d
3
+ metadata.gz: 84b42219a4278ab15e7ee7627951d0b94dddc707cbf9563799b3266d02ed32db
4
+ data.tar.gz: 4895e6f04d100a2affc8d5c6af4c6dfec5ec4d0d863f8d22de1c66da1d253c61
5
5
  SHA512:
6
- metadata.gz: 86ea7a0ce4847b320f297b1eb03158003c2931847c07ea118f0a7413f476660dcf40baec8b59a92a2e7096eb665ace359b04c5d8e82617b7162305465472c88d
7
- data.tar.gz: ae248f28516ab6c76170623bcc5e5a30389596823133fd0a13cb74235d6101dd469235bab8b1e15bcbd7a7795f04b44e4674dfdcb1712109dce58001cea01648
6
+ metadata.gz: 7729c31da310e2fb7c96cc3a5bd5b981fefdcdae6fe545bf2d113d91af5862fbb51789e9289b91e4247963169900b0cdccc373ffeea6ca3f935b2e32bab1e2e4
7
+ data.tar.gz: 542f689b7cd27b5c71aeb6845e5af2ac28186e31a98af8c45e984ce6ca563192b2a74e50b6acd95f1fde49ed6289bf9024bfd6612608455038a22e66c6b3a75b
data/NEWS.md CHANGED
@@ -1,5 +1,203 @@
1
1
  # News
2
2
 
3
+ ## 3.3.8 - 2024-09-29 {#version-3-3-8}
4
+
5
+ ### Improvements
6
+
7
+ * SAX2: Improve parse performance.
8
+ * GH-207
9
+ * Patch by NAITOH Jun.
10
+
11
+ ### Fixes
12
+
13
+ * Fixed a bug that unexpected attribute namespace conflict error for
14
+ the predefined "xml" namespace is reported.
15
+ * GH-208
16
+ * Patch by KITAITI Makoto
17
+
18
+ ### Thanks
19
+
20
+ * NAITOH Jun
21
+
22
+ * KITAITI Makoto
23
+
24
+ ## 3.3.7 - 2024-09-04 {#version-3-3-7}
25
+
26
+ ### Improvements
27
+
28
+ * Added local entity expansion limit methods
29
+ * GH-192
30
+ * GH-202
31
+ * Reported by takuya kodama.
32
+ * Patch by NAITOH Jun.
33
+
34
+ * Removed explicit strscan dependency
35
+ * GH-204
36
+ * Patch by Bo Anderson.
37
+
38
+ ### Thanks
39
+
40
+ * takuya kodama
41
+
42
+ * NAITOH Jun
43
+
44
+ * Bo Anderson
45
+
46
+ ## 3.3.6 - 2024-08-22 {#version-3-3-6}
47
+
48
+ ### Improvements
49
+
50
+ * Removed duplicated entity expansions for performance.
51
+ * GH-194
52
+ * Patch by Viktor Ivarsson.
53
+
54
+ * Improved namespace conflicted attribute check performance. It was
55
+ too slow for deep elements.
56
+ * Reported by l33thaxor.
57
+
58
+ ### Fixes
59
+
60
+ * Fixed a bug that default entity expansions are counted for
61
+ security check. Default entity expansions should not be counted
62
+ because they don't have a security risk.
63
+ * GH-198
64
+ * GH-199
65
+ * Patch Viktor Ivarsson
66
+
67
+ * Fixed a parser bug that parameter entity references in internal
68
+ subsets are expanded. It's not allowed in the XML specification.
69
+ * GH-191
70
+ * Patch by NAITOH Jun.
71
+
72
+ * Fixed a stream parser bug that user-defined entity references in
73
+ text aren't expanded.
74
+ * GH-200
75
+ * Patch by NAITOH Jun.
76
+
77
+ ### Thanks
78
+
79
+ * Viktor Ivarsson
80
+
81
+ * NAITOH Jun
82
+
83
+ * l33thaxor
84
+
85
+ ## 3.3.5 - 2024-08-12 {#version-3-3-5}
86
+
87
+ ### Fixes
88
+
89
+ * Fixed a bug that `REXML::Security.entity_expansion_text_limit`
90
+ check has wrong text size calculation in SAX and pull parsers.
91
+ * GH-193
92
+ * GH-195
93
+ * Reported by Viktor Ivarsson.
94
+ * Patch by NAITOH Jun.
95
+
96
+ ### Thanks
97
+
98
+ * Viktor Ivarsson
99
+
100
+ * NAITOH Jun
101
+
102
+ ## 3.3.4 - 2024-08-01 {#version-3-3-4}
103
+
104
+ ### Fixes
105
+
106
+ * Fixed a bug that `REXML::Security` isn't defined when
107
+ `REXML::Parsers::StreamParser` is used and
108
+ `rexml/parsers/streamparser` is only required.
109
+ * GH-189
110
+ * Patch by takuya kodama.
111
+
112
+ ### Thanks
113
+
114
+ * takuya kodama
115
+
116
+ ## 3.3.3 - 2024-08-01 {#version-3-3-3}
117
+
118
+ ### Improvements
119
+
120
+ * Added support for detecting invalid XML that has unsupported
121
+ content before root element
122
+ * GH-184
123
+ * Patch by NAITOH Jun.
124
+
125
+ * Added support for `REXML::Security.entity_expansion_limit=` and
126
+ `REXML::Security.entity_expansion_text_limit=` in SAX2 and pull
127
+ parsers
128
+ * GH-187
129
+ * Patch by NAITOH Jun.
130
+
131
+ * Added more tests for invalid XMLs.
132
+ * GH-183
133
+ * Patch by Watson.
134
+
135
+ * Added more performance tests.
136
+ * Patch by Watson.
137
+
138
+ * Improved parse performance.
139
+ * GH-186
140
+ * Patch by tomoya ishida.
141
+
142
+ ### Thanks
143
+
144
+ * NAITOH Jun
145
+
146
+ * Watson
147
+
148
+ * tomoya ishida
149
+
150
+ ## 3.3.2 - 2024-07-16 {#version-3-3-2}
151
+
152
+ ### Improvements
153
+
154
+ * Improved parse performance.
155
+ * GH-160
156
+ * Patch by NAITOH Jun.
157
+
158
+ * Improved parse performance.
159
+ * GH-169
160
+ * GH-170
161
+ * GH-171
162
+ * GH-172
163
+ * GH-173
164
+ * GH-174
165
+ * GH-175
166
+ * GH-176
167
+ * GH-177
168
+ * Patch by Watson.
169
+
170
+ * Added support for raising a parse exception when an XML has extra
171
+ content after the root element.
172
+ * GH-161
173
+ * Patch by NAITOH Jun.
174
+
175
+ * Added support for raising a parse exception when an XML
176
+ declaration exists in wrong position.
177
+ * GH-162
178
+ * Patch by NAITOH Jun.
179
+
180
+ * Removed needless a space after XML declaration in pretty print mode.
181
+ * GH-164
182
+ * Patch by NAITOH Jun.
183
+
184
+ * Stopped to emit `:text` event after the root element.
185
+ * GH-167
186
+ * Patch by NAITOH Jun.
187
+
188
+ ### Fixes
189
+
190
+ * Fixed a bug that SAX2 parser doesn't expand predefined entities for
191
+ `characters` callback.
192
+ * GH-168
193
+ * Patch by NAITOH Jun.
194
+
195
+ ### Thanks
196
+
197
+ * NAITOH Jun
198
+
199
+ * Watson
200
+
3
201
  ## 3.3.1 - 2024-06-25 {#version-3-3-1}
4
202
 
5
203
  ### Improvements
@@ -148,8 +148,9 @@ module REXML
148
148
  # have been expanded to their values
149
149
  def value
150
150
  return @unnormalized if @unnormalized
151
- @unnormalized = Text::unnormalize( @normalized, doctype )
152
- @unnormalized
151
+
152
+ @unnormalized = Text::unnormalize(@normalized, doctype,
153
+ entity_expansion_text_limit: @element&.document&.entity_expansion_text_limit)
153
154
  end
154
155
 
155
156
  # The normalized value of this attribute. That is, the attribute with
@@ -91,6 +91,8 @@ module REXML
91
91
  #
92
92
  def initialize( source = nil, context = {} )
93
93
  @entity_expansion_count = 0
94
+ @entity_expansion_limit = Security.entity_expansion_limit
95
+ @entity_expansion_text_limit = Security.entity_expansion_text_limit
94
96
  super()
95
97
  @context = context
96
98
  return if source.nil?
@@ -431,10 +433,12 @@ module REXML
431
433
  end
432
434
 
433
435
  attr_reader :entity_expansion_count
436
+ attr_writer :entity_expansion_limit
437
+ attr_accessor :entity_expansion_text_limit
434
438
 
435
439
  def record_entity_expansion
436
440
  @entity_expansion_count += 1
437
- if @entity_expansion_count > Security.entity_expansion_limit
441
+ if @entity_expansion_count > @entity_expansion_limit
438
442
  raise "number of entity expansions exceeded, processing aborted."
439
443
  end
440
444
  end
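The `Document` hunks above stop consulting `Security.entity_expansion_limit` on every expansion. Instead, the limit is copied into an instance variable when the document is created, and a writer is exposed so that a single document can tighten its own limit without changing process-wide settings. The following is a minimal sketch of that pattern. `GlobalLimits` and `LimitedDocument` are hypothetical stand-ins, not REXML classes.

```ruby
# Hypothetical process-wide settings, standing in for REXML::Security.
module GlobalLimits
  @entity_expansion_limit = 10_000

  class << self
    attr_accessor :entity_expansion_limit
  end
end

# Hypothetical document: copies the global default once, at construction time.
class LimitedDocument
  attr_reader :entity_expansion_count
  attr_writer :entity_expansion_limit

  def initialize
    @entity_expansion_count = 0
    @entity_expansion_limit = GlobalLimits.entity_expansion_limit
  end

  def record_entity_expansion
    @entity_expansion_count += 1
    if @entity_expansion_count > @entity_expansion_limit
      raise "number of entity expansions exceeded, processing aborted."
    end
  end
end

strict = LimitedDocument.new
strict.entity_expansion_limit = 2   # affects only this instance
begin
  3.times { strict.record_entity_expansion }
rescue RuntimeError => e
  puts "aborted after #{strict.entity_expansion_count} expansions: #{e.message}"
end
```

Because the value is copied at construction, a later change to the global default only affects documents created after the change.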
data/lib/rexml/element.rb CHANGED
@@ -441,9 +441,14 @@ module REXML
441
441
  # Related: #root_node, #document.
442
442
  #
443
443
  def root
444
- return elements[1] if self.kind_of? Document
445
- return self if parent.kind_of? Document or parent.nil?
446
- return parent.root
444
+ target = self
445
+ while target
446
+ return target.elements[1] if target.kind_of? Document
447
+ parent = target.parent
448
+ return target if parent.kind_of? Document or parent.nil?
449
+ target = parent
450
+ end
451
+ nil
447
452
  end
448
453
 
449
454
  # :call-seq:
@@ -619,8 +624,12 @@ module REXML
619
624
  else
620
625
  prefix = "xmlns:#{prefix}" unless prefix[0,5] == 'xmlns'
621
626
  end
622
- ns = attributes[ prefix ]
623
- ns = parent.namespace(prefix) if ns.nil? and parent
627
+ ns = nil
628
+ target = self
629
+ while ns.nil? and target
630
+ ns = target.attributes[prefix]
631
+ target = target.parent
632
+ end
624
633
  ns = '' if ns.nil? and prefix == 'xmlns'
625
634
  return ns
626
635
  end
@@ -2375,17 +2384,6 @@ module REXML
2375
2384
  elsif old_attr.kind_of? Hash
2376
2385
  old_attr[value.prefix] = value
2377
2386
  elsif old_attr.prefix != value.prefix
2378
- # Check for conflicting namespaces
2379
- if value.prefix != "xmlns" and old_attr.prefix != "xmlns"
2380
- old_namespace = old_attr.namespace
2381
- new_namespace = value.namespace
2382
- if old_namespace == new_namespace
2383
- raise ParseException.new(
2384
- "Namespace conflict in adding attribute \"#{value.name}\": "+
2385
- "Prefix \"#{old_attr.prefix}\" = \"#{old_namespace}\" and "+
2386
- "prefix \"#{value.prefix}\" = \"#{new_namespace}\"")
2387
- end
2388
- end
2389
2387
  store value.name, {old_attr.prefix => old_attr,
2390
2388
  value.prefix => value}
2391
2389
  else
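`Element#root` and `Element#namespace` above replace recursion on `parent` with a `while` loop. With recursion, the call stack grows by one frame per ancestor, so a very deep document could exhaust it; the loop avoids that. The sketch below applies the same loop to a hypothetical `Node` struct, which is not a REXML type. Unlike the original method, it assumes a non-nil prefix.

```ruby
# Hypothetical minimal node: a parent link plus an attribute hash.
Node = Struct.new(:parent, :attributes)

# Walks up the ancestors iteratively, mirroring the new Element#namespace loop.
def lookup_namespace(node, prefix)
  prefix = "xmlns:#{prefix}" unless prefix[0, 5] == "xmlns"
  ns = nil
  target = node
  while ns.nil? && target
    ns = target.attributes[prefix]
    target = target.parent
  end
  ns = "" if ns.nil? && prefix == "xmlns"   # default namespace falls back to ""
  ns
end

root = Node.new(nil, { "xmlns:a" => "urn:example:a" })
# Build a 100_000-level chain; the lookup never recurses, so depth is harmless.
leaf = (1..100_000).reduce(root) { |parent, _| Node.new(parent, {}) }
puts lookup_namespace(leaf, "a")
```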
data/lib/rexml/entity.rb CHANGED
@@ -12,6 +12,7 @@ module REXML
12
12
  EXTERNALID = "(?:(?:(SYSTEM)\\s+#{SYSTEMLITERAL})|(?:(PUBLIC)\\s+#{PUBIDLITERAL}\\s+#{SYSTEMLITERAL}))"
13
13
  NDATADECL = "\\s+NDATA\\s+#{NAME}"
14
14
  PEREFERENCE = "%#{NAME};"
15
+ PEREFERENCE_RE = /#{PEREFERENCE}/um
15
16
  ENTITYVALUE = %Q{((?:"(?:[^%&"]|#{PEREFERENCE}|#{REFERENCE})*")|(?:'([^%&']|#{PEREFERENCE}|#{REFERENCE})*'))}
16
17
  PEDEF = "(?:#{ENTITYVALUE}|#{EXTERNALID})"
17
18
  ENTITYDEF = "(?:#{ENTITYVALUE}|(?:#{EXTERNALID}(#{NDATADECL})?))"
@@ -19,7 +20,7 @@ module REXML
19
20
  GEDECL = "<!ENTITY\\s+#{NAME}\\s+#{ENTITYDEF}\\s*>"
20
21
  ENTITYDECL = /\s*(?:#{GEDECL})|(?:#{PEDECL})/um
21
22
 
22
- attr_reader :name, :external, :ref, :ndata, :pubid
23
+ attr_reader :name, :external, :ref, :ndata, :pubid, :value
23
24
 
24
25
  # Create a new entity. Simple entities can be constructed by passing a
25
26
  # name, value to the constructor; this creates a generic, plain entity
@@ -68,14 +69,14 @@ module REXML
68
69
  end
69
70
 
70
71
  # Evaluates to the unnormalized value of this entity; that is, replacing
71
- # all entities -- both %ent; and &ent; entities. This differs from
72
- # +value()+ in that +value+ only replaces %ent; entities.
72
+ # &ent; entities.
73
73
  def unnormalized
74
- document.record_entity_expansion unless document.nil?
75
- v = value()
76
- return nil if v.nil?
77
- @unnormalized = Text::unnormalize(v, parent)
78
- @unnormalized
74
+ document&.record_entity_expansion
75
+
76
+ return nil if @value.nil?
77
+
78
+ @unnormalized = Text::unnormalize(@value, parent,
79
+ entity_expansion_text_limit: document&.entity_expansion_text_limit)
79
80
  end
80
81
 
81
82
  #once :unnormalized
@@ -121,46 +122,6 @@ module REXML
121
122
  write rv
122
123
  rv
123
124
  end
124
-
125
- PEREFERENCE_RE = /#{PEREFERENCE}/um
126
- # Returns the value of this entity. At the moment, only internal entities
127
- # are processed. If the value contains internal references (IE,
128
- # %blah;), those are replaced with their values. IE, if the doctype
129
- # contains:
130
- # <!ENTITY % foo "bar">
131
- # <!ENTITY yada "nanoo %foo; nanoo>
132
- # then:
133
- # doctype.entity('yada').value #-> "nanoo bar nanoo"
134
- def value
135
- @resolved_value ||= resolve_value
136
- end
137
-
138
- def parent=(other)
139
- @resolved_value = nil
140
- super
141
- end
142
-
143
- private
144
- def resolve_value
145
- return nil if @value.nil?
146
- return @value unless @value.match?(PEREFERENCE_RE)
147
-
148
- matches = @value.scan(PEREFERENCE_RE)
149
- rv = @value.clone
150
- if @parent
151
- sum = 0
152
- matches.each do |entity_reference|
153
- entity_value = @parent.entity( entity_reference[0] )
154
- if sum + entity_value.bytesize > Security.entity_expansion_text_limit
155
- raise "entity expansion has grown too large"
156
- else
157
- sum += entity_value.bytesize
158
- end
159
- rv.gsub!( /%#{entity_reference.join};/um, entity_value )
160
- end
161
- end
162
- rv
163
- end
164
125
  end
165
126
 
166
127
  # This is a set of entity constants -- the ones defined in the XML
@@ -111,7 +111,7 @@ module REXML
111
111
  # itself, then we don't need a carriage return... which makes this
112
112
  # logic more complex.
113
113
  node.children.each { |child|
114
- next if child == node.children[-1] and child.instance_of?(Text)
114
+ next if child.instance_of?(Text)
115
115
  unless child == node.children[0] or child.instance_of?(Text) or
116
116
  (child == node.children[1] and !node.children[0].writethis)
117
117
  output << "\n"
@@ -1,12 +1,29 @@
1
1
  # frozen_string_literal: true
2
2
  require_relative '../parseexception'
3
3
  require_relative '../undefinednamespaceexception'
4
+ require_relative '../security'
4
5
  require_relative '../source'
5
6
  require 'set'
6
7
  require "strscan"
7
8
 
8
9
  module REXML
9
10
  module Parsers
11
+ unless [].respond_to?(:tally)
12
+ module EnumerableTally
13
+ refine Enumerable do
14
+ def tally
15
+ counts = {}
16
+ each do |item|
17
+ counts[item] ||= 0
18
+ counts[item] += 1
19
+ end
20
+ counts
21
+ end
22
+ end
23
+ end
24
+ using EnumerableTally
25
+ end
26
+
10
27
  if StringScanner::Version < "3.0.8"
11
28
  module StringScannerCaptures
12
29
  refine StringScanner do
@@ -124,11 +141,11 @@ module REXML
124
141
  }
125
142
 
126
143
  module Private
127
- INSTRUCTION_END = /#{NAME}(\s+.*?)?\?>/um
144
+ PEREFERENCE_PATTERN = /#{PEREFERENCE}/um
128
145
  TAG_PATTERN = /((?>#{QNAME_STR}))\s*/um
129
146
  CLOSE_PATTERN = /(#{QNAME_STR})\s*>/um
130
147
  ATTLISTDECL_END = /\s+#{NAME}(?:#{ATTDEF})*\s*>/um
131
- NAME_PATTERN = /\s*#{NAME}/um
148
+ NAME_PATTERN = /#{NAME}/um
132
149
  GEDECL_PATTERN = "\\s+#{NAME}\\s+#{ENTITYDEF}\\s*>"
133
150
  PEDECL_PATTERN = "\\s+(%)\\s+#{NAME}\\s+#{PEDEF}\\s*>"
134
151
  ENTITYDECL_PATTERN = /(?:#{GEDECL_PATTERN})|(?:#{PEDECL_PATTERN})/um
@@ -139,6 +156,7 @@ module REXML
139
156
  default_entities.each do |term|
140
157
  DEFAULT_ENTITIES_PATTERNS[term] = /&#{term};/
141
158
  end
159
+ XML_PREFIXED_NAMESPACE = "http://www.w3.org/XML/1998/namespace"
142
160
  end
143
161
  private_constant :Private
144
162
 
@@ -146,6 +164,9 @@ module REXML
146
164
  self.stream = source
147
165
  @listeners = []
148
166
  @prefixes = Set.new
167
+ @entity_expansion_count = 0
168
+ @entity_expansion_limit = Security.entity_expansion_limit
169
+ @entity_expansion_text_limit = Security.entity_expansion_text_limit
149
170
  end
150
171
 
151
172
  def add_listener( listener )
@@ -153,15 +174,20 @@ module REXML
153
174
  end
154
175
 
155
176
  attr_reader :source
177
+ attr_reader :entity_expansion_count
178
+ attr_writer :entity_expansion_limit
179
+ attr_writer :entity_expansion_text_limit
156
180
 
157
181
  def stream=( source )
158
182
  @source = SourceFactory.create_from( source )
159
183
  @closed = nil
184
+ @have_root = false
160
185
  @document_status = nil
161
186
  @tags = []
162
187
  @stack = []
163
188
  @entities = []
164
- @nsstack = []
189
+ @namespaces = {"xml" => Private::XML_PREFIXED_NAMESPACE}
190
+ @namespaces_restore_stack = []
165
191
  end
166
192
 
167
193
  def position
@@ -229,6 +255,10 @@ module REXML
229
255
  if @document_status == :in_doctype
230
256
  raise ParseException.new("Malformed DOCTYPE: unclosed", @source)
231
257
  end
258
+ unless @tags.empty?
259
+ path = "/" + @tags.join("/")
260
+ raise ParseException.new("Missing end tag for '#{path}'", @source)
261
+ end
232
262
  return [ :end_document ]
233
263
  end
234
264
  return @stack.shift if @stack.size > 0
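The hunk above moves the unclosed-tag check into `BaseParser` itself. As a result, `StreamParser` and `TreeParser` no longer keep their own tag stacks for this purpose; their removals appear further down. The following is a minimal sketch of the check over a simplified, hypothetical list of `[type, name]` events. It raises `ArgumentError` instead of `REXML::ParseException` so that it stays self-contained.

```ruby
# Hypothetical end-of-document check over simplified [type, name] events.
def finish_document(events)
  tags = []
  events.each do |type, name|
    case type
    when :start_element then tags.push(name)
    when :end_element   then tags.pop
    end
  end
  # Any tag still open at the end means a missing end tag; report its path.
  unless tags.empty?
    raise ArgumentError, "Missing end tag for '/#{tags.join("/")}'"
  end
  [:end_document]
end
```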
@@ -239,7 +269,7 @@ module REXML
239
269
  if @document_status == nil
240
270
  start_position = @source.position
241
271
  if @source.match("<?", true)
242
- return process_instruction(start_position)
272
+ return process_instruction
243
273
  elsif @source.match("<!", true)
244
274
  if @source.match("--", true)
245
275
  md = @source.match(/(.*?)-->/um, true)
@@ -261,7 +291,6 @@ module REXML
261
291
  @source.position = start_position
262
292
  raise REXML::ParseException.new(message, @source)
263
293
  end
264
- @nsstack.unshift(Set.new)
265
294
  name = parse_name(base_error_message)
266
295
  if @source.match(/\s*\[/um, true)
267
296
  id = [nil, nil, nil]
@@ -309,7 +338,11 @@ module REXML
309
338
  raise REXML::ParseException.new( "Bad ELEMENT declaration!", @source ) if md.nil?
310
339
  return [ :elementdecl, "<!ELEMENT" + md[1] ]
311
340
  elsif @source.match("ENTITY", true)
312
- match = [:entitydecl, *@source.match(Private::ENTITYDECL_PATTERN, true).captures.compact]
341
+ match_data = @source.match(Private::ENTITYDECL_PATTERN, true)
342
+ unless match_data
343
+ raise REXML::ParseException.new("Malformed entity declaration", @source)
344
+ end
345
+ match = [:entitydecl, *match_data.captures.compact]
313
346
  ref = false
314
347
  if match[1] == '%'
315
348
  ref = true
@@ -327,6 +360,8 @@ module REXML
327
360
  match[4] = match[4][1..-2] # HREF
328
361
  match.delete_at(5) if match.size > 5 # Chop out NDATA decl
329
362
  # match is [ :entity, name, PUBLIC, pubid, href(, ndata)? ]
363
+ elsif Private::PEREFERENCE_PATTERN.match?(match[2])
364
+ raise REXML::ParseException.new("Parameter entity references forbidden in internal subset: #{match[2]}", @source)
330
365
  else
331
366
  match[2] = match[2][1..-2]
332
367
  match.pop if match.size == 4
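The entity-declaration hunk above now rejects an internal entity value that contains a parameter-entity reference such as `%foo;` (3.3.6, GH-191). Previously, such a reference was expanded. The following is a rough sketch of that check. `ENTITY_NAME` is a simplified, ASCII-only approximation of the XML Name production, and `internal_entity_value` is a hypothetical helper.

```ruby
# Simplified ASCII-only approximation of the XML Name production.
ENTITY_NAME = /[A-Za-z_:][-A-Za-z0-9._:]*/
# "%name;" as in PEREFERENCE_PATTERN above.
PE_REFERENCE = /%#{ENTITY_NAME.source};/

# Takes a quoted entity value (quotes included) from an internal subset.
def internal_entity_value(quoted)
  if PE_REFERENCE.match?(quoted)
    raise ArgumentError, "Parameter entity references forbidden in internal subset: #{quoted}"
  end
  quoted[1..-2] # drop the surrounding quote characters
end
```

A general entity reference such as `&amp;`, or a bare `%` that is not followed by a name and `;`, still passes.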
@@ -341,7 +376,7 @@ module REXML
341
376
  contents = md[0]
342
377
 
343
378
  pairs = {}
344
- values = md[0].scan( ATTDEF_RE )
379
+ values = md[0].strip.scan( ATTDEF_RE )
345
380
  values.each do |attdef|
346
381
  unless attdef[3] == "#IMPLIED"
347
382
  attdef.compact!
@@ -349,7 +384,7 @@ module REXML
349
384
  val = attdef[4] if val == "#FIXED "
350
385
  pairs[attdef[0]] = val
351
386
  if attdef[0] =~ /^xmlns:(.*)/
352
- @nsstack[0] << $1
387
+ @namespaces[$1] = val
353
388
  end
354
389
  end
355
390
  end
@@ -402,7 +437,7 @@ module REXML
402
437
  # here explicitly.
403
438
  @source.ensure_buffer
404
439
  if @source.match("/", true)
405
- @nsstack.shift
440
+ @namespaces_restore_stack.pop
406
441
  last_tag = @tags.pop
407
442
  md = @source.match(Private::CLOSE_PATTERN, true)
408
443
  if md and !last_tag
@@ -435,7 +470,7 @@ module REXML
435
470
  raise REXML::ParseException.new( "Declarations can only occur "+
436
471
  "in the doctype declaration.", @source)
437
472
  elsif @source.match("?", true)
438
- return process_instruction(start_position)
473
+ return process_instruction
439
474
  else
440
475
  # Get the next tag
441
476
  md = @source.match(Private::TAG_PATTERN, true)
@@ -447,21 +482,25 @@ module REXML
447
482
  @document_status = :in_element
448
483
  @prefixes.clear
449
484
  @prefixes << md[2] if md[2]
450
- @nsstack.unshift(curr_ns=Set.new)
451
- attributes, closed = parse_attributes(@prefixes, curr_ns)
485
+ push_namespaces_restore
486
+ attributes, closed = parse_attributes(@prefixes)
452
487
  # Verify that all of the prefixes have been defined
453
488
  for prefix in @prefixes
454
- unless @nsstack.find{|k| k.member?(prefix)}
489
+ unless @namespaces.key?(prefix)
455
490
  raise UndefinedNamespaceException.new(prefix,@source,self)
456
491
  end
457
492
  end
458
493
 
459
494
  if closed
460
495
  @closed = tag
461
- @nsstack.shift
496
+ pop_namespaces_restore
462
497
  else
498
+ if @tags.empty? and @have_root
499
+ raise ParseException.new("Malformed XML: Extra tag at the end of the document (got '<#{tag}')", @source)
500
+ end
463
501
  @tags.push( tag )
464
502
  end
503
+ @have_root = true
465
504
  return [ :start_element, tag, attributes ]
466
505
  end
467
506
  else
@@ -469,6 +508,16 @@ module REXML
469
508
  if text.chomp!("<")
470
509
  @source.position -= "<".bytesize
471
510
  end
511
+ if @tags.empty?
512
+ unless /\A\s*\z/.match?(text)
513
+ if @have_root
514
+ raise ParseException.new("Malformed XML: Extra content at the end of the document (got '#{text}')", @source)
515
+ else
516
+ raise ParseException.new("Malformed XML: Content at the start of the document (got '#{text}')", @source)
517
+ end
518
+ end
519
+ return pull_event if @have_root
520
+ end
472
521
  return [ :text, text ]
473
522
  end
474
523
  rescue REXML::UndefinedNamespaceException
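The text-event hunk above (3.3.2) raises an error for non-whitespace text before or after the root element. It also silently drops whitespace after the root element instead of emitting a `:text` event. The following is a sketch of that classification. `top_level_text_event` is hypothetical: it returns `nil` where the parser would move on to the next event, and it raises `ArgumentError` where REXML raises `ParseException`.

```ruby
# Hypothetical classifier for text found outside any element (tag stack empty).
def top_level_text_event(text, have_root)
  unless /\A\s*\z/.match?(text)
    if have_root
      raise ArgumentError, "Malformed XML: Extra content at the end of the document (got '#{text}')"
    else
      raise ArgumentError, "Malformed XML: Content at the start of the document (got '#{text}')"
    end
  end
  # Leading whitespace is still reported; trailing whitespace is skipped.
  have_root ? nil : [:text, text]
end
```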
@@ -484,13 +533,13 @@ module REXML
484
533
  private :pull_event
485
534
 
486
535
  def entity( reference, entities )
487
- value = nil
488
- value = entities[ reference ] if entities
489
- if not value
490
- value = DEFAULT_ENTITIES[ reference ]
491
- value = value[2] if value
492
- end
493
- unnormalize( value, entities ) if value
536
+ return unless entities
537
+
538
+ value = entities[ reference ]
539
+ return if value.nil?
540
+
541
+ record_entity_expansion
542
+ unnormalize( value, entities )
494
543
  end
495
544
 
496
545
  # Escapes all possible entities
@@ -511,7 +560,11 @@ module REXML
511
560
 
512
561
  # Unescapes all possible entities
513
562
  def unnormalize( string, entities=nil, filter=nil )
514
- rv = string.gsub( Private::CARRIAGE_RETURN_NEWLINE_PATTERN, "\n" )
563
+ if string.include?("\r")
564
+ rv = string.gsub( Private::CARRIAGE_RETURN_NEWLINE_PATTERN, "\n" )
565
+ else
566
+ rv = string.dup
567
+ end
515
568
  matches = rv.scan( REFERENCE_RE )
516
569
  return rv if matches.size == 0
517
570
  rv.gsub!( Private::CHARACTER_REFERENCES ) {
@@ -520,17 +573,29 @@ module REXML
520
573
  [Integer(m)].pack('U*')
521
574
  }
522
575
  matches.collect!{|x|x[0]}.compact!
576
+ if filter
577
+ matches.reject! do |entity_reference|
578
+ filter.include?(entity_reference)
579
+ end
580
+ end
523
581
  if matches.size > 0
524
- matches.each do |entity_reference|
525
- unless filter and filter.include?(entity_reference)
526
- entity_value = entity( entity_reference, entities )
527
- if entity_value
528
- re = Private::DEFAULT_ENTITIES_PATTERNS[entity_reference] || /&#{entity_reference};/
529
- rv.gsub!( re, entity_value )
530
- else
531
- er = DEFAULT_ENTITIES[entity_reference]
532
- rv.gsub!( er[0], er[2] ) if er
582
+ matches.tally.each do |entity_reference, n|
583
+ entity_expansion_count_before = @entity_expansion_count
584
+ entity_value = entity( entity_reference, entities )
585
+ if entity_value
586
+ if n > 1
587
+ entity_expansion_count_delta =
588
+ @entity_expansion_count - entity_expansion_count_before
589
+ record_entity_expansion(entity_expansion_count_delta * (n - 1))
533
590
  end
591
+ re = Private::DEFAULT_ENTITIES_PATTERNS[entity_reference] || /&#{entity_reference};/
592
+ rv.gsub!( re, entity_value )
593
+ if rv.bytesize > @entity_expansion_text_limit
594
+ raise "entity expansion has grown too large"
595
+ end
596
+ else
597
+ er = DEFAULT_ENTITIES[entity_reference]
598
+ rv.gsub!( er[0], er[2] ) if er
534
599
  end
535
600
  end
536
601
  rv.gsub!( Private::DEFAULT_ENTITIES_PATTERNS['amp'], '&' )
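The rewritten `unnormalize` loop above tallies entity references so that each distinct name is substituted with a single `gsub!`. `record_entity_expansion` is still charged for every occurrence, and `@entity_expansion_text_limit` is checked after each substitution (3.3.5 and 3.3.6).

The sketch below is a simplified, hypothetical `TallyExpander` that makes one assumption: entity values contain no nested references, so each occurrence costs exactly one expansion. The real code instead measures the count delta produced by the first (possibly nested) expansion and multiplies it by the remaining occurrences. `Enumerable#tally` requires Ruby 2.7 or later, which is why the diff adds a refinement fallback.

```ruby
# Hypothetical expander illustrating the tally-based loop above.
# Simplification: entity values contain no further references.
class TallyExpander
  REFERENCE = /&([A-Za-z_][\w.-]*);/

  attr_reader :count

  def initialize(entities, limit:, text_limit:)
    @entities = entities        # name => replacement text
    @limit = limit              # max number of expansions
    @text_limit = text_limit    # max result size in bytes
    @count = 0
  end

  def expand(string)
    rv = string.dup
    names = rv.scan(REFERENCE).map(&:first)
    # tally gives e.g. {"a" => 3, "b" => 1}; each distinct name gets one gsub!.
    names.tally.each do |name, n|
      value = @entities[name]
      next unless value                  # unknown reference: leave it in place
      record_entity_expansion(n)         # charge every occurrence
      rv.gsub!("&#{name};") { value }    # block form avoids backslash escapes
      raise "entity expansion has grown too large" if rv.bytesize > @text_limit
    end
    rv
  end

  private

  def record_entity_expansion(delta)
    @count += delta
    raise "number of entity expansions exceeded, processing aborted." if @count > @limit
  end
end
```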
@@ -539,6 +604,39 @@ module REXML
539
604
  end
540
605
 
541
606
  private
607
+ def add_namespace(prefix, uri)
608
+ @namespaces_restore_stack.last[prefix] = @namespaces[prefix]
609
+ if uri.nil?
610
+ @namespaces.delete(prefix)
611
+ else
612
+ @namespaces[prefix] = uri
613
+ end
614
+ end
615
+
616
+ def push_namespaces_restore
617
+ namespaces_restore = {}
618
+ @namespaces_restore_stack.push(namespaces_restore)
619
+ namespaces_restore
620
+ end
621
+
622
+ def pop_namespaces_restore
623
+ namespaces_restore = @namespaces_restore_stack.pop
624
+ namespaces_restore.each do |prefix, uri|
625
+ if uri.nil?
626
+ @namespaces.delete(prefix)
627
+ else
628
+ @namespaces[prefix] = uri
629
+ end
630
+ end
631
+ end
632
+
633
+ def record_entity_expansion(delta=1)
634
+ @entity_expansion_count += delta
635
+ if @entity_expansion_count > @entity_expansion_limit
636
+ raise "number of entity expansions exceeded, processing aborted."
637
+ end
638
+ end
639
+
542
640
  def need_source_encoding_update?(xml_declaration_encoding)
543
641
  return false if xml_declaration_encoding.nil?
544
642
  return false if /\AUTF-16\z/i =~ xml_declaration_encoding
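The old `@nsstack`, an array of `Set`s searched linearly on every prefix check, is replaced by one flat `prefix => URI` hash plus a stack of "restore" hashes. Each restore hash records what a scope overwrote, so that the binding can be put back when the scope closes. This turns the undefined-prefix check into a single `@namespaces.key?(prefix)` lookup. The class below is a hypothetical wrapper around the same `add_namespace` / `push_namespaces_restore` / `pop_namespaces_restore` logic.

```ruby
# Hypothetical scope tracker mirroring the namespace helpers above.
class NamespaceScopes
  XML_NS = "http://www.w3.org/XML/1998/namespace"

  def initialize
    @namespaces = { "xml" => XML_NS }  # "xml" is always bound
    @restore_stack = []
  end

  def push
    @restore_stack.push({})
  end

  # Remember the previous binding (possibly nil) before overwriting it.
  def add(prefix, uri)
    @restore_stack.last[prefix] = @namespaces[prefix]
    if uri.nil?
      @namespaces.delete(prefix)
    else
      @namespaces[prefix] = uri
    end
  end

  # Undo everything the innermost scope changed.
  def pop
    @restore_stack.pop.each do |prefix, uri|
      if uri.nil?
        @namespaces.delete(prefix)
      else
        @namespaces[prefix] = uri
      end
    end
  end

  def lookup(prefix)
    @namespaces[prefix]
  end
end

scopes = NamespaceScopes.new
scopes.push
scopes.add("a", "urn:outer")   # <outer xmlns:a="urn:outer">
scopes.push
scopes.add("a", "urn:inner")   #   <inner xmlns:a="urn:inner">
scopes.lookup("a")             # => "urn:inner"
scopes.pop                     #   </inner>
scopes.lookup("a")             # => "urn:outer"
scopes.pop                     # </outer>
scopes.lookup("a")             # => nil
```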
@@ -548,14 +646,14 @@ module REXML
548
646
  def parse_name(base_error_message)
549
647
  md = @source.match(Private::NAME_PATTERN, true)
550
648
  unless md
551
- if @source.match(/\s*\S/um)
649
+ if @source.match(/\S/um)
552
650
  message = "#{base_error_message}: invalid name"
553
651
  else
554
652
  message = "#{base_error_message}: name is missing"
555
653
  end
556
654
  raise REXML::ParseException.new(message, @source)
557
655
  end
558
- md[1]
656
+ md[0]
559
657
  end
560
658
 
561
659
  def parse_id(base_error_message,
@@ -624,15 +722,24 @@ module REXML
624
722
  end
625
723
  end
626
724
 
627
- def process_instruction(start_position)
628
- match_data = @source.match(Private::INSTRUCTION_END, true)
629
- unless match_data
630
- message = "Invalid processing instruction node"
631
- @source.position = start_position
632
- raise REXML::ParseException.new(message, @source)
725
+ def process_instruction
726
+ name = parse_name("Malformed XML: Invalid processing instruction node")
727
+ if @source.match(/\s+/um, true)
728
+ match_data = @source.match(/(.*?)\?>/um, true)
729
+ unless match_data
730
+ raise ParseException.new("Malformed XML: Unclosed processing instruction", @source)
731
+ end
732
+ content = match_data[1]
733
+ else
734
+ content = nil
735
+ unless @source.match("?>", true)
736
+ raise ParseException.new("Malformed XML: Unclosed processing instruction", @source)
737
+ end
633
738
  end
634
- if @document_status.nil? and match_data[1] == "xml"
635
- content = match_data[2]
739
+ if name == "xml"
740
+ if @document_status
741
+ raise ParseException.new("Malformed XML: XML declaration is not at the start", @source)
742
+ end
636
743
  version = VERSION.match(content)
637
744
  version = version[1] unless version.nil?
638
745
  encoding = ENCODING.match(content)
@@ -647,11 +754,12 @@ module REXML
647
754
  standalone = standalone[1] unless standalone.nil?
648
755
  return [ :xmldecl, version, encoding, standalone ]
649
756
  end
650
- [:processing_instruction, match_data[1], match_data[2]]
757
+ [:processing_instruction, name, content]
651
758
  end
652
759
 
653
- def parse_attributes(prefixes, curr_ns)
760
+ def parse_attributes(prefixes)
654
761
  attributes = {}
762
+ expanded_names = {}
655
763
  closed = false
656
764
  while true
657
765
  if @source.match(">", true)
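`process_instruction` above now parses the target with `parse_name`. If whitespace follows the target, it reads content up to the first `?>`; otherwise it requires an immediate `?>`. It also rejects an `xml` declaration that is not at the start of the document.

The following sketch performs the same steps on a `StringScanner`. `PI_NAME` is a simplified ASCII-only name pattern, and `parse_instruction` is hypothetical. Parsing of the XML declaration's version, encoding and standalone values is left out.

```ruby
require "strscan"

# Simplified ASCII-only stand-in for the XML Name production.
PI_NAME = /[A-Za-z_:][-A-Za-z0-9._:]*/

# Parses what follows "<?" up to and including the closing "?>".
def parse_instruction(scanner, at_document_start:)
  name = scanner.scan(PI_NAME)
  raise ArgumentError, "Malformed XML: Invalid processing instruction node" unless name

  if scanner.scan(/\s+/)
    # Target followed by whitespace: content runs up to the first "?>".
    unless scanner.scan(/(.*?)\?>/m)
      raise ArgumentError, "Malformed XML: Unclosed processing instruction"
    end
    content = scanner[1]
  else
    # No whitespace: the instruction must close immediately, as in "<?target?>".
    content = nil
    unless scanner.scan(/\?>/)
      raise ArgumentError, "Malformed XML: Unclosed processing instruction"
    end
  end

  if name == "xml"
    raise ArgumentError, "Malformed XML: XML declaration is not at the start" unless at_document_start
    return [:xmldecl, content]
  end
  [:processing_instruction, name, content]
end

p parse_instruction(StringScanner.new("xml-stylesheet href='a.xsl'?>"), at_document_start: false)
# => [:processing_instruction, "xml-stylesheet", "href='a.xsl'"]
```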
@@ -683,7 +791,7 @@ module REXML
683
791
  @source.match(/\s*/um, true)
684
792
  if prefix == "xmlns"
685
793
  if local_part == "xml"
686
- if value != "http://www.w3.org/XML/1998/namespace"
794
+ if value != Private::XML_PREFIXED_NAMESPACE
687
795
  msg = "The 'xml' prefix must not be bound to any other namespace "+
688
796
  "(http://www.w3.org/TR/REC-xml-names/#ns-decl)"
689
797
  raise REXML::ParseException.new( msg, @source, self )
@@ -693,7 +801,7 @@ module REXML
693
801
  "(http://www.w3.org/TR/REC-xml-names/#ns-decl)"
694
802
  raise REXML::ParseException.new( msg, @source, self)
695
803
  end
696
- curr_ns << local_part
804
+ add_namespace(local_part, value)
697
805
  elsif prefix
698
806
  prefixes << prefix unless prefix == "xml"
699
807
  end
@@ -703,6 +811,20 @@ module REXML
703
811
  raise REXML::ParseException.new(msg, @source, self)
704
812
  end
705
813
 
814
+ unless prefix == "xmlns"
815
+ uri = @namespaces[prefix]
816
+ expanded_name = [uri, local_part]
817
+ existing_prefix = expanded_names[expanded_name]
818
+ if existing_prefix
819
+ message = "Namespace conflict in adding attribute " +
820
+ "\"#{local_part}\": " +
821
+ "Prefix \"#{existing_prefix}\" = \"#{uri}\" and " +
822
+ "prefix \"#{prefix}\" = \"#{uri}\""
823
+ raise REXML::ParseException.new(message, @source, self)
824
+ end
825
+ expanded_names[expanded_name] = prefix
826
+ end
827
+
706
828
  attributes[name] = value
707
829
  else
708
830
  message = "Invalid attribute name: <#{@source.buffer.split(%r{[/>\s]}).first}>"
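`parse_attributes` now rejects two attributes on the same element that share an expanded name, that is, the same (namespace URI, local name) pair written with different prefixes. This replaces the check removed from `Attributes#add` in element.rb. The parser's namespace table pre-binds `xml` to `http://www.w3.org/XML/1998/namespace`, so `xml:lang` no longer expands to the same `[nil, "lang"]` pair as an unprefixed `lang`. This appears to be what the 3.3.8 fix for GH-208 relies on. The following is a hypothetical stand-alone version of the check.

```ruby
# Hypothetical stand-alone version of the duplicate-expanded-name check.
def check_attribute_conflicts(qnames, namespaces)
  expanded_names = {}
  qnames.each do |qname|
    prefix, local_part = qname.include?(":") ? qname.split(":", 2) : [nil, qname]
    next if prefix == "xmlns"      # namespace declarations are not checked
    uri = namespaces[prefix]
    expanded_name = [uri, local_part]
    existing_prefix = expanded_names[expanded_name]
    if existing_prefix
      raise ArgumentError, "Namespace conflict in adding attribute \"#{local_part}\": " \
                           "Prefix \"#{existing_prefix}\" = \"#{uri}\" and " \
                           "prefix \"#{prefix}\" = \"#{uri}\""
    end
    expanded_names[expanded_name] = prefix
  end
  true
end

namespaces = { "xml" => "http://www.w3.org/XML/1998/namespace",
               "a" => "urn:x", "b" => "urn:x" }
check_attribute_conflicts(["lang", "xml:lang", "a:id"], namespaces)  # accepted
begin
  check_attribute_conflicts(["a:id", "b:id"], namespaces)  # both are ["urn:x", "id"]
rescue ArgumentError => e
  puts e.message
end
```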
@@ -47,6 +47,18 @@ module REXML
47
47
  @listeners << listener
48
48
  end
49
49
 
50
+ def entity_expansion_count
51
+ @parser.entity_expansion_count
52
+ end
53
+
54
+ def entity_expansion_limit=( limit )
55
+ @parser.entity_expansion_limit = limit
56
+ end
57
+
58
+ def entity_expansion_text_limit=( limit )
59
+ @parser.entity_expansion_text_limit = limit
60
+ end
61
+
50
62
  def each
51
63
  while has_next?
52
64
  yield self.pull
@@ -22,6 +22,18 @@ module REXML
22
22
  @parser.source
23
23
  end
24
24
 
25
+ def entity_expansion_count
26
+ @parser.entity_expansion_count
27
+ end
28
+
29
+ def entity_expansion_limit=( limit )
30
+ @parser.entity_expansion_limit = limit
31
+ end
32
+
33
+ def entity_expansion_text_limit=( limit )
34
+ @parser.entity_expansion_text_limit = limit
35
+ end
36
+
25
37
  def add_listener( listener )
26
38
  @parser.add_listener( listener )
27
39
  end
@@ -157,25 +169,8 @@ module REXML
157
169
  end
158
170
  end
159
171
  when :text
160
- #normalized = @parser.normalize( event[1] )
161
- #handle( :characters, normalized )
162
- copy = event[1].clone
163
-
164
- esub = proc { |match|
165
- if @entities.has_key?($1)
166
- @entities[$1].gsub(Text::REFERENCE, &esub)
167
- else
168
- match
169
- end
170
- }
171
-
172
- copy.gsub!( Text::REFERENCE, &esub )
173
- copy.gsub!( Text::NUMERICENTITY ) {|m|
174
- m=$1
175
- m = "0#{m}" if m[0] == ?x
176
- [Integer(m)].pack('U*')
177
- }
178
- handle( :characters, copy )
172
+ unnormalized = @parser.unnormalize( event[1], @entities )
173
+ handle( :characters, unnormalized )
179
174
  when :entitydecl
180
175
  handle_entitydecl( event )
181
176
  when :processing_instruction, :comment, :attlistdecl,
@@ -264,6 +259,8 @@ module REXML
264
259
  end
265
260
 
266
261
  def get_namespace( prefix )
262
+ return nil if @namespace_stack.empty?
263
+
267
264
  uris = (@namespace_stack.find_all { |ns| not ns[prefix].nil? }) ||
268
265
  (@namespace_stack.find { |ns| not ns[nil].nil? })
269
266
  uris[-1][prefix] unless uris.nil? or 0 == uris.size
@@ -7,37 +7,42 @@ module REXML
7
7
  def initialize source, listener
8
8
  @listener = listener
9
9
  @parser = BaseParser.new( source )
10
- @tag_stack = []
10
+ @entities = {}
11
11
  end
12
12
 
13
13
  def add_listener( listener )
14
14
  @parser.add_listener( listener )
15
15
  end
16
16
 
17
+ def entity_expansion_count
18
+ @parser.entity_expansion_count
19
+ end
20
+
21
+ def entity_expansion_limit=( limit )
22
+ @parser.entity_expansion_limit = limit
23
+ end
24
+
25
+ def entity_expansion_text_limit=( limit )
26
+ @parser.entity_expansion_text_limit = limit
27
+ end
28
+
17
29
  def parse
18
30
  # entity string
19
31
  while true
20
32
  event = @parser.pull
21
33
  case event[0]
22
34
  when :end_document
23
- unless @tag_stack.empty?
24
- tag_path = "/" + @tag_stack.join("/")
25
- raise ParseException.new("Missing end tag for '#{tag_path}'",
26
- @parser.source)
27
- end
28
35
  return
29
36
  when :start_element
30
- @tag_stack << event[1]
31
37
  attrs = event[2].each do |n, v|
32
38
  event[2][n] = @parser.unnormalize( v )
33
39
  end
34
40
  @listener.tag_start( event[1], attrs )
35
41
  when :end_element
36
42
  @listener.tag_end( event[1] )
37
- @tag_stack.pop
38
43
  when :text
39
- normalized = @parser.unnormalize( event[1] )
40
- @listener.text( normalized )
44
+ unnormalized = @parser.unnormalize( event[1], @entities )
45
+ @listener.text( unnormalized )
41
46
  when :processing_instruction
42
47
  @listener.instruction( *event[1,2] )
43
48
  when :start_doctype
@@ -48,6 +53,7 @@ module REXML
48
53
  when :comment, :attlistdecl, :cdata, :xmldecl, :elementdecl
49
54
  @listener.send( event[0].to_s, *event[1..-1] )
50
55
  when :entitydecl, :notationdecl
56
+ @entities[ event[1] ] = event[2] if event.size == 3
51
57
  @listener.send( event[0].to_s, event[1..-1] )
52
58
  when :externalentity
53
59
  entity_reference = event[1]
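`PullParser`, `SAX2Parser` and `StreamParser` each gain the same three hand-written methods, which simply forward to their internal `BaseParser`. The standard library's `Forwardable#def_delegators` is one common way to express that kind of forwarding. The sketch below shows it as an alternative; it is not what REXML does. `InnerParser` and `FrontEnd` are hypothetical classes.

```ruby
require "forwardable"

# Hypothetical inner parser that owns the counters and limits.
class InnerParser
  attr_reader :entity_expansion_count
  attr_writer :entity_expansion_limit, :entity_expansion_text_limit

  def initialize
    @entity_expansion_count = 0
    @entity_expansion_limit = 10_000
    @entity_expansion_text_limit = 10_240
  end

  def limits
    [@entity_expansion_limit, @entity_expansion_text_limit]
  end
end

# Hypothetical front end: forwards the three methods to @parser.
class FrontEnd
  extend Forwardable
  def_delegators :@parser, :entity_expansion_count,
                 :entity_expansion_limit=, :entity_expansion_text_limit=

  attr_reader :parser

  def initialize
    @parser = InnerParser.new
  end
end

front = FrontEnd.new
front.entity_expansion_limit = 100
front.entity_expansion_text_limit = 4096
```

The explicit methods in the diff are more verbose, but they keep each front end readable without the `Forwardable` indirection.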
@@ -15,7 +15,6 @@ module REXML
  end
 
  def parse
- tag_stack = []
  entities = nil
  begin
  while true
@@ -23,19 +22,13 @@ module REXML
  #STDERR.puts "TREEPARSER GOT #{event.inspect}"
  case event[0]
  when :end_document
- unless tag_stack.empty?
- raise ParseException.new("No close tag for #{@build_context.xpath}",
- @parser.source, @parser)
- end
  return
  when :start_element
- tag_stack.push(event[1])
  el = @build_context = @build_context.add_element( event[1] )
  event[2].each do |key, value|
  el.attributes[key]=Attribute.new(key,value,self)
  end
  when :end_element
- tag_stack.pop
  @build_context = @build_context.parent
  when :text
  if @build_context[-1].instance_of? Text
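Both StreamParser#parse and TreeParser#parse drop their tag stack, together with the end-of-document check that raised a ParseException while elements were still open. Where, or whether, an equivalent check runs after this change is not visible in these hunks. For reference, the removed technique amounts to the standalone sketch below; `check_end_tags`, `MissingEndTagError`, and the `[type, name]` event shape are hypothetical.

```ruby
# Hypothetical error class standing in for REXML::ParseException.
class MissingEndTagError < StandardError; end

# Push on start tags, pop on end tags, and complain at end of document
# if anything is still open (the same bookkeeping the removed lines did).
def check_end_tags(events)
  tag_stack = []
  events.each do |type, name|
    case type
    when :start_element then tag_stack.push(name)
    when :end_element   then tag_stack.pop
    when :end_document
      unless tag_stack.empty?
        raise MissingEndTagError, "Missing end tag for '/#{tag_stack.join('/')}'"
      end
      return true
    end
  end
  false # assumption: a stream without :end_document is treated as incomplete
end

balanced = [[:start_element, "a"], [:start_element, "b"], [:end_element, "b"],
            [:end_element, "a"], [:end_document]]
unbalanced = [[:start_element, "a"], [:start_element, "b"], [:end_document]]

p check_end_tags(balanced) # => true
begin
  check_end_tags(unbalanced)
rescue MissingEndTagError => e
  puts e.message # => Missing end tag for '/a/b'
end
```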
data/lib/rexml/rexml.rb CHANGED
@@ -31,7 +31,7 @@
  module REXML
  COPYRIGHT = "Copyright © 2001-2008 Sean Russell <ser@germane-software.com>"
  DATE = "2008/019"
- VERSION = "3.3.1"
+ VERSION = "3.3.8"
  REVISION = ""
 
  Copyright = COPYRIGHT
data/lib/rexml/source.rb CHANGED
@@ -204,10 +204,20 @@ module REXML
  end
  end
 
- def read(term = nil)
+ def read(term = nil, min_bytes = 1)
  term = encode(term) if term
  begin
- @scanner << readline(term)
+ str = readline(term)
+ @scanner << str
+ read_bytes = str.bytesize
+ begin
+ while read_bytes < min_bytes
+ str = readline(term)
+ @scanner << str
+ read_bytes += str.bytesize
+ end
+ rescue IOError
+ end
  true
  rescue Exception, NameError
  @source = nil
@@ -237,10 +247,9 @@ module REXML
  read if @scanner.eos? && @source
  end
 
- # Note: When specifying a string for 'pattern', it must not include '>' except in the following formats:
- # - ">"
- # - "XXX>" (X is any string excluding '>')
  def match( pattern, cons=false )
+ # To avoid performance issue, we need to increase bytes to read per scan
+ min_bytes = 1
  while true
  if cons
  md = @scanner.scan(pattern)
@@ -250,7 +259,8 @@ module REXML
  break if md
  return nil if pattern.is_a?(String)
  return nil if @source.nil?
- return nil unless read
+ return nil unless read(nil, min_bytes)
+ min_bytes *= 2
  end
 
  md.nil? ? nil : @scanner
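The `source.rb` hunks change how the buffered source refills its `StringScanner`. `read` now takes a `min_bytes` argument and keeps calling `readline` until at least that many bytes have been appended, stopping quietly on `IOError` at end of input; `match` doubles `min_bytes` after every failed attempt. A pattern that needs a long stretch of input therefore costs a number of refill-and-rescan rounds roughly logarithmic in the bytes it needs, instead of one round per line. The sketch below reproduces that strategy with `StringIO` and `StringScanner`. The `ChunkedSource` class, its `read_calls` counter, and returning the matched string from `match` are assumptions, not REXML's API.

```ruby
require "stringio"
require "strscan"

# Hypothetical miniature of the buffered-source pattern in the diff:
# a StringScanner buffer refilled line by line from an IO.
class ChunkedSource
  attr_reader :read_calls # number of read calls, for illustration

  def initialize(io)
    @io = io
    @scanner = StringScanner.new(+"")
    @read_calls = 0
  end

  # Append at least min_bytes to the buffer (fewer only at end of input).
  # Returns false when not even one more line could be read.
  def read(term = nil, min_bytes = 1)
    @read_calls += 1
    str = @io.readline(term || "\n") # nil term: read one line (assumption)
    @scanner << str
    read_bytes = str.bytesize
    begin
      while read_bytes < min_bytes
        str = @io.readline(term || "\n")
        @scanner << str
        read_bytes += str.bytesize
      end
    rescue IOError
      # EOFError is an IOError: keep the partial chunk already appended.
    end
    true
  rescue EOFError
    false
  end

  # Try the pattern at the current position; while it fails, pull in
  # more input, doubling the minimum chunk size each round.
  def match(pattern, cons = false)
    min_bytes = 1
    loop do
      md = cons ? @scanner.scan(pattern) : @scanner.check(pattern)
      return md if md
      return nil unless read(nil, min_bytes)
      min_bytes *= 2
    end
  end
end

xml = "<root>\n" + ("line\n" * 100) + "</root>\n"
source = ChunkedSource.new(StringIO.new(xml))
element = source.match(%r{<root>.*?</root>}m, true)
puts element.lines.size # => 102
puts source.read_calls  # => 9 (one read per line would need 102)
```

With a fixed one-line read, each failed attempt rescans a buffer that grew by a single line; doubling the chunk keeps the number of rescans small at the cost of occasionally buffering more than the match needs.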
data/lib/rexml/text.rb CHANGED
@@ -151,25 +151,45 @@ module REXML
  end
  end
 
- # context sensitive
- string.scan(pattern) do
- if $1[-1] != ?;
- raise "Illegal character #{$1.inspect} in raw string #{string.inspect}"
- elsif $1[0] == ?&
- if $5 and $5[0] == ?#
- case ($5[1] == ?x ? $5[2..-1].to_i(16) : $5[1..-1].to_i)
- when *VALID_CHAR
+ pos = 0
+ while (index = string.index(/<|&/, pos))
+ if string[index] == "<"
+ raise "Illegal character \"#{string[index]}\" in raw string #{string.inspect}"
+ end
+
+ unless (end_index = string.index(/[^\s];/, index + 1))
+ raise "Illegal character \"#{string[index]}\" in raw string #{string.inspect}"
+ end
+
+ value = string[(index + 1)..end_index]
+ if /\s/.match?(value)
+ raise "Illegal character \"#{string[index]}\" in raw string #{string.inspect}"
+ end
+
+ if value[0] == "#"
+ character_reference = value[1..-1]
+
+ unless (/\A(\d+|x[0-9a-fA-F]+)\z/.match?(character_reference))
+ if character_reference[0] == "x" || character_reference[-1] == "x"
+ raise "Illegal character \"#{string[index]}\" in raw string #{string.inspect}"
  else
- raise "Illegal character #{$1.inspect} in raw string #{string.inspect}"
+ raise "Illegal character #{string.inspect} in raw string #{string.inspect}"
  end
- # FIXME: below can't work but this needs API change.
- # elsif @parent and $3 and !SUBSTITUTES.include?($1)
- # if !doctype or !doctype.entities.has_key?($3)
- # raise "Undeclared entity '#{$1}' in raw string \"#{string}\""
- # end
  end
+
+ case (character_reference[0] == "x" ? character_reference[1..-1].to_i(16) : character_reference[0..-1].to_i)
+ when *VALID_CHAR
+ else
+ raise "Illegal character #{string.inspect} in raw string #{string.inspect}"
+ end
+ elsif !(/\A#{Entity::NAME}\z/um.match?(value))
+ raise "Illegal character \"#{string[index]}\" in raw string #{string.inspect}"
  end
+
+ pos = end_index + 1
  end
+
+ string
  end
 
  def node_type
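In `text.rb`, the raw-string check no longer iterates with `string.scan(pattern)`. It walks the string with `String#index`: any `<` is rejected, and each `&` must start a reference that ends in `;`, contains no whitespace, and is either a decimal or `x`-prefixed hexadecimal character reference accepted by `VALID_CHAR` or a name matching `Entity::NAME`. A simplified sketch of the same loop follows. Assumptions: `XML_CHAR_RANGES` is the XML 1.0 `Char` production used as a stand-in for `VALID_CHAR`, `SIMPLE_NAME` is an ASCII-only approximation of `Entity::NAME`, a reference ends at the first `;`, and errors are `ArgumentError` with made-up messages.

```ruby
# XML 1.0 "Char" production, standing in for REXML's VALID_CHAR.
XML_CHAR_RANGES = [0x9, 0xA, 0xD, 0x20..0xD7FF, 0xE000..0xFFFD, 0x10000..0x10FFFF].freeze
# ASCII-only approximation of an XML Name (Entity::NAME also allows Unicode).
SIMPLE_NAME = /\A[A-Za-z_:][A-Za-z0-9_.:-]*\z/

# Jump from one '<' or '&' to the next with String#index instead of
# scanning the whole string with one large regular expression.
def check_raw_text(string)
  pos = 0
  while (index = string.index(/[<&]/, pos))
    raise ArgumentError, "'<' at offset #{index}" if string[index] == "<"

    end_index = string.index(";", index + 1)
    raise ArgumentError, "unterminated reference at offset #{index}" unless end_index

    value = string[(index + 1)...end_index] # text between '&' and ';'
    if value.empty? || value.match?(/\s/)
      raise ArgumentError, "malformed reference at offset #{index}"
    end

    if value.start_with?("#")
      ref = value[1..]
      code =
        case ref
        when /\A\d+\z/           then ref.to_i
        when /\Ax[0-9a-fA-F]+\z/ then ref[1..].to_i(16)
        else raise ArgumentError, "malformed character reference &#{value};"
        end
      unless XML_CHAR_RANGES.any? { |allowed| allowed === code }
        raise ArgumentError, "reference to an invalid character &#{value};"
      end
    elsif !SIMPLE_NAME.match?(value)
      raise ArgumentError, "invalid entity name #{value.inspect}"
    end

    pos = end_index + 1 # resume after the ';'
  end
  string
end

puts check_raw_text("Fish &amp; chips &#169;") # => Fish &amp; chips &#169;
begin
  check_raw_text("bad &#xD800;")
rescue ArgumentError => e
  puts e.message # => reference to an invalid character &#xD800;
end
```

Each iteration only looks at the span between one `&` and its `;`, so the work stays proportional to the string length and avoids backtracking across the whole input.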
@@ -248,7 +268,8 @@ module REXML
  # u = Text.new( "sean russell", false, nil, true )
  # u.value #-> "sean russell"
  def value
- @unnormalized ||= Text::unnormalize( @string, doctype )
+ @unnormalized ||= Text::unnormalize(@string, doctype,
+ entity_expansion_text_limit: document&.entity_expansion_text_limit)
  end
 
  # Sets the contents of this text node. This expects the text to be
@@ -391,11 +412,12 @@ module REXML
  end
 
  # Unescapes all possible entities
- def Text::unnormalize( string, doctype=nil, filter=nil, illegal=nil )
+ def Text::unnormalize( string, doctype=nil, filter=nil, illegal=nil, entity_expansion_text_limit: nil )
+ entity_expansion_text_limit ||= Security.entity_expansion_text_limit
  sum = 0
  string.gsub( /\r\n?/, "\n" ).gsub( REFERENCE ) {
  s = Text.expand($&, doctype, filter)
- if sum + s.bytesize > Security.entity_expansion_text_limit
+ if sum + s.bytesize > entity_expansion_text_limit
  raise "entity expansion has grown too large"
  else
  sum += s.bytesize
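The `text.rb` hunks for `Text#value` and `Text::unnormalize` thread an optional `entity_expansion_text_limit:` keyword through the expansion, falling back to `Security.entity_expansion_text_limit` when it is `nil`. `Text#value` passes `document&.entity_expansion_text_limit`, which evaluates to `nil` for a text node that has no document, so such nodes keep the global default. The sketch below mirrors that fallback and the running byte sum. `ExpansionDefaults` is a hypothetical stand-in for `REXML::Security`, and the reference regex and entity lookup are simplified assumptions.

```ruby
# Hypothetical stand-in for REXML::Security's global default.
module ExpansionDefaults
  class << self
    attr_accessor :entity_expansion_text_limit
  end
  self.entity_expansion_text_limit = 10_240
end

BUILTIN_ENTITIES = { "lt" => "<", "gt" => ">", "amp" => "&", "quot" => "\"", "apos" => "'" }.freeze

# Expand references while keeping a running byte total of the replacement
# text; a per-call limit wins, otherwise the global default applies.
def unnormalize_with_limit(string, entities = {}, entity_expansion_text_limit: nil)
  entity_expansion_text_limit ||= ExpansionDefaults.entity_expansion_text_limit
  sum = 0
  string.gsub(/\r\n?/, "\n").gsub(/&([A-Za-z_][\w.-]*);/) do
    reference = Regexp.last_match(0)
    name = Regexp.last_match(1)
    s = BUILTIN_ENTITIES[name] || entities[name] || reference
    if sum + s.bytesize > entity_expansion_text_limit
      raise "entity expansion has grown too large"
    end
    sum += s.bytesize
    s
  end
end

entities = { "big" => "x" * 100 }
puts unnormalize_with_limit("a &amp; b")                 # => a & b
puts unnormalize_with_limit("&big;&big;", entities).size # => 200
begin
  unnormalize_with_limit("&big;&big;", entities, entity_expansion_text_limit: 150)
rescue RuntimeError => e
  puts e.message # => entity expansion has grown too large
end
```

Using `||=` on a keyword that defaults to `nil` lets callers opt into a per-document limit without every call site having to know the global value.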
metadata CHANGED
@@ -1,28 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: rexml
  version: !ruby/object:Gem::Version
- version: 3.3.1
+ version: 3.3.8
  platform: ruby
  authors:
  - Kouhei Sutou
  bindir: bin
  cert_chain: []
- date: 2024-06-25 00:00:00.000000000 Z
- dependencies:
- - !ruby/object:Gem::Dependency
- name: strscan
- requirement: !ruby/object:Gem::Requirement
- requirements:
- - - ">="
- - !ruby/object:Gem::Version
- version: '0'
- type: :runtime
- prerelease: false
- version_requirements: !ruby/object:Gem::Requirement
- requirements:
- - - ">="
- - !ruby/object:Gem::Version
- version: '0'
+ date: 2024-09-29 00:00:00.000000000 Z
+ dependencies: []
  description: An XML toolkit for Ruby
  email:
  - kou@cozmixng.org
@@ -116,7 +102,7 @@ homepage: https://github.com/ruby/rexml
  licenses:
  - BSD-2-Clause
  metadata:
- changelog_uri: https://github.com/ruby/rexml/releases/tag/v3.3.1
+ changelog_uri: https://github.com/ruby/rexml/releases/tag/v3.3.8
  rdoc_options:
  - "--main"
  - README.md