rexml 3.3.2 → 3.3.8


checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 70ccd1465a05dba3d53dcfc4a98e76dec865a4f6ac833b954aff4234bce6c255
4
- data.tar.gz: 53f43fab8f531e0ba7461ce091e5eae6bec27b12e9139450c7b3e748b4eeacdc
3
+ metadata.gz: 84b42219a4278ab15e7ee7627951d0b94dddc707cbf9563799b3266d02ed32db
4
+ data.tar.gz: 4895e6f04d100a2affc8d5c6af4c6dfec5ec4d0d863f8d22de1c66da1d253c61
5
5
  SHA512:
6
- metadata.gz: b46818d79ae57075c4e0bd620802e82c6958dddc7da1b182504c3fdc16685c887ac0ddd6a4838a080483abba330839e9ef4b2db22cc81b9eae3eac71ac14c965
7
- data.tar.gz: 1e5205905eb435c02038dd0539de22472f5364ffc47635f13a1752cb79a423dcca558fb47394ac5d624b358e779b07cbcafedfd06b99742026856f9988109976
6
+ metadata.gz: 7729c31da310e2fb7c96cc3a5bd5b981fefdcdae6fe545bf2d113d91af5862fbb51789e9289b91e4247963169900b0cdccc373ffeea6ca3f935b2e32bab1e2e4
7
+ data.tar.gz: 542f689b7cd27b5c71aeb6845e5af2ac28186e31a98af8c45e984ce6ca563192b2a74e50b6acd95f1fde49ed6289bf9024bfd6612608455038a22e66c6b3a75b
data/NEWS.md CHANGED
@@ -1,5 +1,152 @@
1
1
  # News
2
2
 
3
+ ## 3.3.8 - 2024-09-29 {#version-3-3-8}
4
+
5
+ ### Improvements
6
+
7
+ * SAX2: Improve parse performance.
8
+ * GH-207
9
+ * Patch by NAITOH Jun.
10
+
11
+ ### Fixes
12
+
13
+ * Fixed a bug that unexpected attribute namespace conflict error for
14
+ the predefined "xml" namespace is reported.
15
+ * GH-208
16
+ * Patch by KITAITI Makoto
17
+
18
+ ### Thanks
19
+
20
+ * NAITOH Jun
21
+
22
+ * KITAITI Makoto
23
+
24
+ ## 3.3.7 - 2024-09-04 {#version-3-3-7}
25
+
26
+ ### Improvements
27
+
28
+ * Added local entity expansion limit methods
29
+ * GH-192
30
+ * GH-202
31
+ * Reported by takuya kodama.
32
+ * Patch by NAITOH Jun.
33
+
34
+ * Removed explicit strscan dependency
35
+ * GH-204
36
+ * Patch by Bo Anderson.
37
+
38
+ ### Thanks
39
+
40
+ * takuya kodama
41
+
42
+ * NAITOH Jun
43
+
44
+ * Bo Anderson
45
+
46
+ ## 3.3.6 - 2024-08-22 {#version-3-3-6}
47
+
48
+ ### Improvements
49
+
50
+ * Removed duplicated entity expansions for performance.
51
+ * GH-194
52
+ * Patch by Viktor Ivarsson.
53
+
54
+ * Improved namespace conflicted attribute check performance. It was
55
+ too slow for deep elements.
56
+ * Reported by l33thaxor.
57
+
58
+ ### Fixes
59
+
60
+ * Fixed a bug that default entity expansions are counted for
61
+ security check. Default entity expansions should not be counted
62
+ because they don't have a security risk.
63
+ * GH-198
64
+ * GH-199
65
+ * Patch Viktor Ivarsson
66
+
67
+ * Fixed a parser bug that parameter entity references in internal
68
+ subsets are expanded. It's not allowed in the XML specification.
69
+ * GH-191
70
+ * Patch by NAITOH Jun.
71
+
72
+ * Fixed a stream parser bug that user-defined entity references in
73
+ text aren't expanded.
74
+ * GH-200
75
+ * Patch by NAITOH Jun.
76
+
77
+ ### Thanks
78
+
79
+ * Viktor Ivarsson
80
+
81
+ * NAITOH Jun
82
+
83
+ * l33thaxor
84
+
85
+ ## 3.3.5 - 2024-08-12 {#version-3-3-5}
86
+
87
+ ### Fixes
88
+
89
+ * Fixed a bug that `REXML::Security.entity_expansion_text_limit`
90
+ check has wrong text size calculation in SAX and pull parsers.
91
+ * GH-193
92
+ * GH-195
93
+ * Reported by Viktor Ivarsson.
94
+ * Patch by NAITOH Jun.
95
+
96
+ ### Thanks
97
+
98
+ * Viktor Ivarsson
99
+
100
+ * NAITOH Jun
101
+
102
+ ## 3.3.4 - 2024-08-01 {#version-3-3-4}
103
+
104
+ ### Fixes
105
+
106
+ * Fixed a bug that `REXML::Security` isn't defined when
107
+ `REXML::Parsers::StreamParser` is used and
108
+ `rexml/parsers/streamparser` is only required.
109
+ * GH-189
110
+ * Patch by takuya kodama.
111
+
112
+ ### Thanks
113
+
114
+ * takuya kodama
115
+
116
+ ## 3.3.3 - 2024-08-01 {#version-3-3-3}
117
+
118
+ ### Improvements
119
+
120
+ * Added support for detecting invalid XML that has unsupported
121
+ content before root element
122
+ * GH-184
123
+ * Patch by NAITOH Jun.
124
+
125
+ * Added support for `REXML::Security.entity_expansion_limit=` and
126
+ `REXML::Security.entity_expansion_text_limit=` in SAX2 and pull
127
+ parsers
128
+ * GH-187
129
+ * Patch by NAITOH Jun.
130
+
131
+ * Added more tests for invalid XMLs.
132
+ * GH-183
133
+ * Patch by Watson.
134
+
135
+ * Added more performance tests.
136
+ * Patch by Watson.
137
+
138
+ * Improved parse performance.
139
+ * GH-186
140
+ * Patch by tomoya ishida.
141
+
142
+ ### Thanks
143
+
144
+ * NAITOH Jun
145
+
146
+ * Watson
147
+
148
+ * tomoya ishida
149
+
3
150
  ## 3.3.2 - 2024-07-16 {#version-3-3-2}
4
151
 
5
152
  ### Improvements
@@ -15,6 +162,9 @@
15
162
  * GH-172
16
163
  * GH-173
17
164
  * GH-174
165
+ * GH-175
166
+ * GH-176
167
+ * GH-177
18
168
  * Patch by Watson.
19
169
 
20
170
  * Added support for raising a parse exception when an XML has extra
data/lib/rexml/attribute.rb CHANGED
@@ -148,8 +148,9 @@ module REXML
148
148
  # have been expanded to their values
149
149
  def value
150
150
  return @unnormalized if @unnormalized
151
- @unnormalized = Text::unnormalize( @normalized, doctype )
152
- @unnormalized
151
+
152
+ @unnormalized = Text::unnormalize(@normalized, doctype,
153
+ entity_expansion_text_limit: @element&.document&.entity_expansion_text_limit)
153
154
  end
154
155
 
155
156
  # The normalized value of this attribute. That is, the attribute with
data/lib/rexml/document.rb CHANGED
@@ -91,6 +91,8 @@ module REXML
91
91
  #
92
92
  def initialize( source = nil, context = {} )
93
93
  @entity_expansion_count = 0
94
+ @entity_expansion_limit = Security.entity_expansion_limit
95
+ @entity_expansion_text_limit = Security.entity_expansion_text_limit
94
96
  super()
95
97
  @context = context
96
98
  return if source.nil?
@@ -431,10 +433,12 @@ module REXML
431
433
  end
432
434
 
433
435
  attr_reader :entity_expansion_count
436
+ attr_writer :entity_expansion_limit
437
+ attr_accessor :entity_expansion_text_limit
434
438
 
435
439
  def record_entity_expansion
436
440
  @entity_expansion_count += 1
437
- if @entity_expansion_count > Security.entity_expansion_limit
441
+ if @entity_expansion_count > @entity_expansion_limit
438
442
  raise "number of entity expansions exceeded, processing aborted."
439
443
  end
440
444
  end
data/lib/rexml/element.rb CHANGED
@@ -441,9 +441,14 @@ module REXML
441
441
  # Related: #root_node, #document.
442
442
  #
443
443
  def root
444
- return elements[1] if self.kind_of? Document
445
- return self if parent.kind_of? Document or parent.nil?
446
- return parent.root
444
+ target = self
445
+ while target
446
+ return target.elements[1] if target.kind_of? Document
447
+ parent = target.parent
448
+ return target if parent.kind_of? Document or parent.nil?
449
+ target = parent
450
+ end
451
+ nil
447
452
  end
448
453
 
449
454
  # :call-seq:
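Note on the hunk above: `root` now walks the parent chain in a loop instead of calling `parent.root` recursively, so the depth of the document no longer grows the Ruby call stack. A minimal sketch of the same idea follows. `Node` and `root_of` are hypothetical names, not REXML's classes, and the `Document` special case from the real method is left out.

```ruby
# Hypothetical stand-in for an element tree; this is not REXML::Element.
Node = Struct.new(:name, :parent)

# Follow parent links with a loop rather than recursion, so a very deep
# tree costs iterations instead of stack frames.
def root_of(node)
  target = node
  while target
    parent = target.parent
    return target if parent.nil?
    target = parent
  end
  nil
end

# Build a chain that is 100_000 nodes deep and look up its root.
top = Node.new("root", nil)
leaf = (1..100_000).inject(top) { |parent, i| Node.new("n#{i}", parent) }

puts root_of(leaf).name
```

The namespace lookup in the next hunk applies the same loop-over-parents pattern to `attributes[prefix]`.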
@@ -619,8 +624,12 @@ module REXML
619
624
  else
620
625
  prefix = "xmlns:#{prefix}" unless prefix[0,5] == 'xmlns'
621
626
  end
622
- ns = attributes[ prefix ]
623
- ns = parent.namespace(prefix) if ns.nil? and parent
627
+ ns = nil
628
+ target = self
629
+ while ns.nil? and target
630
+ ns = target.attributes[prefix]
631
+ target = target.parent
632
+ end
624
633
  ns = '' if ns.nil? and prefix == 'xmlns'
625
634
  return ns
626
635
  end
@@ -2375,17 +2384,6 @@ module REXML
2375
2384
  elsif old_attr.kind_of? Hash
2376
2385
  old_attr[value.prefix] = value
2377
2386
  elsif old_attr.prefix != value.prefix
2378
- # Check for conflicting namespaces
2379
- if value.prefix != "xmlns" and old_attr.prefix != "xmlns"
2380
- old_namespace = old_attr.namespace
2381
- new_namespace = value.namespace
2382
- if old_namespace == new_namespace
2383
- raise ParseException.new(
2384
- "Namespace conflict in adding attribute \"#{value.name}\": "+
2385
- "Prefix \"#{old_attr.prefix}\" = \"#{old_namespace}\" and "+
2386
- "prefix \"#{value.prefix}\" = \"#{new_namespace}\"")
2387
- end
2388
- end
2389
2387
  store value.name, {old_attr.prefix => old_attr,
2390
2388
  value.prefix => value}
2391
2389
  else
data/lib/rexml/entity.rb CHANGED
@@ -12,6 +12,7 @@ module REXML
12
12
  EXTERNALID = "(?:(?:(SYSTEM)\\s+#{SYSTEMLITERAL})|(?:(PUBLIC)\\s+#{PUBIDLITERAL}\\s+#{SYSTEMLITERAL}))"
13
13
  NDATADECL = "\\s+NDATA\\s+#{NAME}"
14
14
  PEREFERENCE = "%#{NAME};"
15
+ PEREFERENCE_RE = /#{PEREFERENCE}/um
15
16
  ENTITYVALUE = %Q{((?:"(?:[^%&"]|#{PEREFERENCE}|#{REFERENCE})*")|(?:'([^%&']|#{PEREFERENCE}|#{REFERENCE})*'))}
16
17
  PEDEF = "(?:#{ENTITYVALUE}|#{EXTERNALID})"
17
18
  ENTITYDEF = "(?:#{ENTITYVALUE}|(?:#{EXTERNALID}(#{NDATADECL})?))"
@@ -19,7 +20,7 @@ module REXML
19
20
  GEDECL = "<!ENTITY\\s+#{NAME}\\s+#{ENTITYDEF}\\s*>"
20
21
  ENTITYDECL = /\s*(?:#{GEDECL})|(?:#{PEDECL})/um
21
22
 
22
- attr_reader :name, :external, :ref, :ndata, :pubid
23
+ attr_reader :name, :external, :ref, :ndata, :pubid, :value
23
24
 
24
25
  # Create a new entity. Simple entities can be constructed by passing a
25
26
  # name, value to the constructor; this creates a generic, plain entity
@@ -68,14 +69,14 @@ module REXML
68
69
  end
69
70
 
70
71
  # Evaluates to the unnormalized value of this entity; that is, replacing
71
- # all entities -- both %ent; and &ent; entities. This differs from
72
- # +value()+ in that +value+ only replaces %ent; entities.
72
+ # &ent; entities.
73
73
  def unnormalized
74
- document.record_entity_expansion unless document.nil?
75
- v = value()
76
- return nil if v.nil?
77
- @unnormalized = Text::unnormalize(v, parent)
78
- @unnormalized
74
+ document&.record_entity_expansion
75
+
76
+ return nil if @value.nil?
77
+
78
+ @unnormalized = Text::unnormalize(@value, parent,
79
+ entity_expansion_text_limit: document&.entity_expansion_text_limit)
79
80
  end
80
81
 
81
82
  #once :unnormalized
@@ -121,46 +122,6 @@ module REXML
121
122
  write rv
122
123
  rv
123
124
  end
124
-
125
- PEREFERENCE_RE = /#{PEREFERENCE}/um
126
- # Returns the value of this entity. At the moment, only internal entities
127
- # are processed. If the value contains internal references (IE,
128
- # %blah;), those are replaced with their values. IE, if the doctype
129
- # contains:
130
- # <!ENTITY % foo "bar">
131
- # <!ENTITY yada "nanoo %foo; nanoo>
132
- # then:
133
- # doctype.entity('yada').value #-> "nanoo bar nanoo"
134
- def value
135
- @resolved_value ||= resolve_value
136
- end
137
-
138
- def parent=(other)
139
- @resolved_value = nil
140
- super
141
- end
142
-
143
- private
144
- def resolve_value
145
- return nil if @value.nil?
146
- return @value unless @value.match?(PEREFERENCE_RE)
147
-
148
- matches = @value.scan(PEREFERENCE_RE)
149
- rv = @value.clone
150
- if @parent
151
- sum = 0
152
- matches.each do |entity_reference|
153
- entity_value = @parent.entity( entity_reference[0] )
154
- if sum + entity_value.bytesize > Security.entity_expansion_text_limit
155
- raise "entity expansion has grown too large"
156
- else
157
- sum += entity_value.bytesize
158
- end
159
- rv.gsub!( /%#{entity_reference.join};/um, entity_value )
160
- end
161
- end
162
- rv
163
- end
164
125
  end
165
126
 
166
127
  # This is a set of entity constants -- the ones defined in the XML
data/lib/rexml/parsers/baseparser.rb CHANGED
@@ -1,12 +1,29 @@
1
1
  # frozen_string_literal: true
2
2
  require_relative '../parseexception'
3
3
  require_relative '../undefinednamespaceexception'
4
+ require_relative '../security'
4
5
  require_relative '../source'
5
6
  require 'set'
6
7
  require "strscan"
7
8
 
8
9
  module REXML
9
10
  module Parsers
11
+ unless [].respond_to?(:tally)
12
+ module EnumerableTally
13
+ refine Enumerable do
14
+ def tally
15
+ counts = {}
16
+ each do |item|
17
+ counts[item] ||= 0
18
+ counts[item] += 1
19
+ end
20
+ counts
21
+ end
22
+ end
23
+ end
24
+ using EnumerableTally
25
+ end
26
+
10
27
  if StringScanner::Version < "3.0.8"
11
28
  module StringScannerCaptures
12
29
  refine StringScanner do
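Note on the hunk above: the `tally` fallback is added as a refinement, so it is only visible where `using` is active and does not patch `Enumerable` globally. The sketch below shows that scoping with a deliberately different method name so it runs the same on any recent Ruby. `TallyFallback`, `tally_fallback` and `ReferenceCounter` are hypothetical names; only the `refine`/`using` pattern is taken from the diff.

```ruby
# Hypothetical module; the diff's own refinement is called EnumerableTally.
module TallyFallback
  refine Array do
    # Returns a Hash of element => occurrence count, like Enumerable#tally.
    def tally_fallback
      counts = {}
      each do |item|
        counts[item] ||= 0
        counts[item] += 1
      end
      counts
    end
  end
end

module ReferenceCounter
  # The refinement is active only inside this module body.
  using TallyFallback

  def self.count(references)
    references.tally_fallback
  end
end

p ReferenceCounter.count(%w[a b a c a])

# Outside the `using` scope the method is not visible on Array.
p [].respond_to?(:tally_fallback)
```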
@@ -124,19 +141,11 @@ module REXML
124
141
  }
125
142
 
126
143
  module Private
127
- # Terminal requires two or more letters.
128
- INSTRUCTION_TERM = "?>"
129
- COMMENT_TERM = "-->"
130
- CDATA_TERM = "]]>"
131
- DOCTYPE_TERM = "]>"
132
- # Read to the end of DOCTYPE because there is no proper ENTITY termination
133
- ENTITY_TERM = DOCTYPE_TERM
134
-
135
- INSTRUCTION_END = /#{NAME}(\s+.*?)?\?>/um
144
+ PEREFERENCE_PATTERN = /#{PEREFERENCE}/um
136
145
  TAG_PATTERN = /((?>#{QNAME_STR}))\s*/um
137
146
  CLOSE_PATTERN = /(#{QNAME_STR})\s*>/um
138
147
  ATTLISTDECL_END = /\s+#{NAME}(?:#{ATTDEF})*\s*>/um
139
- NAME_PATTERN = /\s*#{NAME}/um
148
+ NAME_PATTERN = /#{NAME}/um
140
149
  GEDECL_PATTERN = "\\s+#{NAME}\\s+#{ENTITYDEF}\\s*>"
141
150
  PEDECL_PATTERN = "\\s+(%)\\s+#{NAME}\\s+#{PEDEF}\\s*>"
142
151
  ENTITYDECL_PATTERN = /(?:#{GEDECL_PATTERN})|(?:#{PEDECL_PATTERN})/um
@@ -147,6 +156,7 @@ module REXML
147
156
  default_entities.each do |term|
148
157
  DEFAULT_ENTITIES_PATTERNS[term] = /&#{term};/
149
158
  end
159
+ XML_PREFIXED_NAMESPACE = "http://www.w3.org/XML/1998/namespace"
150
160
  end
151
161
  private_constant :Private
152
162
 
@@ -154,6 +164,9 @@ module REXML
154
164
  self.stream = source
155
165
  @listeners = []
156
166
  @prefixes = Set.new
167
+ @entity_expansion_count = 0
168
+ @entity_expansion_limit = Security.entity_expansion_limit
169
+ @entity_expansion_text_limit = Security.entity_expansion_text_limit
157
170
  end
158
171
 
159
172
  def add_listener( listener )
@@ -161,6 +174,9 @@ module REXML
161
174
  end
162
175
 
163
176
  attr_reader :source
177
+ attr_reader :entity_expansion_count
178
+ attr_writer :entity_expansion_limit
179
+ attr_writer :entity_expansion_text_limit
164
180
 
165
181
  def stream=( source )
166
182
  @source = SourceFactory.create_from( source )
@@ -170,7 +186,8 @@ module REXML
170
186
  @tags = []
171
187
  @stack = []
172
188
  @entities = []
173
- @nsstack = []
189
+ @namespaces = {"xml" => Private::XML_PREFIXED_NAMESPACE}
190
+ @namespaces_restore_stack = []
174
191
  end
175
192
 
176
193
  def position
@@ -238,6 +255,10 @@ module REXML
238
255
  if @document_status == :in_doctype
239
256
  raise ParseException.new("Malformed DOCTYPE: unclosed", @source)
240
257
  end
258
+ unless @tags.empty?
259
+ path = "/" + @tags.join("/")
260
+ raise ParseException.new("Missing end tag for '#{path}'", @source)
261
+ end
241
262
  return [ :end_document ]
242
263
  end
243
264
  return @stack.shift if @stack.size > 0
@@ -248,10 +269,10 @@ module REXML
248
269
  if @document_status == nil
249
270
  start_position = @source.position
250
271
  if @source.match("<?", true)
251
- return process_instruction(start_position)
272
+ return process_instruction
252
273
  elsif @source.match("<!", true)
253
274
  if @source.match("--", true)
254
- md = @source.match(/(.*?)-->/um, true, term: Private::COMMENT_TERM)
275
+ md = @source.match(/(.*?)-->/um, true)
255
276
  if md.nil?
256
277
  raise REXML::ParseException.new("Unclosed comment", @source)
257
278
  end
@@ -270,7 +291,6 @@ module REXML
270
291
  @source.position = start_position
271
292
  raise REXML::ParseException.new(message, @source)
272
293
  end
273
- @nsstack.unshift(Set.new)
274
294
  name = parse_name(base_error_message)
275
295
  if @source.match(/\s*\[/um, true)
276
296
  id = [nil, nil, nil]
@@ -318,7 +338,11 @@ module REXML
318
338
  raise REXML::ParseException.new( "Bad ELEMENT declaration!", @source ) if md.nil?
319
339
  return [ :elementdecl, "<!ELEMENT" + md[1] ]
320
340
  elsif @source.match("ENTITY", true)
321
- match = [:entitydecl, *@source.match(Private::ENTITYDECL_PATTERN, true, term: Private::ENTITY_TERM).captures.compact]
341
+ match_data = @source.match(Private::ENTITYDECL_PATTERN, true)
342
+ unless match_data
343
+ raise REXML::ParseException.new("Malformed entity declaration", @source)
344
+ end
345
+ match = [:entitydecl, *match_data.captures.compact]
322
346
  ref = false
323
347
  if match[1] == '%'
324
348
  ref = true
@@ -336,6 +360,8 @@ module REXML
336
360
  match[4] = match[4][1..-2] # HREF
337
361
  match.delete_at(5) if match.size > 5 # Chop out NDATA decl
338
362
  # match is [ :entity, name, PUBLIC, pubid, href(, ndata)? ]
363
+ elsif Private::PEREFERENCE_PATTERN.match?(match[2])
364
+ raise REXML::ParseException.new("Parameter entity references forbidden in internal subset: #{match[2]}", @source)
339
365
  else
340
366
  match[2] = match[2][1..-2]
341
367
  match.pop if match.size == 4
@@ -358,7 +384,7 @@ module REXML
358
384
  val = attdef[4] if val == "#FIXED "
359
385
  pairs[attdef[0]] = val
360
386
  if attdef[0] =~ /^xmlns:(.*)/
361
- @nsstack[0] << $1
387
+ @namespaces[$1] = val
362
388
  end
363
389
  end
364
390
  end
@@ -383,14 +409,14 @@ module REXML
383
409
  raise REXML::ParseException.new(message, @source)
384
410
  end
385
411
  return [:notationdecl, name, *id]
386
- elsif md = @source.match(/--(.*?)-->/um, true, term: Private::COMMENT_TERM)
412
+ elsif md = @source.match(/--(.*?)-->/um, true)
387
413
  case md[1]
388
414
  when /--/, /-\z/
389
415
  raise REXML::ParseException.new("Malformed comment", @source)
390
416
  end
391
417
  return [ :comment, md[1] ] if md
392
418
  end
393
- elsif match = @source.match(/(%.*?;)\s*/um, true, term: Private::DOCTYPE_TERM)
419
+ elsif match = @source.match(/(%.*?;)\s*/um, true)
394
420
  return [ :externalentity, match[1] ]
395
421
  elsif @source.match(/\]\s*>/um, true)
396
422
  @document_status = :after_doctype
@@ -411,7 +437,7 @@ module REXML
411
437
  # here explicitly.
412
438
  @source.ensure_buffer
413
439
  if @source.match("/", true)
414
- @nsstack.shift
440
+ @namespaces_restore_stack.pop
415
441
  last_tag = @tags.pop
416
442
  md = @source.match(Private::CLOSE_PATTERN, true)
417
443
  if md and !last_tag
@@ -430,7 +456,7 @@ module REXML
430
456
  #STDERR.puts "SOURCE BUFFER = #{source.buffer}, #{source.buffer.size}"
431
457
  raise REXML::ParseException.new("Malformed node", @source) unless md
432
458
  if md[0][0] == ?-
433
- md = @source.match(/--(.*?)-->/um, true, term: Private::COMMENT_TERM)
459
+ md = @source.match(/--(.*?)-->/um, true)
434
460
 
435
461
  if md.nil? || /--|-\z/.match?(md[1])
436
462
  raise REXML::ParseException.new("Malformed comment", @source)
@@ -438,13 +464,13 @@ module REXML
438
464
 
439
465
  return [ :comment, md[1] ]
440
466
  else
441
- md = @source.match(/\[CDATA\[(.*?)\]\]>/um, true, term: Private::CDATA_TERM)
467
+ md = @source.match(/\[CDATA\[(.*?)\]\]>/um, true)
442
468
  return [ :cdata, md[1] ] if md
443
469
  end
444
470
  raise REXML::ParseException.new( "Declarations can only occur "+
445
471
  "in the doctype declaration.", @source)
446
472
  elsif @source.match("?", true)
447
- return process_instruction(start_position)
473
+ return process_instruction
448
474
  else
449
475
  # Get the next tag
450
476
  md = @source.match(Private::TAG_PATTERN, true)
@@ -456,18 +482,18 @@ module REXML
456
482
  @document_status = :in_element
457
483
  @prefixes.clear
458
484
  @prefixes << md[2] if md[2]
459
- @nsstack.unshift(curr_ns=Set.new)
460
- attributes, closed = parse_attributes(@prefixes, curr_ns)
485
+ push_namespaces_restore
486
+ attributes, closed = parse_attributes(@prefixes)
461
487
  # Verify that all of the prefixes have been defined
462
488
  for prefix in @prefixes
463
- unless @nsstack.find{|k| k.member?(prefix)}
489
+ unless @namespaces.key?(prefix)
464
490
  raise UndefinedNamespaceException.new(prefix,@source,self)
465
491
  end
466
492
  end
467
493
 
468
494
  if closed
469
495
  @closed = tag
470
- @nsstack.shift
496
+ pop_namespaces_restore
471
497
  else
472
498
  if @tags.empty? and @have_root
473
499
  raise ParseException.new("Malformed XML: Extra tag at the end of the document (got '<#{tag}')", @source)
@@ -482,11 +508,15 @@ module REXML
482
508
  if text.chomp!("<")
483
509
  @source.position -= "<".bytesize
484
510
  end
485
- if @tags.empty? and @have_root
511
+ if @tags.empty?
486
512
  unless /\A\s*\z/.match?(text)
487
- raise ParseException.new("Malformed XML: Extra content at the end of the document (got '#{text}')", @source)
513
+ if @have_root
514
+ raise ParseException.new("Malformed XML: Extra content at the end of the document (got '#{text}')", @source)
515
+ else
516
+ raise ParseException.new("Malformed XML: Content at the start of the document (got '#{text}')", @source)
517
+ end
488
518
  end
489
- return pull_event
519
+ return pull_event if @have_root
490
520
  end
491
521
  return [ :text, text ]
492
522
  end
@@ -503,13 +533,13 @@ module REXML
503
533
  private :pull_event
504
534
 
505
535
  def entity( reference, entities )
506
- value = nil
507
- value = entities[ reference ] if entities
508
- if not value
509
- value = DEFAULT_ENTITIES[ reference ]
510
- value = value[2] if value
511
- end
512
- unnormalize( value, entities ) if value
536
+ return unless entities
537
+
538
+ value = entities[ reference ]
539
+ return if value.nil?
540
+
541
+ record_entity_expansion
542
+ unnormalize( value, entities )
513
543
  end
514
544
 
515
545
  # Escapes all possible entities
@@ -543,17 +573,29 @@ module REXML
543
573
  [Integer(m)].pack('U*')
544
574
  }
545
575
  matches.collect!{|x|x[0]}.compact!
576
+ if filter
577
+ matches.reject! do |entity_reference|
578
+ filter.include?(entity_reference)
579
+ end
580
+ end
546
581
  if matches.size > 0
547
- matches.each do |entity_reference|
548
- unless filter and filter.include?(entity_reference)
549
- entity_value = entity( entity_reference, entities )
550
- if entity_value
551
- re = Private::DEFAULT_ENTITIES_PATTERNS[entity_reference] || /&#{entity_reference};/
552
- rv.gsub!( re, entity_value )
553
- else
554
- er = DEFAULT_ENTITIES[entity_reference]
555
- rv.gsub!( er[0], er[2] ) if er
582
+ matches.tally.each do |entity_reference, n|
583
+ entity_expansion_count_before = @entity_expansion_count
584
+ entity_value = entity( entity_reference, entities )
585
+ if entity_value
586
+ if n > 1
587
+ entity_expansion_count_delta =
588
+ @entity_expansion_count - entity_expansion_count_before
589
+ record_entity_expansion(entity_expansion_count_delta * (n - 1))
590
+ end
591
+ re = Private::DEFAULT_ENTITIES_PATTERNS[entity_reference] || /&#{entity_reference};/
592
+ rv.gsub!( re, entity_value )
593
+ if rv.bytesize > @entity_expansion_text_limit
594
+ raise "entity expansion has grown too large"
556
595
  end
596
+ else
597
+ er = DEFAULT_ENTITIES[entity_reference]
598
+ rv.gsub!( er[0], er[2] ) if er
557
599
  end
558
600
  end
559
601
  rv.gsub!( Private::DEFAULT_ENTITIES_PATTERNS['amp'], '&' )
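Note on the hunk above: each distinct entity reference is now expanded once, and repeated references are charged to the expansion counter as `delta * (n - 1)` instead of being expanded again. The following is a simplified model of that accounting, assuming Ruby 2.7 or later for `Enumerable#tally`. `ExpansionCounter` and `expand_all` are hypothetical names, entity values contain no nested references, and `\w+` is a loose stand-in for an XML name.

```ruby
# Hypothetical counter; models record_entity_expansion(delta) from the diff.
class ExpansionCounter
  attr_reader :count

  def initialize(limit)
    @count = 0
    @limit = limit
  end

  def record(delta = 1)
    @count += delta
    if @count > @limit
      raise "number of entity expansions exceeded, processing aborted."
    end
  end
end

# entities maps an entity name to its replacement text.
def expand_all(text, entities, counter)
  result = text.dup
  references = text.scan(/&(\w+);/).flatten
  references.tally.each do |name, n|
    value = entities[name]
    next if value.nil?

    before = counter.count
    counter.record                            # cost of expanding once
    delta = counter.count - before
    counter.record(delta * (n - 1)) if n > 1  # charge repeats without redoing work
    result.gsub!("&#{name};", value)
  end
  result
end

counter = ExpansionCounter.new(10)
puts expand_all("&a; &b; &a;", { "a" => "x", "b" => "y" }, counter)
puts counter.count
```

The counter ends at the same value as expanding every occurrence separately, so the limit still applies to the total number of references.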
@@ -562,6 +604,39 @@ module REXML
562
604
  end
563
605
 
564
606
  private
607
+ def add_namespace(prefix, uri)
608
+ @namespaces_restore_stack.last[prefix] = @namespaces[prefix]
609
+ if uri.nil?
610
+ @namespaces.delete(prefix)
611
+ else
612
+ @namespaces[prefix] = uri
613
+ end
614
+ end
615
+
616
+ def push_namespaces_restore
617
+ namespaces_restore = {}
618
+ @namespaces_restore_stack.push(namespaces_restore)
619
+ namespaces_restore
620
+ end
621
+
622
+ def pop_namespaces_restore
623
+ namespaces_restore = @namespaces_restore_stack.pop
624
+ namespaces_restore.each do |prefix, uri|
625
+ if uri.nil?
626
+ @namespaces.delete(prefix)
627
+ else
628
+ @namespaces[prefix] = uri
629
+ end
630
+ end
631
+ end
632
+
633
+ def record_entity_expansion(delta=1)
634
+ @entity_expansion_count += delta
635
+ if @entity_expansion_count > @entity_expansion_limit
636
+ raise "number of entity expansions exceeded, processing aborted."
637
+ end
638
+ end
639
+
565
640
  def need_source_encoding_update?(xml_declaration_encoding)
566
641
  return false if xml_declaration_encoding.nil?
567
642
  return false if /\AUTF-16\z/i =~ xml_declaration_encoding
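Note on the hunk above: the parser now keeps one flat `@namespaces` hash plus a stack of "restore" hashes, one per open element, instead of searching a stack of sets. Each restore hash remembers what a prefix was bound to before the element rebound it. A minimal sketch of that bookkeeping, using a hypothetical `NamespaceScope` class rather than the parser itself:

```ruby
# Hypothetical class; mirrors add_namespace, push_namespaces_restore and
# pop_namespaces_restore from the diff in a standalone form.
class NamespaceScope
  def initialize
    @namespaces = {}
    @restore_stack = []
  end

  # Current binding for a prefix, or nil when it is undefined.
  def [](prefix)
    @namespaces[prefix]
  end

  # Call when an element starts.
  def push
    @restore_stack.push({})
  end

  # Bind prefix to uri, remembering the previous binding (nil if none).
  def add(prefix, uri)
    @restore_stack.last[prefix] = @namespaces[prefix]
    if uri.nil?
      @namespaces.delete(prefix)
    else
      @namespaces[prefix] = uri
    end
  end

  # Call when an element ends: put back whatever it overrode.
  def pop
    @restore_stack.pop.each do |prefix, uri|
      if uri.nil?
        @namespaces.delete(prefix)
      else
        @namespaces[prefix] = uri
      end
    end
  end
end

scope = NamespaceScope.new
scope.push
scope.add("a", "urn:outer")
scope.push
scope.add("a", "urn:inner")
scope.add("b", "urn:b")
puts scope["a"]
scope.pop
puts scope["a"]
```

With this layout, checking whether a prefix is defined is a single hash lookup regardless of element depth, which matches the `@namespaces.key?(prefix)` check in the earlier hunk.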
@@ -571,14 +646,14 @@ module REXML
571
646
  def parse_name(base_error_message)
572
647
  md = @source.match(Private::NAME_PATTERN, true)
573
648
  unless md
574
- if @source.match(/\s*\S/um)
649
+ if @source.match(/\S/um)
575
650
  message = "#{base_error_message}: invalid name"
576
651
  else
577
652
  message = "#{base_error_message}: name is missing"
578
653
  end
579
654
  raise REXML::ParseException.new(message, @source)
580
655
  end
581
- md[1]
656
+ md[0]
582
657
  end
583
658
 
584
659
  def parse_id(base_error_message,
@@ -647,18 +722,24 @@ module REXML
647
722
  end
648
723
  end
649
724
 
650
- def process_instruction(start_position)
651
- match_data = @source.match(Private::INSTRUCTION_END, true, term: Private::INSTRUCTION_TERM)
652
- unless match_data
653
- message = "Invalid processing instruction node"
654
- @source.position = start_position
655
- raise REXML::ParseException.new(message, @source)
725
+ def process_instruction
726
+ name = parse_name("Malformed XML: Invalid processing instruction node")
727
+ if @source.match(/\s+/um, true)
728
+ match_data = @source.match(/(.*?)\?>/um, true)
729
+ unless match_data
730
+ raise ParseException.new("Malformed XML: Unclosed processing instruction", @source)
731
+ end
732
+ content = match_data[1]
733
+ else
734
+ content = nil
735
+ unless @source.match("?>", true)
736
+ raise ParseException.new("Malformed XML: Unclosed processing instruction", @source)
737
+ end
656
738
  end
657
- if match_data[1] == "xml"
739
+ if name == "xml"
658
740
  if @document_status
659
741
  raise ParseException.new("Malformed XML: XML declaration is not at the start", @source)
660
742
  end
661
- content = match_data[2]
662
743
  version = VERSION.match(content)
663
744
  version = version[1] unless version.nil?
664
745
  encoding = ENCODING.match(content)
@@ -673,11 +754,12 @@ module REXML
673
754
  standalone = standalone[1] unless standalone.nil?
674
755
  return [ :xmldecl, version, encoding, standalone ]
675
756
  end
676
- [:processing_instruction, match_data[1], match_data[2]]
757
+ [:processing_instruction, name, content]
677
758
  end
678
759
 
679
- def parse_attributes(prefixes, curr_ns)
760
+ def parse_attributes(prefixes)
680
761
  attributes = {}
762
+ expanded_names = {}
681
763
  closed = false
682
764
  while true
683
765
  if @source.match(">", true)
@@ -709,7 +791,7 @@ module REXML
709
791
  @source.match(/\s*/um, true)
710
792
  if prefix == "xmlns"
711
793
  if local_part == "xml"
712
- if value != "http://www.w3.org/XML/1998/namespace"
794
+ if value != Private::XML_PREFIXED_NAMESPACE
713
795
  msg = "The 'xml' prefix must not be bound to any other namespace "+
714
796
  "(http://www.w3.org/TR/REC-xml-names/#ns-decl)"
715
797
  raise REXML::ParseException.new( msg, @source, self )
@@ -719,7 +801,7 @@ module REXML
719
801
  "(http://www.w3.org/TR/REC-xml-names/#ns-decl)"
720
802
  raise REXML::ParseException.new( msg, @source, self)
721
803
  end
722
- curr_ns << local_part
804
+ add_namespace(local_part, value)
723
805
  elsif prefix
724
806
  prefixes << prefix unless prefix == "xml"
725
807
  end
@@ -729,6 +811,20 @@ module REXML
729
811
  raise REXML::ParseException.new(msg, @source, self)
730
812
  end
731
813
 
814
+ unless prefix == "xmlns"
815
+ uri = @namespaces[prefix]
816
+ expanded_name = [uri, local_part]
817
+ existing_prefix = expanded_names[expanded_name]
818
+ if existing_prefix
819
+ message = "Namespace conflict in adding attribute " +
820
+ "\"#{local_part}\": " +
821
+ "Prefix \"#{existing_prefix}\" = \"#{uri}\" and " +
822
+ "prefix \"#{prefix}\" = \"#{uri}\""
823
+ raise REXML::ParseException.new(message, @source, self)
824
+ end
825
+ expanded_names[expanded_name] = prefix
826
+ end
827
+
732
828
  attributes[name] = value
733
829
  else
734
830
  message = "Invalid attribute name: <#{@source.buffer.split(%r{[/>\s]}).first}>"
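Note on the hunk above: the namespace conflict check moves from `Attributes#<<` into the parser and now compares expanded names, that is, `[uri, local_part]` pairs, in a per-element hash. A minimal sketch of the check, assuming attributes arrive as `[prefix, local_part]` pairs. `check_attribute_conflicts` is a hypothetical helper and raises `ArgumentError` where the parser raises `REXML::ParseException`.

```ruby
# Hypothetical helper; namespaces maps a prefix to its namespace URI.
def check_attribute_conflicts(attrs, namespaces)
  expanded_names = {}
  attrs.each do |prefix, local_part|
    next if prefix == "xmlns" # namespace declarations are not checked

    uri = namespaces[prefix]
    expanded_name = [uri, local_part]
    existing_prefix = expanded_names[expanded_name]
    if existing_prefix
      raise ArgumentError,
            "Namespace conflict in adding attribute \"#{local_part}\": " \
            "Prefix \"#{existing_prefix}\" = \"#{uri}\" and " \
            "prefix \"#{prefix}\" = \"#{uri}\""
    end
    expanded_names[expanded_name] = prefix
  end
  true
end

namespaces = { "a" => "urn:x", "b" => "urn:x", "c" => "urn:y" }

# Same local name in different namespaces is fine.
puts check_attribute_conflicts([%w[a id], %w[c id]], namespaces)

# Two prefixes bound to the same URI with the same local name conflict.
begin
  check_attribute_conflicts([%w[a id], %w[b id]], namespaces)
rescue ArgumentError => e
  puts e.message
end
```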
data/lib/rexml/parsers/pullparser.rb CHANGED
@@ -47,6 +47,18 @@ module REXML
47
47
  @listeners << listener
48
48
  end
49
49
 
50
+ def entity_expansion_count
51
+ @parser.entity_expansion_count
52
+ end
53
+
54
+ def entity_expansion_limit=( limit )
55
+ @parser.entity_expansion_limit = limit
56
+ end
57
+
58
+ def entity_expansion_text_limit=( limit )
59
+ @parser.entity_expansion_text_limit = limit
60
+ end
61
+
50
62
  def each
51
63
  while has_next?
52
64
  yield self.pull
data/lib/rexml/parsers/sax2parser.rb CHANGED
@@ -22,6 +22,18 @@ module REXML
22
22
  @parser.source
23
23
  end
24
24
 
25
+ def entity_expansion_count
26
+ @parser.entity_expansion_count
27
+ end
28
+
29
+ def entity_expansion_limit=( limit )
30
+ @parser.entity_expansion_limit = limit
31
+ end
32
+
33
+ def entity_expansion_text_limit=( limit )
34
+ @parser.entity_expansion_text_limit = limit
35
+ end
36
+
25
37
  def add_listener( listener )
26
38
  @parser.add_listener( listener )
27
39
  end
@@ -247,6 +259,8 @@ module REXML
247
259
  end
248
260
 
249
261
  def get_namespace( prefix )
262
+ return nil if @namespace_stack.empty?
263
+
250
264
  uris = (@namespace_stack.find_all { |ns| not ns[prefix].nil? }) ||
251
265
  (@namespace_stack.find { |ns| not ns[nil].nil? })
252
266
  uris[-1][prefix] unless uris.nil? or 0 == uris.size
data/lib/rexml/parsers/streamparser.rb CHANGED
@@ -7,36 +7,41 @@ module REXML
7
7
  def initialize source, listener
8
8
  @listener = listener
9
9
  @parser = BaseParser.new( source )
10
- @tag_stack = []
10
+ @entities = {}
11
11
  end
12
12
 
13
13
  def add_listener( listener )
14
14
  @parser.add_listener( listener )
15
15
  end
16
16
 
17
+ def entity_expansion_count
18
+ @parser.entity_expansion_count
19
+ end
20
+
21
+ def entity_expansion_limit=( limit )
22
+ @parser.entity_expansion_limit = limit
23
+ end
24
+
25
+ def entity_expansion_text_limit=( limit )
26
+ @parser.entity_expansion_text_limit = limit
27
+ end
28
+
17
29
  def parse
18
30
  # entity string
19
31
  while true
20
32
  event = @parser.pull
21
33
  case event[0]
22
34
  when :end_document
23
- unless @tag_stack.empty?
24
- tag_path = "/" + @tag_stack.join("/")
25
- raise ParseException.new("Missing end tag for '#{tag_path}'",
26
- @parser.source)
27
- end
28
35
  return
29
36
  when :start_element
30
- @tag_stack << event[1]
31
37
  attrs = event[2].each do |n, v|
32
38
  event[2][n] = @parser.unnormalize( v )
33
39
  end
34
40
  @listener.tag_start( event[1], attrs )
35
41
  when :end_element
36
42
  @listener.tag_end( event[1] )
37
- @tag_stack.pop
38
43
  when :text
39
- unnormalized = @parser.unnormalize( event[1] )
44
+ unnormalized = @parser.unnormalize( event[1], @entities )
40
45
  @listener.text( unnormalized )
41
46
  when :processing_instruction
42
47
  @listener.instruction( *event[1,2] )
@@ -48,6 +53,7 @@ module REXML
48
53
  when :comment, :attlistdecl, :cdata, :xmldecl, :elementdecl
49
54
  @listener.send( event[0].to_s, *event[1..-1] )
50
55
  when :entitydecl, :notationdecl
56
+ @entities[ event[1] ] = event[2] if event.size == 3
51
57
  @listener.send( event[0].to_s, event[1..-1] )
52
58
  when :externalentity
53
59
  entity_reference = event[1]
data/lib/rexml/parsers/treeparser.rb CHANGED
@@ -15,7 +15,6 @@ module REXML
15
15
  end
16
16
 
17
17
  def parse
18
- tag_stack = []
19
18
  entities = nil
20
19
  begin
21
20
  while true
@@ -23,19 +22,13 @@ module REXML
23
22
  #STDERR.puts "TREEPARSER GOT #{event.inspect}"
24
23
  case event[0]
25
24
  when :end_document
26
- unless tag_stack.empty?
27
- raise ParseException.new("No close tag for #{@build_context.xpath}",
28
- @parser.source, @parser)
29
- end
30
25
  return
31
26
  when :start_element
32
- tag_stack.push(event[1])
33
27
  el = @build_context = @build_context.add_element( event[1] )
34
28
  event[2].each do |key, value|
35
29
  el.attributes[key]=Attribute.new(key,value,self)
36
30
  end
37
31
  when :end_element
38
- tag_stack.pop
39
32
  @build_context = @build_context.parent
40
33
  when :text
41
34
  if @build_context[-1].instance_of? Text
data/lib/rexml/rexml.rb CHANGED
@@ -31,7 +31,7 @@
31
31
  module REXML
32
32
  COPYRIGHT = "Copyright © 2001-2008 Sean Russell <ser@germane-software.com>"
33
33
  DATE = "2008/019"
34
- VERSION = "3.3.2"
34
+ VERSION = "3.3.8"
35
35
  REVISION = ""
36
36
 
37
37
  Copyright = COPYRIGHT
data/lib/rexml/source.rb CHANGED
@@ -117,7 +117,7 @@ module REXML
117
117
  def ensure_buffer
118
118
  end
119
119
 
120
- def match(pattern, cons=false, term: nil)
120
+ def match(pattern, cons=false)
121
121
  if cons
122
122
  @scanner.scan(pattern).nil? ? nil : @scanner
123
123
  else
@@ -204,10 +204,20 @@ module REXML
204
204
  end
205
205
  end
206
206
 
207
- def read(term = nil)
207
+ def read(term = nil, min_bytes = 1)
208
208
  term = encode(term) if term
209
209
  begin
210
- @scanner << readline(term)
210
+ str = readline(term)
211
+ @scanner << str
212
+ read_bytes = str.bytesize
213
+ begin
214
+ while read_bytes < min_bytes
215
+ str = readline(term)
216
+ @scanner << str
217
+ read_bytes += str.bytesize
218
+ end
219
+ rescue IOError
220
+ end
211
221
  true
212
222
  rescue Exception, NameError
213
223
  @source = nil
@@ -237,10 +247,9 @@ module REXML
237
247
  read if @scanner.eos? && @source
238
248
  end
239
249
 
240
- # Note: When specifying a string for 'pattern', it must not include '>' except in the following formats:
241
- # - ">"
242
- # - "XXX>" (X is any string excluding '>')
243
- def match( pattern, cons=false, term: nil )
250
+ def match( pattern, cons=false )
251
+ # To avoid performance issue, we need to increase bytes to read per scan
252
+ min_bytes = 1
244
253
  while true
245
254
  if cons
246
255
  md = @scanner.scan(pattern)
@@ -250,7 +259,8 @@ module REXML
250
259
  break if md
251
260
  return nil if pattern.is_a?(String)
252
261
  return nil if @source.nil?
253
- return nil unless read(term)
262
+ return nil unless read(nil, min_bytes)
263
+ min_bytes *= 2
254
264
  end
255
265
 
256
266
  md.nil? ? nil : @scanner
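Note on the `source.rb` hunks above: the `term:` hint is dropped, and `match` instead asks `read` for at least `min_bytes` of new input, doubling that amount after every failed attempt. A long unterminated construct is then rescanned a logarithmic number of times rather than once per line. The sketch below is a simplified model of that strategy over a `StringIO`. `GrowingReader` is a hypothetical class, not REXML's `IOSource`.

```ruby
require "stringio"

# Hypothetical reader that buffers input line by line.
class GrowingReader
  attr_reader :buffer, :reads

  def initialize(io)
    @io = io
    @buffer = +""
    @reads = 0
  end

  # Append lines until at least min_bytes were added.
  # Returns false only when nothing more could be read.
  def read(min_bytes = 1)
    @reads += 1
    read_bytes = 0
    while read_bytes < min_bytes
      line = @io.gets
      return read_bytes > 0 if line.nil?

      @buffer << line
      read_bytes += line.bytesize
    end
    true
  end

  # Retry the pattern, doubling the requested amount after every miss.
  def match(pattern)
    min_bytes = 1
    loop do
      md = pattern.match(@buffer)
      return md if md
      return nil unless read(min_bytes)

      min_bytes *= 2
    end
  end
end

reader = GrowingReader.new(StringIO.new("a\n" * 1000 + "<end>\n"))
puts reader.match(/<end>/) ? "found" : "missing"
puts reader.reads
```

In this model the 1001-line input is consumed in far fewer `read` calls than lines, which is the effect the doubling is meant to have.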
data/lib/rexml/text.rb CHANGED
@@ -268,7 +268,8 @@ module REXML
268
268
  # u = Text.new( "sean russell", false, nil, true )
269
269
  # u.value #-> "sean russell"
270
270
  def value
271
- @unnormalized ||= Text::unnormalize( @string, doctype )
271
+ @unnormalized ||= Text::unnormalize(@string, doctype,
272
+ entity_expansion_text_limit: document&.entity_expansion_text_limit)
272
273
  end
273
274
 
274
275
  # Sets the contents of this text node. This expects the text to be
@@ -411,11 +412,12 @@ module REXML
411
412
  end
412
413
 
413
414
  # Unescapes all possible entities
414
- def Text::unnormalize( string, doctype=nil, filter=nil, illegal=nil )
415
+ def Text::unnormalize( string, doctype=nil, filter=nil, illegal=nil, entity_expansion_text_limit: nil )
416
+ entity_expansion_text_limit ||= Security.entity_expansion_text_limit
415
417
  sum = 0
416
418
  string.gsub( /\r\n?/, "\n" ).gsub( REFERENCE ) {
417
419
  s = Text.expand($&, doctype, filter)
418
- if sum + s.bytesize > Security.entity_expansion_text_limit
420
+ if sum + s.bytesize > entity_expansion_text_limit
419
421
  raise "entity expansion has grown too large"
420
422
  else
421
423
  sum += s.bytesize
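Note on the hunk above: `Text::unnormalize` gains an optional `entity_expansion_text_limit:` keyword and falls back to the global `Security` value when the caller passes `nil`, which is how a per-document limit can override the process-wide one. A minimal sketch of that fallback pattern follows. `expand_with_limit` and `DEFAULT_TEXT_LIMIT` are hypothetical, and the constant's value is only a placeholder for the global setting.

```ruby
# Hypothetical stand-in for the process-wide default limit.
DEFAULT_TEXT_LIMIT = 10_240

# Expand &name; references from the entities hash, summing the size of
# everything substituted and aborting once the limit is exceeded.
def expand_with_limit(string, entities, text_limit: nil)
  text_limit ||= DEFAULT_TEXT_LIMIT # nil means "use the global default"
  sum = 0
  string.gsub(/&(\w+);/) do
    reference = $&
    value = entities.fetch($1, reference) # unknown references stay as-is
    if sum + value.bytesize > text_limit
      raise "entity expansion has grown too large"
    end
    sum += value.bytesize
    value
  end
end

puts expand_with_limit("&a;&a;", { "a" => "xyz" })

begin
  expand_with_limit("&a;&a;", { "a" => "xyz" }, text_limit: 5)
rescue RuntimeError => e
  puts e.message
end
```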
metadata CHANGED
@@ -1,28 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: rexml
3
3
  version: !ruby/object:Gem::Version
4
- version: 3.3.2
4
+ version: 3.3.8
5
5
  platform: ruby
6
6
  authors:
7
7
  - Kouhei Sutou
8
8
  bindir: bin
9
9
  cert_chain: []
10
- date: 2024-07-16 00:00:00.000000000 Z
11
- dependencies:
12
- - !ruby/object:Gem::Dependency
13
- name: strscan
14
- requirement: !ruby/object:Gem::Requirement
15
- requirements:
16
- - - ">="
17
- - !ruby/object:Gem::Version
18
- version: '0'
19
- type: :runtime
20
- prerelease: false
21
- version_requirements: !ruby/object:Gem::Requirement
22
- requirements:
23
- - - ">="
24
- - !ruby/object:Gem::Version
25
- version: '0'
10
+ date: 2024-09-29 00:00:00.000000000 Z
11
+ dependencies: []
26
12
  description: An XML toolkit for Ruby
27
13
  email:
28
14
  - kou@cozmixng.org
@@ -116,7 +102,7 @@ homepage: https://github.com/ruby/rexml
116
102
  licenses:
117
103
  - BSD-2-Clause
118
104
  metadata:
119
- changelog_uri: https://github.com/ruby/rexml/releases/tag/v3.3.2
105
+ changelog_uri: https://github.com/ruby/rexml/releases/tag/v3.3.8
120
106
  rdoc_options:
121
107
  - "--main"
122
108
  - README.md