rexml 3.2.9 → 3.3.4

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: ed57404d6f519cb196d671fd629380f5b08a50cf649ae99a71432edceaf15014
-   data.tar.gz: 9097f235049d98aa743998da10724fd6ff5f6cd0cb4f230c59290af4a75e2134
+   metadata.gz: e47ba1209ca1ca2ae0584348378fcefe05de5dc277273d434a37d62e04c676b3
+   data.tar.gz: 867f9e01423f83063aac7c59e07670c88c20f527f676e28cdf9d098248293c56
  SHA512:
-   metadata.gz: 0d7a0be04c12fcd88c64dd2962d4db49f57ed02c08e3f4628e8e546aea135c672839a19a950423afa72d0cf25af0554558f2261b6b5f2b23637f4ca47d43bd73
-   data.tar.gz: 28479b3b11de58e84f57dc88057f185eb2365998f7fbc9f0df78d5899bb9b241c6bb93d26a57ca1b54d8a058b743daa78a8df52c4ddbfbb5f7c94aeb97724808
+   metadata.gz: d87d9cd9384218f3a9bd65870cef99e057022c83bae434318daeab781444378ea830ce46ae20879954f2ae54a7a00cc54eac2839784b989612315ddef909c809
+   data.tar.gz: 1e61927c65b9a058626d0ab19c7f5af0d49169d896e76402e0152476cc772dabf41b8f7a135040b12f5c46eac933de8e60d21fdea8388ed7342be8cc6f9114e9
data/NEWS.md CHANGED
@@ -1,12 +1,170 @@
  # News
 
- ## 3.2.9 - 2024-06-19 {#version-3-2-9}
+ ## 3.3.4 - 2024-08-01 {#version-3-3-4}
+
+ ### Fixes
+
+   * Fixed a bug that `REXML::Security` isn't defined when
+     `REXML::Parsers::StreamParser` is used and
+     `rexml/parsers/streamparser` is only required.
+     * GH-189
+     * Patch by takuya kodama.
+
+ ### Thanks
+
+   * takuya kodama
+
+ ## 3.3.3 - 2024-08-01 {#version-3-3-3}
+
+ ### Improvements
+
+   * Added support for detecting invalid XML that has unsupported
+     content before root element
+     * GH-184
+     * Patch by NAITOH Jun.
+
+   * Added support for `REXML::Security.entity_expansion_limit=` and
+     `REXML::Security.entity_expansion_text_limit=` in SAX2 and pull
+     parsers
+     * GH-187
+     * Patch by NAITOH Jun.
+
+   * Added more tests for invalid XMLs.
+     * GH-183
+     * Patch by Watson.
+
+   * Added more performance tests.
+     * Patch by Watson.
+
+   * Improved parse performance.
+     * GH-186
+     * Patch by tomoya ishida.
+
+ ### Thanks
+
+   * NAITOH Jun
+
+   * Watson
+
+   * tomoya ishida
+
+ ## 3.3.2 - 2024-07-16 {#version-3-3-2}
+
+ ### Improvements
+
+   * Improved parse performance.
+     * GH-160
+     * Patch by NAITOH Jun.
+
+   * Improved parse performance.
+     * GH-169
+     * GH-170
+     * GH-171
+     * GH-172
+     * GH-173
+     * GH-174
+     * GH-175
+     * GH-176
+     * GH-177
+     * Patch by Watson.
+
+   * Added support for raising a parse exception when an XML has extra
+     content after the root element.
+     * GH-161
+     * Patch by NAITOH Jun.
+
+   * Added support for raising a parse exception when an XML
+     declaration exists in wrong position.
+     * GH-162
+     * Patch by NAITOH Jun.
+
+   * Removed needless a space after XML declaration in pretty print mode.
+     * GH-164
+     * Patch by NAITOH Jun.
+
+   * Stopped to emit `:text` event after the root element.
+     * GH-167
+     * Patch by NAITOH Jun.
+
+ ### Fixes
+
+   * Fixed a bug that SAX2 parser doesn't expand predefined entities for
+     `characters` callback.
+     * GH-168
+     * Patch by NAITOH Jun.
+
+ ### Thanks
+
+   * NAITOH Jun
+
+   * Watson
+
+ ## 3.3.1 - 2024-06-25 {#version-3-3-1}
+
+ ### Improvements
+
+   * Added support for detecting malformed top-level comments.
+     * GH-145
+     * Patch by Hiroya Fujinami.
+
+   * Improved `REXML::Element#attribute` performance.
+     * GH-146
+     * Patch by Hiroya Fujinami.
+
+   * Added support for detecting malformed `<!-->` comments.
+     * GH-147
+     * Patch by Hiroya Fujinami.
+
+   * Added support for detecting unclosed `DOCTYPE`.
+     * GH-152
+     * Patch by Hiroya Fujinami.
+
+   * Added `changlog_uri` metadata to gemspec.
+     * GH-156
+     * Patch by fynsta.
+
+   * Improved parse performance.
+     * GH-157
+     * GH-158
+     * Patch by NAITOH Jun.
+
+ ### Fixes
+
+   * Fixed a bug that large XML can't be parsed.
+     * GH-154
+     * Patch by NAITOH Jun.
+
+   * Fixed a bug that private constants are visible.
+     * GH-155
+     * Patch by NAITOH Jun.
+
+ ### Thanks
+
+   * Hiroya Fujinami
+
+   * NAITOH Jun
+
+   * fynsta
+
+ ## 3.3.0 - 2024-06-11 {#version-3-3-0}
+
+ ### Improvements
+
+   * Added support for strscan 0.7.0 installed with Ruby 2.6.
+     * GH-142
+     * Reported by Fernando Trigoso.
+
+ ### Thanks
+
+   * Fernando Trigoso
+
+ ## 3.2.9 - 2024-06-09 {#version-3-2-9}
 
  ### Improvements
 
    * Added support for old strscan.
      * GH-132
-     * Reported by Adam
+     * Reported by Adam.
 
    * Improved attribute value parse performance.
      * GH-135
data/lib/rexml/element.rb CHANGED
@@ -7,14 +7,6 @@ require_relative "xpath"
  require_relative "parseexception"
 
  module REXML
-   # An implementation note about namespaces:
-   # As we parse, when we find namespaces we put them in a hash and assign
-   # them a unique ID. We then convert the namespace prefix for the node
-   # to the unique ID. This makes namespace lookup much faster for the
-   # cost of extra memory use. We save the namespace prefix for the
-   # context node and convert it back when we write it.
-   @@namespaces = {}
-
    # An \REXML::Element object represents an XML element.
    #
    # An element:
@@ -1284,16 +1276,11 @@ module REXML
      # document.root.attribute("x", "a") # => a:x='a:x'
      #
      def attribute( name, namespace=nil )
-       prefix = nil
-       if namespaces.respond_to? :key
-         prefix = namespaces.key(namespace) if namespace
-       else
-         prefix = namespaces.index(namespace) if namespace
-       end
+       prefix = namespaces.key(namespace) if namespace
        prefix = nil if prefix == 'xmlns'
 
        ret_val =
-         attributes.get_attribute( "#{prefix ? prefix + ':' : ''}#{name}" )
+         attributes.get_attribute( prefix ? "#{prefix}:#{name}" : name )
 
        return ret_val unless ret_val.nil?
        return nil if prefix.nil?
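
The simplified lookup in `attribute` leans on `Hash#key`, which returns the first key mapped to a given value. A minimal sketch of that prefix resolution, assuming a plain prefix-to-URI hash; the `qualified_attribute_name` helper and the example URIs are hypothetical and not part of REXML:

```ruby
# Hypothetical helper mirroring the prefix lookup in the hunk above.
# `namespaces` is assumed to be a Hash of prefix => namespace URI.
def qualified_attribute_name(namespaces, name, namespace = nil)
  prefix = namespaces.key(namespace) if namespace # reverse lookup: URI -> prefix
  prefix = nil if prefix == "xmlns"               # the default namespace has no prefix
  prefix ? "#{prefix}:#{name}" : name
end

# Made-up namespace table for illustration only.
EXAMPLE_NAMESPACES = {
  "xmlns" => "http://example.com/default",
  "a"     => "http://example.com/a",
}.freeze

puts qualified_attribute_name(EXAMPLE_NAMESPACES, "x", "http://example.com/a")
puts qualified_attribute_name(EXAMPLE_NAMESPACES, "x", "http://example.com/default")
puts qualified_attribute_name(EXAMPLE_NAMESPACES, "x")
```

When no namespace is passed, `prefix` stays `nil` and the bare attribute name is used, which matches the `prefix ? ... : name` branch in the new code.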
data/lib/rexml/formatters/pretty.rb CHANGED
@@ -111,7 +111,7 @@ module REXML
          # itself, then we don't need a carriage return... which makes this
          # logic more complex.
          node.children.each { |child|
-           next if child == node.children[-1] and child.instance_of?(Text)
+           next if child.instance_of?(Text)
            unless child == node.children[0] or child.instance_of?(Text) or
              (child == node.children[1] and !node.children[0].writethis)
              output << "\n"
data/lib/rexml/parsers/baseparser.rb CHANGED
@@ -1,6 +1,7 @@
  # frozen_string_literal: true
  require_relative '../parseexception'
  require_relative '../undefinednamespaceexception'
+ require_relative '../security'
  require_relative '../source'
  require 'set'
  require "strscan"
@@ -124,21 +125,28 @@ module REXML
        }
 
        module Private
-         INSTRUCTION_END = /#{NAME}(\s+.*?)?\?>/um
          TAG_PATTERN = /((?>#{QNAME_STR}))\s*/um
          CLOSE_PATTERN = /(#{QNAME_STR})\s*>/um
          ATTLISTDECL_END = /\s+#{NAME}(?:#{ATTDEF})*\s*>/um
-         NAME_PATTERN = /\s*#{NAME}/um
+         NAME_PATTERN = /#{NAME}/um
          GEDECL_PATTERN = "\\s+#{NAME}\\s+#{ENTITYDEF}\\s*>"
          PEDECL_PATTERN = "\\s+(%)\\s+#{NAME}\\s+#{PEDEF}\\s*>"
          ENTITYDECL_PATTERN = /(?:#{GEDECL_PATTERN})|(?:#{PEDECL_PATTERN})/um
+         CARRIAGE_RETURN_NEWLINE_PATTERN = /\r\n?/
+         CHARACTER_REFERENCES = /&#0*((?:\d+)|(?:x[a-fA-F0-9]+));/
+         DEFAULT_ENTITIES_PATTERNS = {}
+         default_entities = ['gt', 'lt', 'quot', 'apos', 'amp']
+         default_entities.each do |term|
+           DEFAULT_ENTITIES_PATTERNS[term] = /&#{term};/
+         end
        end
        private_constant :Private
-       include Private
 
        def initialize( source )
          self.stream = source
          @listeners = []
+         @prefixes = Set.new
+         @entity_expansion_count = 0
        end
 
        def add_listener( listener )
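
Dropping `include Private` appears to be what keeps the pattern constants out of the parser's public namespace: a module hidden with `private_constant` still exposes its contents once it is included. A small sketch of the difference, using hypothetical `LeakyParser` and `SealedParser` classes rather than REXML's own:

```ruby
# Hypothetical classes; only the private_constant/include pattern is taken from the diff.
class LeakyParser
  module Private
    TAG = /</
  end
  private_constant :Private
  include Private # constants of Private become reachable as LeakyParser::TAG
end

class SealedParser
  module Private
    TAG = /</
  end
  private_constant :Private

  # Internal code refers to the constant explicitly, as the diff does
  # with Private::TAG_PATTERN and friends.
  def self.tag_pattern
    Private::TAG
  end
end

puts LeakyParser::TAG.inspect            # reachable from outside
puts SealedParser.const_defined?(:TAG)   # not reachable from outside
puts SealedParser.tag_pattern.inspect    # still usable internally
```

The cost of the sealed form is the explicit `Private::` qualifier at every use site, which is the bulk of the mechanical changes in the hunks that follow.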
@@ -146,10 +154,12 @@ module REXML
        end
 
        attr_reader :source
+       attr_reader :entity_expansion_count
 
        def stream=( source )
          @source = SourceFactory.create_from( source )
          @closed = nil
+         @have_root = false
          @document_status = nil
          @tags = []
          @stack = []
@@ -204,6 +214,8 @@ module REXML
 
        # Returns the next event. This is a +PullEvent+ object.
        def pull
+         @source.drop_parsed_content
+
          pull_event.tap do |event|
            @listeners.each do |listener|
              listener.receive event
@@ -216,7 +228,12 @@ module REXML
            x, @closed = @closed, nil
            return [ :end_element, x ]
          end
-         return [ :end_document ] if empty?
+         if empty?
+           if @document_status == :in_doctype
+             raise ParseException.new("Malformed DOCTYPE: unclosed", @source)
+           end
+           return [ :end_document ]
+         end
          return @stack.shift if @stack.size > 0
          #STDERR.puts @source.encoding
          #STDERR.puts "BUFFER = #{@source.buffer.inspect}"
@@ -225,10 +242,17 @@ module REXML
          if @document_status == nil
            start_position = @source.position
            if @source.match("<?", true)
-             return process_instruction(start_position)
+             return process_instruction
            elsif @source.match("<!", true)
              if @source.match("--", true)
-               return [ :comment, @source.match(/(.*?)-->/um, true)[1] ]
+               md = @source.match(/(.*?)-->/um, true)
+               if md.nil?
+                 raise REXML::ParseException.new("Unclosed comment", @source)
+               end
+               if /--|-\z/.match?(md[1])
+                 raise REXML::ParseException.new("Malformed comment", @source)
+               end
+               return [ :comment, md[1] ]
              elsif @source.match("DOCTYPE", true)
                base_error_message = "Malformed DOCTYPE"
                unless @source.match(/\s+/um, true)
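
The `/--|-\z/` check encodes the XML rule that a comment body may not contain `--` and may not end in `-`. A standalone sketch of that predicate; the `valid_comment_body?` name is made up for illustration:

```ruby
# Returns true when `body` (the text between "<!--" and "-->") is a legal
# XML comment body: no "--" anywhere and no trailing "-".
def valid_comment_body?(body)
  !/--|-\z/.match?(body)
end

[" plain comment ", "a-b", "a--b", "trailing-"].each do |body|
  puts "#{body.inspect}: #{valid_comment_body?(body)}"
end
```

A single hyphen inside the body is fine; only a doubled hyphen or a hyphen directly before the closing `-->` is rejected.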
@@ -240,7 +264,7 @@ module REXML
                  @source.position = start_position
                  raise REXML::ParseException.new(message, @source)
                end
-               @nsstack.unshift(curr_ns=Set.new)
+               @nsstack.unshift(Set.new)
                name = parse_name(base_error_message)
                if @source.match(/\s*\[/um, true)
                  id = [nil, nil, nil]
@@ -288,7 +312,11 @@ module REXML
                raise REXML::ParseException.new( "Bad ELEMENT declaration!", @source ) if md.nil?
                return [ :elementdecl, "<!ELEMENT" + md[1] ]
              elsif @source.match("ENTITY", true)
-               match = [:entitydecl, *@source.match(ENTITYDECL_PATTERN, true).captures.compact]
+               match_data = @source.match(Private::ENTITYDECL_PATTERN, true)
+               unless match_data
+                 raise REXML::ParseException.new("Malformed entity declaration", @source)
+               end
+               match = [:entitydecl, *match_data.captures.compact]
                ref = false
                if match[1] == '%'
                  ref = true
@@ -314,13 +342,13 @@ module REXML
                match << '%' if ref
                return match
              elsif @source.match("ATTLIST", true)
-               md = @source.match(ATTLISTDECL_END, true)
+               md = @source.match(Private::ATTLISTDECL_END, true)
                raise REXML::ParseException.new( "Bad ATTLIST declaration!", @source ) if md.nil?
                element = md[1]
                contents = md[0]
 
                pairs = {}
-               values = md[0].scan( ATTDEF_RE )
+               values = md[0].strip.scan( ATTDEF_RE )
                values.each do |attdef|
                  unless attdef[3] == "#IMPLIED"
                    attdef.compact!
@@ -366,6 +394,9 @@ module REXML
              @document_status = :after_doctype
              return [ :end_doctype ]
            end
+           if @document_status == :in_doctype
+             raise ParseException.new("Malformed DOCTYPE: invalid declaration", @source)
+           end
          end
          if @document_status == :after_doctype
            @source.match(/\s*/um, true)
@@ -380,7 +411,7 @@ module REXML
              if @source.match("/", true)
                @nsstack.shift
                last_tag = @tags.pop
-               md = @source.match(CLOSE_PATTERN, true)
+               md = @source.match(Private::CLOSE_PATTERN, true)
                if md and !last_tag
                  message = "Unexpected top-level end tag (got '#{md[1]}')"
                  raise REXML::ParseException.new(message, @source)
@@ -399,12 +430,11 @@ module REXML
                if md[0][0] == ?-
                  md = @source.match(/--(.*?)-->/um, true)
 
-                 case md[1]
-                 when /--/, /-\z/
+                 if md.nil? || /--|-\z/.match?(md[1])
                    raise REXML::ParseException.new("Malformed comment", @source)
                  end
 
-                 return [ :comment, md[1] ] if md
+                 return [ :comment, md[1] ]
                else
                  md = @source.match(/\[CDATA\[(.*?)\]\]>/um, true)
                  return [ :cdata, md[1] ] if md
@@ -412,22 +442,22 @@ module REXML
                raise REXML::ParseException.new( "Declarations can only occur "+
                  "in the doctype declaration.", @source)
              elsif @source.match("?", true)
-               return process_instruction(start_position)
+               return process_instruction
              else
                # Get the next tag
-               md = @source.match(TAG_PATTERN, true)
+               md = @source.match(Private::TAG_PATTERN, true)
                unless md
                  @source.position = start_position
                  raise REXML::ParseException.new("malformed XML: missing tag start", @source)
                end
                tag = md[1]
                @document_status = :in_element
-               prefixes = Set.new
-               prefixes << md[2] if md[2]
+               @prefixes.clear
+               @prefixes << md[2] if md[2]
                @nsstack.unshift(curr_ns=Set.new)
-               attributes, closed = parse_attributes(prefixes, curr_ns)
+               attributes, closed = parse_attributes(@prefixes, curr_ns)
                # Verify that all of the prefixes have been defined
-               for prefix in prefixes
+               for prefix in @prefixes
                  unless @nsstack.find{|k| k.member?(prefix)}
                    raise UndefinedNamespaceException.new(prefix,@source,self)
                  end
@@ -437,8 +467,12 @@ module REXML
                  @closed = tag
                  @nsstack.shift
                else
+                 if @tags.empty? and @have_root
+                   raise ParseException.new("Malformed XML: Extra tag at the end of the document (got '<#{tag}')", @source)
+                 end
                  @tags.push( tag )
                end
+               @have_root = true
                return [ :start_element, tag, attributes ]
              end
            else
@@ -446,6 +480,16 @@ module REXML
              if text.chomp!("<")
                @source.position -= "<".bytesize
              end
+             if @tags.empty?
+               unless /\A\s*\z/.match?(text)
+                 if @have_root
+                   raise ParseException.new("Malformed XML: Extra content at the end of the document (got '#{text}')", @source)
+                 else
+                   raise ParseException.new("Malformed XML: Content at the start of the document (got '#{text}')", @source)
+                 end
+               end
+               return pull_event if @have_root
+             end
              return [ :text, text ]
            end
          rescue REXML::UndefinedNamespaceException
@@ -463,7 +507,9 @@ module REXML
        def entity( reference, entities )
          value = nil
          value = entities[ reference ] if entities
-         if not value
+         if value
+           record_entity_expansion
+         else
            value = DEFAULT_ENTITIES[ reference ]
            value = value[2] if value
          end
@@ -488,34 +534,51 @@ module REXML
 
        # Unescapes all possible entities
        def unnormalize( string, entities=nil, filter=nil )
-         rv = string.gsub( /\r\n?/, "\n" )
+         if string.include?("\r")
+           rv = string.gsub( Private::CARRIAGE_RETURN_NEWLINE_PATTERN, "\n" )
+         else
+           rv = string.dup
+         end
          matches = rv.scan( REFERENCE_RE )
          return rv if matches.size == 0
-         rv.gsub!( /&#0*((?:\d+)|(?:x[a-fA-F0-9]+));/ ) {
+         rv.gsub!( Private::CHARACTER_REFERENCES ) {
            m=$1
            m = "0#{m}" if m[0] == ?x
            [Integer(m)].pack('U*')
          }
          matches.collect!{|x|x[0]}.compact!
          if matches.size > 0
+           sum = 0
            matches.each do |entity_reference|
              unless filter and filter.include?(entity_reference)
                entity_value = entity( entity_reference, entities )
                if entity_value
-                 re = /&#{entity_reference};/
+                 re = Private::DEFAULT_ENTITIES_PATTERNS[entity_reference] || /&#{entity_reference};/
                  rv.gsub!( re, entity_value )
+                 sum += rv.bytesize
+                 if sum > Security.entity_expansion_text_limit
+                   raise "entity expansion has grown too large"
+                 end
                else
                  er = DEFAULT_ENTITIES[entity_reference]
                  rv.gsub!( er[0], er[2] ) if er
                end
              end
            end
-           rv.gsub!( /&amp;/, '&' )
+           rv.gsub!( Private::DEFAULT_ENTITIES_PATTERNS['amp'], '&' )
          end
          rv
        end
 
        private
+
+       def record_entity_expansion
+         @entity_expansion_count += 1
+         if @entity_expansion_count > Security.entity_expansion_limit
+           raise "number of entity expansions exceeded, processing aborted."
+         end
+       end
+
        def need_source_encoding_update?(xml_declaration_encoding)
          return false if xml_declaration_encoding.nil?
          return false if /\AUTF-16\z/i =~ xml_declaration_encoding
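
`unnormalize` now reuses precompiled patterns for the five predefined entities and keeps a running total of the text produced while expanding. A rough sketch of that idea under simplifying assumptions: `TEXT_LIMIT` stands in for `Security.entity_expansion_text_limit`, and `expand_entities` is a hypothetical helper that skips the filter and character-reference handling of the real method.

```ruby
# Precompiled once, in the spirit of DEFAULT_ENTITIES_PATTERNS in the diff.
DEFAULT_ENTITY_PATTERNS = %w[gt lt quot apos amp].to_h { |name| [name, /&#{name};/] }.freeze
TEXT_LIMIT = 64 # hypothetical, deliberately tiny limit for the demo

# Hypothetical helper: expands `&name;` references found in `text` using
# the `entities` hash (name => replacement text).
def expand_entities(text, entities)
  result = text.dup
  sum = 0
  text.scan(/&(\w+);/).flatten.uniq.each do |name|
    value = entities[name]
    next unless value
    pattern = DEFAULT_ENTITY_PATTERNS[name] || /&#{Regexp.escape(name)};/
    result.gsub!(pattern) { value } # block form avoids backslash surprises
    sum += result.bytesize          # running total, as in the hunk above
    raise "entity expansion has grown too large" if sum > TEXT_LIMIT
  end
  result
end

puts expand_entities("a &x; b", { "x" => "1" })
```

Because the total is accumulated after every substitution, a document with many references hits the limit earlier than the final text size alone would suggest.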
@@ -523,16 +586,16 @@ module REXML
        end
 
        def parse_name(base_error_message)
-         md = @source.match(NAME_PATTERN, true)
+         md = @source.match(Private::NAME_PATTERN, true)
          unless md
-           if @source.match(/\s*\S/um)
+           if @source.match(/\S/um)
              message = "#{base_error_message}: invalid name"
            else
              message = "#{base_error_message}: name is missing"
            end
            raise REXML::ParseException.new(message, @source)
          end
-         md[1]
+         md[0]
        end
 
        def parse_id(base_error_message,
@@ -601,15 +664,24 @@ module REXML
          end
        end
 
-       def process_instruction(start_position)
-         match_data = @source.match(INSTRUCTION_END, true)
-         unless match_data
-           message = "Invalid processing instruction node"
-           @source.position = start_position
-           raise REXML::ParseException.new(message, @source)
+       def process_instruction
+         name = parse_name("Malformed XML: Invalid processing instruction node")
+         if @source.match(/\s+/um, true)
+           match_data = @source.match(/(.*?)\?>/um, true)
+           unless match_data
+             raise ParseException.new("Malformed XML: Unclosed processing instruction", @source)
+           end
+           content = match_data[1]
+         else
+           content = nil
+           unless @source.match("?>", true)
+             raise ParseException.new("Malformed XML: Unclosed processing instruction", @source)
+           end
          end
-         if @document_status.nil? and match_data[1] == "xml"
-           content = match_data[2]
+         if name == "xml"
+           if @document_status
+             raise ParseException.new("Malformed XML: XML declaration is not at the start", @source)
+           end
            version = VERSION.match(content)
            version = version[1] unless version.nil?
            encoding = ENCODING.match(content)
@@ -624,7 +696,7 @@ module REXML
            standalone = standalone[1] unless standalone.nil?
            return [ :xmldecl, version, encoding, standalone ]
          end
-         [:processing_instruction, match_data[1], match_data[2]]
+         [:processing_instruction, name, content]
        end
 
        def parse_attributes(prefixes, curr_ns)
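
The rewritten `process_instruction` parses in steps (name, optional whitespace, content up to `?>`) instead of relying on one combined regex. A minimal sketch of the same stepwise flow on a plain `StringScanner`; `PI_NAME` is a simplified stand-in for REXML's `NAME` pattern, and `parse_instruction` with its `ArgumentError`s is hypothetical:

```ruby
require "strscan"

# Simplified name pattern (assumption); REXML's NAME accepts more characters.
PI_NAME = /[A-Za-z_:][\w:.\-]*/

# Parses the text that follows "<?" and returns [name, content].
# content is nil when the target is closed immediately by "?>".
def parse_instruction(input)
  scanner = StringScanner.new(input)
  name = scanner.scan(PI_NAME)
  raise ArgumentError, "Invalid processing instruction node" unless name

  if scanner.scan(/\s+/)
    unless scanner.scan(/(.*?)\?>/m)
      raise ArgumentError, "Unclosed processing instruction"
    end
    content = scanner[1] # first capture group of the last match
  else
    content = nil
    raise ArgumentError, "Unclosed processing instruction" unless scanner.scan(/\?>/)
  end
  [name, content]
end

p parse_instruction('xml-stylesheet href="a.xsl"?>')
p parse_instruction("target?>")
```

Splitting the steps makes it possible to report a missing name and a missing `?>` as separate errors, which is what the new messages in the hunk above do.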
data/lib/rexml/parsers/pullparser.rb CHANGED
@@ -47,6 +47,10 @@ module REXML
          @listeners << listener
        end
 
+       def entity_expansion_count
+         @parser.entity_expansion_count
+       end
+
        def each
          while has_next?
            yield self.pull
data/lib/rexml/parsers/sax2parser.rb CHANGED
@@ -22,6 +22,10 @@ module REXML
          @parser.source
        end
 
+       def entity_expansion_count
+         @parser.entity_expansion_count
+       end
+
        def add_listener( listener )
          @parser.add_listener( listener )
        end
@@ -157,25 +161,8 @@ module REXML
                end
              end
            when :text
-             #normalized = @parser.normalize( event[1] )
-             #handle( :characters, normalized )
-             copy = event[1].clone
-
-             esub = proc { |match|
-               if @entities.has_key?($1)
-                 @entities[$1].gsub(Text::REFERENCE, &esub)
-               else
-                 match
-               end
-             }
-
-             copy.gsub!( Text::REFERENCE, &esub )
-             copy.gsub!( Text::NUMERICENTITY ) {|m|
-               m=$1
-               m = "0#{m}" if m[0] == ?x
-               [Integer(m)].pack('U*')
-             }
-             handle( :characters, copy )
+             unnormalized = @parser.unnormalize( event[1], @entities )
+             handle( :characters, unnormalized )
            when :entitydecl
              handle_entitydecl( event )
            when :processing_instruction, :comment, :attlistdecl,
data/lib/rexml/parsers/streamparser.rb CHANGED
@@ -36,8 +36,8 @@ module REXML
              @listener.tag_end( event[1] )
              @tag_stack.pop
            when :text
-             normalized = @parser.unnormalize( event[1] )
-             @listener.text( normalized )
+             unnormalized = @parser.unnormalize( event[1] )
+             @listener.text( unnormalized )
            when :processing_instruction
              @listener.instruction( *event[1,2] )
            when :start_doctype
data/lib/rexml/parsers/treeparser.rb CHANGED
@@ -16,7 +16,6 @@ module REXML
 
        def parse
          tag_stack = []
-         in_doctype = false
          entities = nil
          begin
            while true
@@ -39,17 +38,15 @@ module REXML
                tag_stack.pop
                @build_context = @build_context.parent
              when :text
-               if not in_doctype
-                 if @build_context[-1].instance_of? Text
-                   @build_context[-1] << event[1]
-                 else
-                   @build_context.add(
-                     Text.new(event[1], @build_context.whitespace, nil, true)
-                   ) unless (
-                     @build_context.ignore_whitespace_nodes and
-                     event[1].strip.size==0
-                   )
-                 end
+               if @build_context[-1].instance_of? Text
+                 @build_context[-1] << event[1]
+               else
+                 @build_context.add(
+                   Text.new(event[1], @build_context.whitespace, nil, true)
+                 ) unless (
+                   @build_context.ignore_whitespace_nodes and
+                   event[1].strip.size==0
+                 )
                end
              when :comment
                c = Comment.new( event[1] )
@@ -60,14 +57,12 @@ module REXML
              when :processing_instruction
                @build_context.add( Instruction.new( event[1], event[2] ) )
              when :end_doctype
-               in_doctype = false
                entities.each { |k,v| entities[k] = @build_context.entities[k].value }
                @build_context = @build_context.parent
              when :start_doctype
                doctype = DocType.new( event[1..-1], @build_context )
                @build_context = doctype
                entities = {}
-               in_doctype = true
              when :attlistdecl
                n = AttlistDecl.new( event[1..-1] )
                @build_context.add( n )
data/lib/rexml/rexml.rb CHANGED
@@ -31,7 +31,7 @@
  module REXML
    COPYRIGHT = "Copyright © 2001-2008 Sean Russell <ser@germane-software.com>"
    DATE = "2008/019"
-   VERSION = "3.2.9"
+   VERSION = "3.3.4"
    REVISION = ""
 
    Copyright = COPYRIGHT
data/lib/rexml/source.rb CHANGED
@@ -1,8 +1,28 @@
  # coding: US-ASCII
  # frozen_string_literal: false
+
+ require "strscan"
+
  require_relative 'encoding'
 
  module REXML
+   if StringScanner::Version < "1.0.0"
+     module StringScannerCheckScanString
+       refine StringScanner do
+         def check(pattern)
+           pattern = /#{Regexp.escape(pattern)}/ if pattern.is_a?(String)
+           super(pattern)
+         end
+
+         def scan(pattern)
+           pattern = /#{Regexp.escape(pattern)}/ if pattern.is_a?(String)
+           super(pattern)
+         end
+       end
+     end
+     using StringScannerCheckScanString
+   end
+
    # Generates Source-s. USE THIS CLASS.
    class SourceFactory
      # Generates a Source object
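
The refinement lets older strscan releases accept a String pattern by turning it into an escaped Regexp before delegating to the original method. The same conversion can be sketched as a plain helper; `literal_pattern` is a hypothetical name, and no refinement is needed to see the effect:

```ruby
require "strscan"

# Hypothetical helper: wraps a String in an escaped Regexp so that
# characters such as "?" or "." are matched literally.
def literal_pattern(pattern)
  pattern.is_a?(String) ? /#{Regexp.escape(pattern)}/ : pattern
end

scanner = StringScanner.new("<?xml version='1.0'?>")
puts scanner.scan(literal_pattern("<?")).inspect  # String pattern, matched literally
puts scanner.scan(literal_pattern(/\w+/)).inspect # Regexp patterns pass through unchanged
```

Escaping matters here because patterns like `"<?"` contain regex metacharacters; without `Regexp.escape` the `?` would be read as a quantifier.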
@@ -35,6 +55,7 @@ module REXML
      attr_reader :encoding
 
      module Private
+       SCANNER_RESET_SIZE = 100000
        PRE_DEFINED_TERM_PATTERNS = {}
        pre_defined_terms = ["'", '"', "<"]
        pre_defined_terms.each do |term|
@@ -42,7 +63,6 @@ module REXML
        end
      end
      private_constant :Private
-     include Private
 
      # Constructor
      # @param arg must be a String, and should be a valid XML document
@@ -64,6 +84,12 @@ module REXML
        @scanner.rest
      end
 
+     def drop_parsed_content
+       if @scanner.pos > Private::SCANNER_RESET_SIZE
+         @scanner.string = @scanner.rest
+       end
+     end
+
      def buffer_encoding=(encoding)
        @scanner.string.force_encoding(encoding)
      end
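
`drop_parsed_content` discards the part of the buffer that has already been consumed once the scan position passes a threshold. A small sketch of the mechanism on a bare `StringScanner`; `RESET_SIZE` is a deliberately tiny hypothetical threshold (the diff uses 100000), and `drop_consumed` is a made-up name:

```ruby
require "strscan"

RESET_SIZE = 16 # hypothetical threshold for the demo

# Replaces the scanner's buffer with its unread remainder when enough
# input has been consumed. Assigning to #string= also resets pos to 0.
def drop_consumed(scanner)
  scanner.string = scanner.rest if scanner.pos > RESET_SIZE
  scanner
end

scanner = StringScanner.new("<root>" + "<item/>" * 5 + "</root>")
scanner.scan(/<root>(?:<item\/>)*/) # consume everything up to the close tag
puts scanner.pos                    # position before dropping
drop_consumed(scanner)
puts scanner.pos                    # position after dropping
puts scanner.rest                   # unread input is unchanged
```

The unread input stays the same; only the already-parsed prefix is released, which is presumably what keeps memory bounded on large streamed documents.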
@@ -178,10 +204,20 @@ module REXML
        end
      end
 
-     def read(term = nil)
+     def read(term = nil, min_bytes = 1)
        term = encode(term) if term
        begin
-         @scanner << readline(term)
+         str = readline(term)
+         @scanner << str
+         read_bytes = str.bytesize
+         begin
+           while read_bytes < min_bytes
+             str = readline(term)
+             @scanner << str
+             read_bytes += str.bytesize
+           end
+         rescue IOError
+         end
          true
        rescue Exception, NameError
          @source = nil
@@ -211,10 +247,9 @@ module REXML
        read if @scanner.eos? && @source
      end
 
-     # Note: When specifying a string for 'pattern', it must not include '>' except in the following formats:
-     # - ">"
-     # - "XXX>" (X is any string excluding '>')
      def match( pattern, cons=false )
+       # To avoid performance issue, we need to increase bytes to read per scan
+       min_bytes = 1
        while true
          if cons
            md = @scanner.scan(pattern)
@@ -224,7 +259,8 @@ module REXML
          break if md
          return nil if pattern.is_a?(String)
          return nil if @source.nil?
-         return nil unless read
+         return nil unless read(nil, min_bytes)
+         min_bytes *= 2
        end
 
        md.nil? ? nil : @scanner
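
`match` now asks `read` for at least `min_bytes` and doubles that amount after every failed attempt, so a pattern that spans many lines is not retried once per line. A self-contained sketch of that retry loop over a `StringIO`; `match_from_io` is a hypothetical function, not the `IOSource` API:

```ruby
require "stringio"
require "strscan"

# Hypothetical helper: scans `pattern` at the start of the data read from
# `io`, pulling in more lines on each failure. The minimum amount read
# doubles per retry, which bounds the number of rescans.
def match_from_io(io, pattern)
  scanner = StringScanner.new(+"")
  min_bytes = 1
  loop do
    matched = scanner.scan(pattern)
    return matched if matched

    read_bytes = 0
    while read_bytes < min_bytes
      line = io.gets
      break if line.nil? # end of input
      scanner << line
      read_bytes += line.bytesize
    end
    return nil if read_bytes.zero? # nothing left to read, give up
    min_bytes *= 2
  end
end

io = StringIO.new("<a>\nline1\nline2\n</a>\n")
puts match_from_io(io, /<a>.*?<\/a>/m).inspect
```

With a fixed one-line read, each retry rescans the whole buffer for a single extra line; growing the read size geometrically keeps the total rescanning roughly proportional to the input size.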
data/lib/rexml/text.rb CHANGED
@@ -151,25 +151,45 @@ module REXML
          end
        end
 
-       # context sensitive
-       string.scan(pattern) do
-         if $1[-1] != ?;
-           raise "Illegal character #{$1.inspect} in raw string #{string.inspect}"
-         elsif $1[0] == ?&
-           if $5 and $5[0] == ?#
-             case ($5[1] == ?x ? $5[2..-1].to_i(16) : $5[1..-1].to_i)
-             when *VALID_CHAR
+       pos = 0
+       while (index = string.index(/<|&/, pos))
+         if string[index] == "<"
+           raise "Illegal character \"#{string[index]}\" in raw string #{string.inspect}"
+         end
+
+         unless (end_index = string.index(/[^\s];/, index + 1))
+           raise "Illegal character \"#{string[index]}\" in raw string #{string.inspect}"
+         end
+
+         value = string[(index + 1)..end_index]
+         if /\s/.match?(value)
+           raise "Illegal character \"#{string[index]}\" in raw string #{string.inspect}"
+         end
+
+         if value[0] == "#"
+           character_reference = value[1..-1]
+
+           unless (/\A(\d+|x[0-9a-fA-F]+)\z/.match?(character_reference))
+             if character_reference[0] == "x" || character_reference[-1] == "x"
+               raise "Illegal character \"#{string[index]}\" in raw string #{string.inspect}"
              else
-               raise "Illegal character #{$1.inspect} in raw string #{string.inspect}"
+               raise "Illegal character #{string.inspect} in raw string #{string.inspect}"
              end
-           # FIXME: below can't work but this needs API change.
-           # elsif @parent and $3 and !SUBSTITUTES.include?($1)
-           # if !doctype or !doctype.entities.has_key?($3)
-           # raise "Undeclared entity '#{$1}' in raw string \"#{string}\""
-           # end
            end
+
+           case (character_reference[0] == "x" ? character_reference[1..-1].to_i(16) : character_reference[0..-1].to_i)
+           when *VALID_CHAR
+           else
+             raise "Illegal character #{string.inspect} in raw string #{string.inspect}"
+           end
+         elsif !(/\A#{Entity::NAME}\z/um.match?(value))
+           raise "Illegal character \"#{string[index]}\" in raw string #{string.inspect}"
          end
+
+         pos = end_index + 1
        end
+
+       string
      end
 
      def node_type
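
The new `Text.check` walks the string with `String#index` and validates each `&...;` reference as it is found, rather than running one large regex with numbered captures. A simplified sketch of that scanning loop; `check_raw_text` and `REFERENCE_BODY` are hypothetical, the name pattern is narrower than `Entity::NAME`, and the `VALID_CHAR` range check is omitted:

```ruby
# Accepts decimal/hex character references and simple entity names (assumption:
# a reduced name grammar, not REXML's full Entity::NAME).
REFERENCE_BODY = /\A(?:#\d+|#x[0-9a-fA-F]+|[A-Za-z_:][\w:.\-]*)\z/

# Returns the string unchanged when every "<" and "&" use is legal for raw
# text; raises ArgumentError otherwise.
def check_raw_text(string)
  pos = 0
  while (index = string.index(/<|&/, pos))
    raise ArgumentError, "illegal \"<\" in #{string.inspect}" if string[index] == "<"

    end_index = string.index(";", index + 1)
    raise ArgumentError, "unterminated reference in #{string.inspect}" unless end_index

    body = string[(index + 1)...end_index] # text between "&" and ";"
    raise ArgumentError, "malformed reference in #{string.inspect}" unless REFERENCE_BODY.match?(body)

    pos = end_index + 1 # resume scanning after the reference
  end
  string
end

puts check_raw_text("a &amp; b &#x41; &#65;")
```

Advancing `pos` past each validated reference means every character is examined at most a constant number of times, which fits the parse-performance theme of these releases.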
metadata CHANGED
@@ -1,13 +1,13 @@
  --- !ruby/object:Gem::Specification
  name: rexml
  version: !ruby/object:Gem::Version
-   version: 3.2.9
+   version: 3.3.4
  platform: ruby
  authors:
  - Kouhei Sutou
  bindir: bin
  cert_chain: []
- date: 2024-06-09 00:00:00.000000000 Z
+ date: 2024-08-01 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
    name: strscan
@@ -115,7 +115,8 @@ files:
  homepage: https://github.com/ruby/rexml
  licenses:
  - BSD-2-Clause
- metadata: {}
+ metadata:
+   changelog_uri: https://github.com/ruby/rexml/releases/tag/v3.3.4
  rdoc_options:
  - "--main"
  - README.md