rexml 3.3.0 → 3.3.5
- checksums.yaml +4 -4
- data/NEWS.md +163 -0
- data/lib/rexml/element.rb +2 -15
- data/lib/rexml/formatters/pretty.rb +1 -1
- data/lib/rexml/parsers/baseparser.rb +107 -37
- data/lib/rexml/parsers/pullparser.rb +4 -0
- data/lib/rexml/parsers/sax2parser.rb +6 -19
- data/lib/rexml/parsers/streamparser.rb +2 -2
- data/lib/rexml/parsers/treeparser.rb +9 -14
- data/lib/rexml/rexml.rb +1 -1
- data/lib/rexml/source.rb +23 -7
- data/lib/rexml/text.rb +34 -14
- metadata +5 -7
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-metadata.gz:
-data.tar.gz:
+metadata.gz: 8e2ee370ff6c1ab70149f6743a12ddf1eeae2c2af3c20f8cb7c6e56ff9699eec
+data.tar.gz: 158254197a12b1038b9b5e116c9abc89a329ef97acda8031399a56d3aee45fe9
 SHA512:
-metadata.gz:
-data.tar.gz:
+metadata.gz: 6b805e28e50ef71bbc5d0349fdd4ec57ec4811bba94fe4c3f8aa17bedb81971da48e98205c53a8eadd18f07b69a2f68c8200529d546aef4187f9f3e903670857
+data.tar.gz: df3e369135f9b156475772a77702a91d45b8ee64ad49f608b2b33dc63d7b07dd271d7ac458d0b5e944e613798a0940231282997a747c4838e3e5c3afaf60253b
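checksums.yaml records SHA256 and SHA512 digests of the two archives packed inside the .gem (metadata.gz and data.tar.gz). A minimal sketch of checking extracted bytes against one of the recorded values with Ruby's standard `digest` library; the helper name `digest_matches?` is hypothetical, and extracting the members from the .gem archive is left out:

```ruby
require "digest"

# Hypothetical helper: compares raw bytes with a hex digest string as it
# appears in checksums.yaml. Only the two algorithms listed there are handled.
def digest_matches?(bytes, expected_hex, algorithm: :sha256)
  actual =
    case algorithm
    when :sha256 then Digest::SHA256.hexdigest(bytes)
    when :sha512 then Digest::SHA512.hexdigest(bytes)
    else raise ArgumentError, "unsupported algorithm: #{algorithm}"
    end
  actual == expected_hex.downcase
end

# Usage (the path is illustrative):
#   bytes = File.binread("data.tar.gz")
#   digest_matches?(bytes, "158254197a12b1038b9b5e116c9abc89a329ef97acda8031399a56d3aee45fe9")
```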
data/NEWS.md
CHANGED
@@ -1,5 +1,168 @@
 # News
 
+## 3.3.5 - 2024-08-12 {#version-3-3-5}
+
+### Fixes
+
+* Fixed a bug that `REXML::Security.entity_expansion_text_limit`
+check has wrong text size calculation in SAX and pull parsers.
+* GH-193
+* GH-195
+* Reported by Viktor Ivarsson.
+* Patch by NAITOH Jun.
+
+### Thanks
+
+* Viktor Ivarsson
+
+* NAITOH Jun
+
+## 3.3.4 - 2024-08-01 {#version-3-3-4}
+
+### Fixes
+
+* Fixed a bug that `REXML::Security` isn't defined when
+`REXML::Parsers::StreamParser` is used and
+`rexml/parsers/streamparser` is only required.
+* GH-189
+* Patch by takuya kodama.
+
+### Thanks
+
+* takuya kodama
+
+## 3.3.3 - 2024-08-01 {#version-3-3-3}
+
+### Improvements
+
+* Added support for detecting invalid XML that has unsupported
+content before root element
+* GH-184
+* Patch by NAITOH Jun.
+
+* Added support for `REXML::Security.entity_expansion_limit=` and
+`REXML::Security.entity_expansion_text_limit=` in SAX2 and pull
+parsers
+* GH-187
+* Patch by NAITOH Jun.
+
+* Added more tests for invalid XMLs.
+* GH-183
+* Patch by Watson.
+
+* Added more performance tests.
+* Patch by Watson.
+
+* Improved parse performance.
+* GH-186
+* Patch by tomoya ishida.
+
+### Thanks
+
+* NAITOH Jun
+
+* Watson
+
+* tomoya ishida
+
+## 3.3.2 - 2024-07-16 {#version-3-3-2}
+
+### Improvements
+
+* Improved parse performance.
+* GH-160
+* Patch by NAITOH Jun.
+
+* Improved parse performance.
+* GH-169
+* GH-170
+* GH-171
+* GH-172
+* GH-173
+* GH-174
+* GH-175
+* GH-176
+* GH-177
+* Patch by Watson.
+
+* Added support for raising a parse exception when an XML has extra
+content after the root element.
+* GH-161
+* Patch by NAITOH Jun.
+
+* Added support for raising a parse exception when an XML
+declaration exists in wrong position.
+* GH-162
+* Patch by NAITOH Jun.
+
+* Removed needless a space after XML declaration in pretty print mode.
+* GH-164
+* Patch by NAITOH Jun.
+
+* Stopped to emit `:text` event after the root element.
+* GH-167
+* Patch by NAITOH Jun.
+
+### Fixes
+
+* Fixed a bug that SAX2 parser doesn't expand predefined entities for
+`characters` callback.
+* GH-168
+* Patch by NAITOH Jun.
+
+### Thanks
+
+* NAITOH Jun
+
+* Watson
+
+## 3.3.1 - 2024-06-25 {#version-3-3-1}
+
+### Improvements
+
+* Added support for detecting malformed top-level comments.
+* GH-145
+* Patch by Hiroya Fujinami.
+
+* Improved `REXML::Element#attribute` performance.
+* GH-146
+* Patch by Hiroya Fujinami.
+
+* Added support for detecting malformed `<!-->` comments.
+* GH-147
+* Patch by Hiroya Fujinami.
+
+* Added support for detecting unclosed `DOCTYPE`.
+* GH-152
+* Patch by Hiroya Fujinami.
+
+* Added `changlog_uri` metadata to gemspec.
+* GH-156
+* Patch by fynsta.
+
+* Improved parse performance.
+* GH-157
+* GH-158
+* Patch by NAITOH Jun.
+
+### Fixes
+
+* Fixed a bug that large XML can't be parsed.
+* GH-154
+* Patch by NAITOH Jun.
+
+* Fixed a bug that private constants are visible.
+* GH-155
+* Patch by NAITOH Jun.
+
+### Thanks
+
+* Hiroya Fujinami
+
+* NAITOH Jun
+
+* fynsta
+
 ## 3.3.0 - 2024-06-11 {#version-3-3-0}
 
 ### Improvements
data/lib/rexml/element.rb
CHANGED
@@ -7,14 +7,6 @@ require_relative "xpath"
 require_relative "parseexception"
 
 module REXML
-# An implementation note about namespaces:
-# As we parse, when we find namespaces we put them in a hash and assign
-# them a unique ID. We then convert the namespace prefix for the node
-# to the unique ID. This makes namespace lookup much faster for the
-# cost of extra memory use. We save the namespace prefix for the
-# context node and convert it back when we write it.
-@@namespaces = {}
-
 # An \REXML::Element object represents an XML element.
 #
 # An element:
@@ -1284,16 +1276,11 @@ module REXML
 # document.root.attribute("x", "a") # => a:x='a:x'
 #
 def attribute( name, namespace=nil )
-prefix =
-if namespaces.respond_to? :key
-prefix = namespaces.key(namespace) if namespace
-else
-prefix = namespaces.index(namespace) if namespace
-end
+prefix = namespaces.key(namespace) if namespace
 prefix = nil if prefix == 'xmlns'
 
 ret_val =
-attributes.get_attribute(
+attributes.get_attribute( prefix ? "#{prefix}:#{name}" : name )
 
 return ret_val unless ret_val.nil?
 return nil if prefix.nil?
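The new `attribute` body resolves the namespace URI to its prefix with a single `Hash#key` call (a reverse lookup from value to key) and builds the lookup key with one interpolation. A standalone sketch of that logic, assuming a plain Hash stands in for `Element#namespaces`; the method name `qualified_attribute_key` is hypothetical:

```ruby
# Reverse-looks-up the prefix bound to a namespace URI, ignores the default
# "xmlns" binding, and builds the attribute name to look up.
def qualified_attribute_key(name, namespace, namespaces)
  prefix = namespaces.key(namespace) if namespace
  prefix = nil if prefix == "xmlns"
  prefix ? "#{prefix}:#{name}" : name
end

namespaces = {
  "a" => "http://example.com/a",
  "xmlns" => "http://example.com/default",
}
qualified_attribute_key("x", "http://example.com/a", namespaces)       # => "a:x"
qualified_attribute_key("x", nil, namespaces)                          # => "x"
qualified_attribute_key("x", "http://example.com/default", namespaces) # => "x"
```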
data/lib/rexml/formatters/pretty.rb
CHANGED
@@ -111,7 +111,7 @@ module REXML
 # itself, then we don't need a carriage return... which makes this
 # logic more complex.
 node.children.each { |child|
-next if child
+next if child.instance_of?(Text)
 unless child == node.children[0] or child.instance_of?(Text) or
 (child == node.children[1] and !node.children[0].writethis)
 output << "\n"
data/lib/rexml/parsers/baseparser.rb
CHANGED
@@ -1,6 +1,7 @@
 # frozen_string_literal: true
 require_relative '../parseexception'
 require_relative '../undefinednamespaceexception'
+require_relative '../security'
 require_relative '../source'
 require 'set'
 require "strscan"
@@ -124,21 +125,28 @@ module REXML
 }
 
 module Private
-INSTRUCTION_END = /#{NAME}(\s+.*?)?\?>/um
 TAG_PATTERN = /((?>#{QNAME_STR}))\s*/um
 CLOSE_PATTERN = /(#{QNAME_STR})\s*>/um
 ATTLISTDECL_END = /\s+#{NAME}(?:#{ATTDEF})*\s*>/um
-NAME_PATTERN =
+NAME_PATTERN = /#{NAME}/um
 GEDECL_PATTERN = "\\s+#{NAME}\\s+#{ENTITYDEF}\\s*>"
 PEDECL_PATTERN = "\\s+(%)\\s+#{NAME}\\s+#{PEDEF}\\s*>"
 ENTITYDECL_PATTERN = /(?:#{GEDECL_PATTERN})|(?:#{PEDECL_PATTERN})/um
+CARRIAGE_RETURN_NEWLINE_PATTERN = /\r\n?/
+CHARACTER_REFERENCES = /&#0*((?:\d+)|(?:x[a-fA-F0-9]+));/
+DEFAULT_ENTITIES_PATTERNS = {}
+default_entities = ['gt', 'lt', 'quot', 'apos', 'amp']
+default_entities.each do |term|
+DEFAULT_ENTITIES_PATTERNS[term] = /&#{term};/
+end
 end
 private_constant :Private
-include Private
 
 def initialize( source )
 self.stream = source
 @listeners = []
+@prefixes = Set.new
+@entity_expansion_count = 0
 end
 
 def add_listener( listener )
@@ -146,10 +154,12 @@ module REXML
 end
 
 attr_reader :source
+attr_reader :entity_expansion_count
 
 def stream=( source )
 @source = SourceFactory.create_from( source )
 @closed = nil
+@have_root = false
 @document_status = nil
 @tags = []
 @stack = []
@@ -204,6 +214,8 @@ module REXML
 
 # Returns the next event. This is a +PullEvent+ object.
 def pull
+@source.drop_parsed_content
+
 pull_event.tap do |event|
 @listeners.each do |listener|
 listener.receive event
@@ -216,7 +228,12 @@ module REXML
 x, @closed = @closed, nil
 return [ :end_element, x ]
 end
-
+if empty?
+if @document_status == :in_doctype
+raise ParseException.new("Malformed DOCTYPE: unclosed", @source)
+end
+return [ :end_document ]
+end
 return @stack.shift if @stack.size > 0
 #STDERR.puts @source.encoding
 #STDERR.puts "BUFFER = #{@source.buffer.inspect}"
@@ -225,10 +242,17 @@ module REXML
 if @document_status == nil
 start_position = @source.position
 if @source.match("<?", true)
-return process_instruction
+return process_instruction
 elsif @source.match("<!", true)
 if @source.match("--", true)
-
+md = @source.match(/(.*?)-->/um, true)
+if md.nil?
+raise REXML::ParseException.new("Unclosed comment", @source)
+end
+if /--|-\z/.match?(md[1])
+raise REXML::ParseException.new("Malformed comment", @source)
+end
+return [ :comment, md[1] ]
 elsif @source.match("DOCTYPE", true)
 base_error_message = "Malformed DOCTYPE"
 unless @source.match(/\s+/um, true)
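Top-level comments are now consumed with a lazy match up to the first `-->` and rejected if the body contains `--` or ends in `-`, which XML 1.0 does not allow. A sketch of the same check on a `StringScanner` positioned just after `<!--`; `read_comment` is a hypothetical helper, and `ArgumentError` stands in for `REXML::ParseException`:

```ruby
require "strscan"

# Takes everything up to the first "-->" (lazy match), then applies the
# two well-formedness rules from the diff: no "--" inside, no trailing "-".
def read_comment(scanner)
  body = scanner.scan(/(.*?)-->/m) && scanner[1]
  raise ArgumentError, "Unclosed comment" if body.nil?
  raise ArgumentError, "Malformed comment" if /--|-\z/.match?(body)
  body
end

s = StringScanner.new(" hello -->rest")
read_comment(s)  # => " hello "
s.rest           # => "rest"
```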
@@ -240,7 +264,7 @@ module REXML
 @source.position = start_position
 raise REXML::ParseException.new(message, @source)
 end
-@nsstack.unshift(
+@nsstack.unshift(Set.new)
 name = parse_name(base_error_message)
 if @source.match(/\s*\[/um, true)
 id = [nil, nil, nil]
@@ -288,7 +312,11 @@ module REXML
 raise REXML::ParseException.new( "Bad ELEMENT declaration!", @source ) if md.nil?
 return [ :elementdecl, "<!ELEMENT" + md[1] ]
 elsif @source.match("ENTITY", true)
-
+match_data = @source.match(Private::ENTITYDECL_PATTERN, true)
+unless match_data
+raise REXML::ParseException.new("Malformed entity declaration", @source)
+end
+match = [:entitydecl, *match_data.captures.compact]
 ref = false
 if match[1] == '%'
 ref = true
@@ -314,13 +342,13 @@ module REXML
 match << '%' if ref
 return match
 elsif @source.match("ATTLIST", true)
-md = @source.match(ATTLISTDECL_END, true)
+md = @source.match(Private::ATTLISTDECL_END, true)
 raise REXML::ParseException.new( "Bad ATTLIST declaration!", @source ) if md.nil?
 element = md[1]
 contents = md[0]
 
 pairs = {}
-values = md[0].scan( ATTDEF_RE )
+values = md[0].strip.scan( ATTDEF_RE )
 values.each do |attdef|
 unless attdef[3] == "#IMPLIED"
 attdef.compact!
@@ -366,6 +394,9 @@ module REXML
 @document_status = :after_doctype
 return [ :end_doctype ]
 end
+if @document_status == :in_doctype
+raise ParseException.new("Malformed DOCTYPE: invalid declaration", @source)
+end
 end
 if @document_status == :after_doctype
 @source.match(/\s*/um, true)
@@ -380,7 +411,7 @@ module REXML
 if @source.match("/", true)
 @nsstack.shift
 last_tag = @tags.pop
-md = @source.match(CLOSE_PATTERN, true)
+md = @source.match(Private::CLOSE_PATTERN, true)
 if md and !last_tag
 message = "Unexpected top-level end tag (got '#{md[1]}')"
 raise REXML::ParseException.new(message, @source)
@@ -399,12 +430,11 @@ module REXML
 if md[0][0] == ?-
 md = @source.match(/--(.*?)-->/um, true)
 
-
-when /--/, /-\z/
+if md.nil? || /--|-\z/.match?(md[1])
 raise REXML::ParseException.new("Malformed comment", @source)
 end
 
-return [ :comment, md[1] ]
+return [ :comment, md[1] ]
 else
 md = @source.match(/\[CDATA\[(.*?)\]\]>/um, true)
 return [ :cdata, md[1] ] if md
@@ -412,22 +442,22 @@ module REXML
 raise REXML::ParseException.new( "Declarations can only occur "+
 "in the doctype declaration.", @source)
 elsif @source.match("?", true)
-return process_instruction
+return process_instruction
 else
 # Get the next tag
-md = @source.match(TAG_PATTERN, true)
+md = @source.match(Private::TAG_PATTERN, true)
 unless md
 @source.position = start_position
 raise REXML::ParseException.new("malformed XML: missing tag start", @source)
 end
 tag = md[1]
 @document_status = :in_element
-prefixes
-prefixes << md[2] if md[2]
+@prefixes.clear
+@prefixes << md[2] if md[2]
 @nsstack.unshift(curr_ns=Set.new)
-attributes, closed = parse_attributes(prefixes, curr_ns)
+attributes, closed = parse_attributes(@prefixes, curr_ns)
 # Verify that all of the prefixes have been defined
-for prefix in prefixes
+for prefix in @prefixes
 unless @nsstack.find{|k| k.member?(prefix)}
 raise UndefinedNamespaceException.new(prefix,@source,self)
 end
@@ -437,8 +467,12 @@ module REXML
 @closed = tag
 @nsstack.shift
 else
+if @tags.empty? and @have_root
+raise ParseException.new("Malformed XML: Extra tag at the end of the document (got '<#{tag}')", @source)
+end
 @tags.push( tag )
 end
+@have_root = true
 return [ :start_element, tag, attributes ]
 end
 else
@@ -446,6 +480,16 @@ module REXML
 if text.chomp!("<")
 @source.position -= "<".bytesize
 end
+if @tags.empty?
+unless /\A\s*\z/.match?(text)
+if @have_root
+raise ParseException.new("Malformed XML: Extra content at the end of the document (got '#{text}')", @source)
+else
+raise ParseException.new("Malformed XML: Content at the start of the document (got '#{text}')", @source)
+end
+end
+return pull_event if @have_root
+end
 return [ :text, text ]
 end
 rescue REXML::UndefinedNamespaceException
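Together with the previous hunk, `@have_root` lets `pull` reject a second top-level element and any non-whitespace text outside the root, while whitespace after the root is skipped instead of being emitted as `:text`. A simplified model of that bookkeeping, assuming events arrive already tokenized; the class `RootTracker` is hypothetical, a self-closing tag would be fed as `start_tag` followed by `end_tag`, and `ArgumentError` stands in for `ParseException`:

```ruby
# Hypothetical model of the root-tracking rules added to BaseParser#pull.
class RootTracker
  def initialize
    @depth = 0          # number of currently open elements (like @tags.size)
    @have_root = false  # set once the first top-level element starts
  end

  def start_tag(name)
    if @depth.zero? && @have_root
      raise ArgumentError, "Extra tag at the end of the document (got '<#{name}')"
    end
    @depth += 1
    @have_root = true
  end

  def end_tag
    @depth -= 1
  end

  # Text outside any element must be whitespace only.
  def text(str)
    return unless @depth.zero?
    return if /\A\s*\z/.match?(str)
    where = @have_root ? "Extra content at the end" : "Content at the start"
    raise ArgumentError, "#{where} of the document (got '#{str}')"
  end
end
```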
@@ -463,7 +507,9 @@ module REXML
 def entity( reference, entities )
 value = nil
 value = entities[ reference ] if entities
-if
+if value
+record_entity_expansion
+else
 value = DEFAULT_ENTITIES[ reference ]
 value = value[2] if value
 end
@@ -488,10 +534,14 @@ module REXML
 
 # Unescapes all possible entities
 def unnormalize( string, entities=nil, filter=nil )
-
+if string.include?("\r")
+rv = string.gsub( Private::CARRIAGE_RETURN_NEWLINE_PATTERN, "\n" )
+else
+rv = string.dup
+end
 matches = rv.scan( REFERENCE_RE )
 return rv if matches.size == 0
-rv.gsub!(
+rv.gsub!( Private::CHARACTER_REFERENCES ) {
 m=$1
 m = "0#{m}" if m[0] == ?x
 [Integer(m)].pack('U*')
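`unnormalize` now converts `\r\n` and bare `\r` to `\n` only when the input actually contains `\r`, and decodes numeric character references with the precompiled `Private::CHARACTER_REFERENCES`. A sketch of just those two steps (entity substitution omitted); `decode_char_refs` is a hypothetical name, and the two constants copy the patterns from the diff:

```ruby
CR_NEWLINE = /\r\n?/                                # CARRIAGE_RETURN_NEWLINE_PATTERN
CHAR_REF   = /&#0*((?:\d+)|(?:x[a-fA-F0-9]+));/     # CHARACTER_REFERENCES

def decode_char_refs(string)
  # Only pay for the gsub when a carriage return is present; otherwise copy.
  rv = string.include?("\r") ? string.gsub(CR_NEWLINE, "\n") : string.dup
  rv.gsub!(CHAR_REF) do
    m = Regexp.last_match(1)
    m = "0#{m}" if m.start_with?("x")   # "x42" -> "0x42" so Integer() reads hex
    [Integer(m)].pack("U*")
  end
  rv
end

decode_char_refs("a\r\nb\rc")          # => "a\nb\nc"
decode_char_refs("&#65;&#x42;&#0067;") # => "ABC"
```

The leading `0*` in the pattern strips zero padding before `Integer()` sees the digits, so a reference like `&#0067;` is read as decimal 67 rather than as an octal literal.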
@@ -502,20 +552,31 @@ module REXML
 unless filter and filter.include?(entity_reference)
 entity_value = entity( entity_reference, entities )
 if entity_value
-re = /&#{entity_reference};/
+re = Private::DEFAULT_ENTITIES_PATTERNS[entity_reference] || /&#{entity_reference};/
 rv.gsub!( re, entity_value )
+if rv.bytesize > Security.entity_expansion_text_limit
+raise "entity expansion has grown too large"
+end
 else
 er = DEFAULT_ENTITIES[entity_reference]
 rv.gsub!( er[0], er[2] ) if er
 end
 end
 end
-rv.gsub!(
+rv.gsub!( Private::DEFAULT_ENTITIES_PATTERNS['amp'], '&' )
 end
 rv
 end
 
 private
+
+def record_entity_expansion
+@entity_expansion_count += 1
+if @entity_expansion_count > Security.entity_expansion_limit
+raise "number of entity expansions exceeded, processing aborted."
+end
+end
+
 def need_source_encoding_update?(xml_declaration_encoding)
 return false if xml_declaration_encoding.nil?
 return false if /\AUTF-16\z/i =~ xml_declaration_encoding
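`record_entity_expansion` and the `entity_expansion_text_limit` check give the SAX2 and pull parsers two guards: a cap on the number of entity expansions and a cap on the byte size of the expanded text. A self-contained sketch of the idea, assuming the limits are passed in directly rather than read from `REXML::Security`; `EntityExpander` is hypothetical, it counts every occurrence of a user-defined reference, and it checks the size once after substitution rather than after each entity as the diff does:

```ruby
# Hypothetical expander illustrating the two guards.
class EntityExpander
  attr_reader :expansion_count

  def initialize(entities, count_limit:, text_limit:)
    @entities = entities        # user-defined entities, e.g. { "a" => "xyz" }
    @count_limit = count_limit  # plays the role of entity_expansion_limit
    @text_limit = text_limit    # plays the role of entity_expansion_text_limit
    @expansion_count = 0        # persists across calls, like the parser's counter
  end

  def expand(text)
    result = text.gsub(/&(\w+);/) do |ref|
      value = @entities[Regexp.last_match(1)]
      if value
        record_expansion
        value
      else
        ref # predefined or unknown references are left untouched
      end
    end
    raise "entity expansion has grown too large" if result.bytesize > @text_limit
    result
  end

  private

  def record_expansion
    @expansion_count += 1
    return if @expansion_count <= @count_limit
    raise "number of entity expansions exceeded, processing aborted."
  end
end
```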
@@ -523,16 +584,16 @@ module REXML
 end
 
 def parse_name(base_error_message)
-md = @source.match(NAME_PATTERN, true)
+md = @source.match(Private::NAME_PATTERN, true)
 unless md
-if @source.match(/\
+if @source.match(/\S/um)
 message = "#{base_error_message}: invalid name"
 else
 message = "#{base_error_message}: name is missing"
 end
 raise REXML::ParseException.new(message, @source)
 end
-md[
+md[0]
 end
 
 def parse_id(base_error_message,
@@ -601,15 +662,24 @@ module REXML
 end
 end
 
-def process_instruction
-
-
-
-
-
+def process_instruction
+name = parse_name("Malformed XML: Invalid processing instruction node")
+if @source.match(/\s+/um, true)
+match_data = @source.match(/(.*?)\?>/um, true)
+unless match_data
+raise ParseException.new("Malformed XML: Unclosed processing instruction", @source)
+end
+content = match_data[1]
+else
+content = nil
+unless @source.match("?>", true)
+raise ParseException.new("Malformed XML: Unclosed processing instruction", @source)
+end
 end
-if
-
+if name == "xml"
+if @document_status
+raise ParseException.new("Malformed XML: XML declaration is not at the start", @source)
+end
 version = VERSION.match(content)
 version = version[1] unless version.nil?
 encoding = ENCODING.match(content)
@@ -624,7 +694,7 @@ module REXML
 standalone = standalone[1] unless standalone.nil?
 return [ :xmldecl, version, encoding, standalone ]
 end
-[:processing_instruction,
+[:processing_instruction, name, content]
 end
 
 def parse_attributes(prefixes, curr_ns)
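`process_instruction` no longer relies on the removed `INSTRUCTION_END` regexp. It reads a target name, then either whitespace plus content up to `?>` or an immediate `?>`, and rejects an `xml` target once `@document_status` is set. A sketch of that sequence on a `StringScanner` positioned just after `<?`; `read_pi`, the `at_document_start:` flag, and the ASCII-only `PI_NAME` pattern are assumptions (real XML names allow many more characters), and `ArgumentError` stands in for `ParseException`:

```ruby
require "strscan"

# ASCII-only approximation of an XML name (assumption; XML allows more).
PI_NAME = /[A-Za-z_:][-A-Za-z0-9._:]*/

# Reads "target content?>" or "target?>" and returns [target, content_or_nil].
def read_pi(scanner, at_document_start:)
  name = scanner.scan(PI_NAME) or
    raise ArgumentError, "Invalid processing instruction node"
  if scanner.scan(/\s+/)
    scanner.scan(/(.*?)\?>/m) or
      raise ArgumentError, "Unclosed processing instruction"
    content = scanner[1]
  else
    scanner.scan(/\?>/) or
      raise ArgumentError, "Unclosed processing instruction"
    content = nil
  end
  # Mirrors the new `if name == "xml"` / `if @document_status` check.
  if name == "xml" && !at_document_start
    raise ArgumentError, "XML declaration is not at the start"
  end
  [name, content]
end

read_pi(StringScanner.new('xml-stylesheet href="a.xsl"?>'), at_document_start: false)
# => ["xml-stylesheet", "href=\"a.xsl\""]
```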
data/lib/rexml/parsers/sax2parser.rb
CHANGED
@@ -22,6 +22,10 @@ module REXML
 @parser.source
 end
 
+def entity_expansion_count
+@parser.entity_expansion_count
+end
+
 def add_listener( listener )
 @parser.add_listener( listener )
 end
@@ -157,25 +161,8 @@ module REXML
 end
 end
 when :text
-
-
-copy = event[1].clone
-
-esub = proc { |match|
-if @entities.has_key?($1)
-@entities[$1].gsub(Text::REFERENCE, &esub)
-else
-match
-end
-}
-
-copy.gsub!( Text::REFERENCE, &esub )
-copy.gsub!( Text::NUMERICENTITY ) {|m|
-m=$1
-m = "0#{m}" if m[0] == ?x
-[Integer(m)].pack('U*')
-}
-handle( :characters, copy )
+unnormalized = @parser.unnormalize( event[1], @entities )
+handle( :characters, unnormalized )
 when :entitydecl
 handle_entitydecl( event )
 when :processing_instruction, :comment, :attlistdecl,
data/lib/rexml/parsers/streamparser.rb
CHANGED
@@ -36,8 +36,8 @@ module REXML
 @listener.tag_end( event[1] )
 @tag_stack.pop
 when :text
-
-@listener.text(
+unnormalized = @parser.unnormalize( event[1] )
+@listener.text( unnormalized )
 when :processing_instruction
 @listener.instruction( *event[1,2] )
 when :start_doctype
data/lib/rexml/parsers/treeparser.rb
CHANGED
@@ -16,7 +16,6 @@ module REXML
 
 def parse
 tag_stack = []
-in_doctype = false
 entities = nil
 begin
 while true
@@ -39,17 +38,15 @@ module REXML
 tag_stack.pop
 @build_context = @build_context.parent
 when :text
-if
-
-
-
-@build_context.
-
-
-
-
-)
-end
+if @build_context[-1].instance_of? Text
+@build_context[-1] << event[1]
+else
+@build_context.add(
+Text.new(event[1], @build_context.whitespace, nil, true)
+) unless (
+@build_context.ignore_whitespace_nodes and
+event[1].strip.size==0
+)
 end
 when :comment
 c = Comment.new( event[1] )
@@ -60,14 +57,12 @@ module REXML
 when :processing_instruction
 @build_context.add( Instruction.new( event[1], event[2] ) )
 when :end_doctype
-in_doctype = false
 entities.each { |k,v| entities[k] = @build_context.entities[k].value }
 @build_context = @build_context.parent
 when :start_doctype
 doctype = DocType.new( event[1..-1], @build_context )
 @build_context = doctype
 entities = {}
-in_doctype = true
 when :attlistdecl
 n = AttlistDecl.new( event[1..-1] )
 @build_context.add( n )
data/lib/rexml/rexml.rb
CHANGED
data/lib/rexml/source.rb
CHANGED
@@ -55,6 +55,7 @@ module REXML
 attr_reader :encoding
 
 module Private
+SCANNER_RESET_SIZE = 100000
 PRE_DEFINED_TERM_PATTERNS = {}
 pre_defined_terms = ["'", '"', "<"]
 pre_defined_terms.each do |term|
@@ -62,7 +63,6 @@ module REXML
 end
 end
 private_constant :Private
-include Private
 
 # Constructor
 # @param arg must be a String, and should be a valid XML document
@@ -84,6 +84,12 @@ module REXML
 @scanner.rest
 end
 
+def drop_parsed_content
+if @scanner.pos > Private::SCANNER_RESET_SIZE
+@scanner.string = @scanner.rest
+end
+end
+
 def buffer_encoding=(encoding)
 @scanner.string.force_encoding(encoding)
 end
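`drop_parsed_content`, called at the top of `BaseParser#pull`, replaces the scanner's string with the unconsumed rest once more than `SCANNER_RESET_SIZE` bytes have been scanned, so already-parsed input does not stay referenced on large documents. A sketch of the same move on a bare `StringScanner`; the tiny `RESET_SIZE` is for demonstration only (the diff uses 100000), and `StringScanner#string=` also resets the scan position to 0:

```ruby
require "strscan"

RESET_SIZE = 16 # demonstration value; SCANNER_RESET_SIZE in the diff is 100000

# Keeps only the unconsumed tail once the scanner has moved past RESET_SIZE.
def drop_parsed_content(scanner)
  scanner.string = scanner.rest if scanner.pos > RESET_SIZE
end

s = StringScanner.new("<a>" * 10)  # 30 bytes
s.pos = 18
drop_parsed_content(s)
s.pos    # => 0
s.string # => "<a><a><a><a>"
```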
@@ -198,10 +204,20 @@ module REXML
 end
 end
 
-def read(term = nil)
+def read(term = nil, min_bytes = 1)
 term = encode(term) if term
 begin
-
+str = readline(term)
+@scanner << str
+read_bytes = str.bytesize
+begin
+while read_bytes < min_bytes
+str = readline(term)
+@scanner << str
+read_bytes += str.bytesize
+end
+rescue IOError
+end
 true
 rescue Exception, NameError
 @source = nil
@@ -231,10 +247,9 @@ module REXML
 read if @scanner.eos? && @source
 end
 
-# Note: When specifying a string for 'pattern', it must not include '>' except in the following formats:
-# - ">"
-# - "XXX>" (X is any string excluding '>')
 def match( pattern, cons=false )
+# To avoid performance issue, we need to increase bytes to read per scan
+min_bytes = 1
 while true
 if cons
 md = @scanner.scan(pattern)
@@ -244,7 +259,8 @@ module REXML
 break if md
 return nil if pattern.is_a?(String)
 return nil if @source.nil?
-return nil unless read
+return nil unless read(nil, min_bytes)
+min_bytes *= 2
 end
 
 md.nil? ? nil : @scanner
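`match` now doubles `min_bytes` after each failed attempt, and `read` keeps appending lines until at least that many bytes were added. A construct spanning many short lines is therefore matched after a number of attempts that grows roughly logarithmically with its length, instead of one full re-scan per line. A sketch of the loop over a `StringIO`; `GrowingSource` is hypothetical, and it stops at EOF instead of rescuing `IOError` as the diff does:

```ruby
require "strscan"
require "stringio"

# Hypothetical source that grows its buffer geometrically on failed matches.
class GrowingSource
  attr_reader :reads

  def initialize(io)
    @io = io
    @scanner = StringScanner.new(+"")
    @reads = 0
  end

  def match(pattern)
    min_bytes = 1
    loop do
      md = @scanner.scan(pattern)
      return md if md
      return nil unless read(min_bytes)
      min_bytes *= 2 # ask for twice as much input before the next attempt
    end
  end

  private

  # Appends whole lines until at least min_bytes were added; false at EOF.
  def read(min_bytes)
    read_bytes = 0
    while read_bytes < min_bytes
      line = @io.gets
      break if line.nil?
      @scanner << line
      read_bytes += line.bytesize
    end
    @reads += 1
    read_bytes > 0
  end
end
```

With a comment split over 100 two-byte lines, the pattern is matched after 8 reads here, where reading one line per failed attempt would need about a hundred.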
data/lib/rexml/text.rb
CHANGED
@@ -151,25 +151,45 @@ module REXML
 end
 end
 
-
-string.
-if
-raise "Illegal character #{
-
-
-
-
+pos = 0
+while (index = string.index(/<|&/, pos))
+if string[index] == "<"
+raise "Illegal character \"#{string[index]}\" in raw string #{string.inspect}"
+end
+
+unless (end_index = string.index(/[^\s];/, index + 1))
+raise "Illegal character \"#{string[index]}\" in raw string #{string.inspect}"
+end
+
+value = string[(index + 1)..end_index]
+if /\s/.match?(value)
+raise "Illegal character \"#{string[index]}\" in raw string #{string.inspect}"
+end
+
+if value[0] == "#"
+character_reference = value[1..-1]
+
+unless (/\A(\d+|x[0-9a-fA-F]+)\z/.match?(character_reference))
+if character_reference[0] == "x" || character_reference[-1] == "x"
+raise "Illegal character \"#{string[index]}\" in raw string #{string.inspect}"
 else
-raise "Illegal character #{
+raise "Illegal character #{string.inspect} in raw string #{string.inspect}"
 end
-# FIXME: below can't work but this needs API change.
-# elsif @parent and $3 and !SUBSTITUTES.include?($1)
-# if !doctype or !doctype.entities.has_key?($3)
-# raise "Undeclared entity '#{$1}' in raw string \"#{string}\""
-# end
 end
+
+case (character_reference[0] == "x" ? character_reference[1..-1].to_i(16) : character_reference[0..-1].to_i)
+when *VALID_CHAR
+else
+raise "Illegal character #{string.inspect} in raw string #{string.inspect}"
+end
+elsif !(/\A#{Entity::NAME}\z/um.match?(value))
+raise "Illegal character \"#{string[index]}\" in raw string #{string.inspect}"
 end
+
+pos = end_index + 1
 end
+
+string
 end
 
 def node_type
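`Text.check` now walks the string from one `<` or `&` to the next instead of matching one large regexp: `<` is always rejected, and every `&` must start a name or numeric reference closed by `;`. A simplified sketch of that loop; `check_raw_text` and `ENTITY_NAME` are hypothetical, the name pattern is ASCII-only rather than `Entity::NAME`, the `VALID_CHAR` range test is reduced to "greater than zero", and `ArgumentError` replaces the plain `raise`:

```ruby
# ASCII-only stand-in for Entity::NAME (assumption).
ENTITY_NAME = /\A[A-Za-z_:][-A-Za-z0-9._:]*\z/

def check_raw_text(string)
  pos = 0
  while (index = string.index(/<|&/, pos))
    if string[index] == "<"
      raise ArgumentError, "Illegal character \"<\" in raw string #{string.inspect}"
    end

    # The reference must end at a ";" preceded by a non-space character.
    end_index = string.index(/[^\s];/, index + 1)
    unless end_index
      raise ArgumentError, "Illegal character \"&\" in raw string #{string.inspect}"
    end

    value = string[(index + 1)..end_index] # text between "&" and ";"
    valid =
      if value.start_with?("#")
        ref = value[1..-1]
        if /\A\d+\z/.match?(ref)
          ref.to_i.positive?
        elsif /\Ax[0-9a-fA-F]+\z/.match?(ref)
          ref[1..-1].to_i(16).positive?
        else
          false
        end
      else
        ENTITY_NAME.match?(value)
      end
    raise ArgumentError, "Illegal reference in raw string #{string.inspect}" unless valid

    pos = end_index + 1 # continue after the ";"
  end
  string
end
```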
metadata
CHANGED
@@ -1,14 +1,13 @@
 --- !ruby/object:Gem::Specification
 name: rexml
 version: !ruby/object:Gem::Version
-version: 3.3.
+version: 3.3.5
 platform: ruby
 authors:
 - Kouhei Sutou
-autorequire:
 bindir: bin
 cert_chain: []
-date: 2024-
+date: 2024-08-12 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
 name: strscan
@@ -116,8 +115,8 @@ files:
 homepage: https://github.com/ruby/rexml
 licenses:
 - BSD-2-Clause
-metadata:
-
+metadata:
+changelog_uri: https://github.com/ruby/rexml/releases/tag/v3.3.5
 rdoc_options:
 - "--main"
 - README.md
@@ -134,8 +133,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
 - !ruby/object:Gem::Version
 version: '0'
 requirements: []
-rubygems_version: 3.
-signing_key:
+rubygems_version: 3.6.0.dev
 specification_version: 4
 summary: An XML toolkit for Ruby
 test_files: []