rexml 3.2.9 → 3.3.4
Potentially problematic release.
This version of rexml might be problematic.
- checksums.yaml +4 -4
- data/NEWS.md +160 -2
- data/lib/rexml/element.rb +2 -15
- data/lib/rexml/formatters/pretty.rb +1 -1
- data/lib/rexml/parsers/baseparser.rb +109 -37
- data/lib/rexml/parsers/pullparser.rb +4 -0
- data/lib/rexml/parsers/sax2parser.rb +6 -19
- data/lib/rexml/parsers/streamparser.rb +2 -2
- data/lib/rexml/parsers/treeparser.rb +9 -14
- data/lib/rexml/rexml.rb +1 -1
- data/lib/rexml/source.rb +43 -7
- data/lib/rexml/text.rb +34 -14
- metadata +4 -3
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA256:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: e47ba1209ca1ca2ae0584348378fcefe05de5dc277273d434a37d62e04c676b3
|
4
|
+
data.tar.gz: 867f9e01423f83063aac7c59e07670c88c20f527f676e28cdf9d098248293c56
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: d87d9cd9384218f3a9bd65870cef99e057022c83bae434318daeab781444378ea830ce46ae20879954f2ae54a7a00cc54eac2839784b989612315ddef909c809
|
7
|
+
data.tar.gz: 1e61927c65b9a058626d0ab19c7f5af0d49169d896e76402e0152476cc772dabf41b8f7a135040b12f5c46eac933de8e60d21fdea8388ed7342be8cc6f9114e9
|
data/NEWS.md
CHANGED
@@ -1,12 +1,170 @@
|
|
1
1
|
# News
|
2
2
|
|
3
|
-
## 3.
|
3
|
+
## 3.3.4 - 2024-08-01 {#version-3-3-4}
|
4
|
+
|
5
|
+
### Fixes
|
6
|
+
|
7
|
+
* Fixed a bug that `REXML::Security` isn't defined when
|
8
|
+
`REXML::Parsers::StreamParser` is used and
|
9
|
+
`rexml/parsers/streamparser` is only required.
|
10
|
+
* GH-189
|
11
|
+
* Patch by takuya kodama.
|
12
|
+
|
13
|
+
### Thanks
|
14
|
+
|
15
|
+
* takuya kodama
|
16
|
+
|
17
|
+
## 3.3.3 - 2024-08-01 {#version-3-3-3}
|
18
|
+
|
19
|
+
### Improvements
|
20
|
+
|
21
|
+
* Added support for detecting invalid XML that has unsupported
|
22
|
+
content before root element
|
23
|
+
* GH-184
|
24
|
+
* Patch by NAITOH Jun.
|
25
|
+
|
26
|
+
* Added support for `REXML::Security.entity_expansion_limit=` and
|
27
|
+
`REXML::Security.entity_expansion_text_limit=` in SAX2 and pull
|
28
|
+
parsers
|
29
|
+
* GH-187
|
30
|
+
* Patch by NAITOH Jun.
|
31
|
+
|
32
|
+
* Added more tests for invalid XMLs.
|
33
|
+
* GH-183
|
34
|
+
* Patch by Watson.
|
35
|
+
|
36
|
+
* Added more performance tests.
|
37
|
+
* Patch by Watson.
|
38
|
+
|
39
|
+
* Improved parse performance.
|
40
|
+
* GH-186
|
41
|
+
* Patch by tomoya ishida.
|
42
|
+
|
43
|
+
### Thanks
|
44
|
+
|
45
|
+
* NAITOH Jun
|
46
|
+
|
47
|
+
* Watson
|
48
|
+
|
49
|
+
* tomoya ishida
|
50
|
+
|
51
|
+
## 3.3.2 - 2024-07-16 {#version-3-3-2}
|
52
|
+
|
53
|
+
### Improvements
|
54
|
+
|
55
|
+
* Improved parse performance.
|
56
|
+
* GH-160
|
57
|
+
* Patch by NAITOH Jun.
|
58
|
+
|
59
|
+
* Improved parse performance.
|
60
|
+
* GH-169
|
61
|
+
* GH-170
|
62
|
+
* GH-171
|
63
|
+
* GH-172
|
64
|
+
* GH-173
|
65
|
+
* GH-174
|
66
|
+
* GH-175
|
67
|
+
* GH-176
|
68
|
+
* GH-177
|
69
|
+
* Patch by Watson.
|
70
|
+
|
71
|
+
* Added support for raising a parse exception when an XML has extra
|
72
|
+
content after the root element.
|
73
|
+
* GH-161
|
74
|
+
* Patch by NAITOH Jun.
|
75
|
+
|
76
|
+
* Added support for raising a parse exception when an XML
|
77
|
+
declaration exists in wrong position.
|
78
|
+
* GH-162
|
79
|
+
* Patch by NAITOH Jun.
|
80
|
+
|
81
|
+
* Removed needless a space after XML declaration in pretty print mode.
|
82
|
+
* GH-164
|
83
|
+
* Patch by NAITOH Jun.
|
84
|
+
|
85
|
+
* Stopped to emit `:text` event after the root element.
|
86
|
+
* GH-167
|
87
|
+
* Patch by NAITOH Jun.
|
88
|
+
|
89
|
+
### Fixes
|
90
|
+
|
91
|
+
* Fixed a bug that SAX2 parser doesn't expand predefined entities for
|
92
|
+
`characters` callback.
|
93
|
+
* GH-168
|
94
|
+
* Patch by NAITOH Jun.
|
95
|
+
|
96
|
+
### Thanks
|
97
|
+
|
98
|
+
* NAITOH Jun
|
99
|
+
|
100
|
+
* Watson
|
101
|
+
|
102
|
+
## 3.3.1 - 2024-06-25 {#version-3-3-1}
|
103
|
+
|
104
|
+
### Improvements
|
105
|
+
|
106
|
+
* Added support for detecting malformed top-level comments.
|
107
|
+
* GH-145
|
108
|
+
* Patch by Hiroya Fujinami.
|
109
|
+
|
110
|
+
* Improved `REXML::Element#attribute` performance.
|
111
|
+
* GH-146
|
112
|
+
* Patch by Hiroya Fujinami.
|
113
|
+
|
114
|
+
* Added support for detecting malformed `<!-->` comments.
|
115
|
+
* GH-147
|
116
|
+
* Patch by Hiroya Fujinami.
|
117
|
+
|
118
|
+
* Added support for detecting unclosed `DOCTYPE`.
|
119
|
+
* GH-152
|
120
|
+
* Patch by Hiroya Fujinami.
|
121
|
+
|
122
|
+
* Added `changlog_uri` metadata to gemspec.
|
123
|
+
* GH-156
|
124
|
+
* Patch by fynsta.
|
125
|
+
|
126
|
+
* Improved parse performance.
|
127
|
+
* GH-157
|
128
|
+
* GH-158
|
129
|
+
* Patch by NAITOH Jun.
|
130
|
+
|
131
|
+
### Fixes
|
132
|
+
|
133
|
+
* Fixed a bug that large XML can't be parsed.
|
134
|
+
* GH-154
|
135
|
+
* Patch by NAITOH Jun.
|
136
|
+
|
137
|
+
* Fixed a bug that private constants are visible.
|
138
|
+
* GH-155
|
139
|
+
* Patch by NAITOH Jun.
|
140
|
+
|
141
|
+
### Thanks
|
142
|
+
|
143
|
+
* Hiroya Fujinami
|
144
|
+
|
145
|
+
* NAITOH Jun
|
146
|
+
|
147
|
+
* fynsta
|
148
|
+
|
149
|
+
## 3.3.0 - 2024-06-11 {#version-3-3-0}
|
150
|
+
|
151
|
+
### Improvements
|
152
|
+
|
153
|
+
* Added support for strscan 0.7.0 installed with Ruby 2.6.
|
154
|
+
* GH-142
|
155
|
+
* Reported by Fernando Trigoso.
|
156
|
+
|
157
|
+
### Thanks
|
158
|
+
|
159
|
+
* Fernando Trigoso
|
160
|
+
|
161
|
+
## 3.2.9 - 2024-06-09 {#version-3-2-9}
|
4
162
|
|
5
163
|
### Improvements
|
6
164
|
|
7
165
|
* Added support for old strscan.
|
8
166
|
* GH-132
|
9
|
-
* Reported by Adam
|
167
|
+
* Reported by Adam.
|
10
168
|
|
11
169
|
* Improved attribute value parse performance.
|
12
170
|
* GH-135
|
data/lib/rexml/element.rb
CHANGED
@@ -7,14 +7,6 @@ require_relative "xpath"
|
|
7
7
|
require_relative "parseexception"
|
8
8
|
|
9
9
|
module REXML
|
10
|
-
# An implementation note about namespaces:
|
11
|
-
# As we parse, when we find namespaces we put them in a hash and assign
|
12
|
-
# them a unique ID. We then convert the namespace prefix for the node
|
13
|
-
# to the unique ID. This makes namespace lookup much faster for the
|
14
|
-
# cost of extra memory use. We save the namespace prefix for the
|
15
|
-
# context node and convert it back when we write it.
|
16
|
-
@@namespaces = {}
|
17
|
-
|
18
10
|
# An \REXML::Element object represents an XML element.
|
19
11
|
#
|
20
12
|
# An element:
|
@@ -1284,16 +1276,11 @@ module REXML
|
|
1284
1276
|
# document.root.attribute("x", "a") # => a:x='a:x'
|
1285
1277
|
#
|
1286
1278
|
def attribute( name, namespace=nil )
|
1287
|
-
prefix =
|
1288
|
-
if namespaces.respond_to? :key
|
1289
|
-
prefix = namespaces.key(namespace) if namespace
|
1290
|
-
else
|
1291
|
-
prefix = namespaces.index(namespace) if namespace
|
1292
|
-
end
|
1279
|
+
prefix = namespaces.key(namespace) if namespace
|
1293
1280
|
prefix = nil if prefix == 'xmlns'
|
1294
1281
|
|
1295
1282
|
ret_val =
|
1296
|
-
attributes.get_attribute(
|
1283
|
+
attributes.get_attribute( prefix ? "#{prefix}:#{name}" : name )
|
1297
1284
|
|
1298
1285
|
return ret_val unless ret_val.nil?
|
1299
1286
|
return nil if prefix.nil?
|
data/lib/rexml/formatters/pretty.rb
CHANGED
@@ -111,7 +111,7 @@ module REXML
|
|
111
111
|
# itself, then we don't need a carriage return... which makes this
|
112
112
|
# logic more complex.
|
113
113
|
node.children.each { |child|
|
114
|
-
next if child
|
114
|
+
next if child.instance_of?(Text)
|
115
115
|
unless child == node.children[0] or child.instance_of?(Text) or
|
116
116
|
(child == node.children[1] and !node.children[0].writethis)
|
117
117
|
output << "\n"
|
data/lib/rexml/parsers/baseparser.rb
CHANGED
@@ -1,6 +1,7 @@
|
|
1
1
|
# frozen_string_literal: true
|
2
2
|
require_relative '../parseexception'
|
3
3
|
require_relative '../undefinednamespaceexception'
|
4
|
+
require_relative '../security'
|
4
5
|
require_relative '../source'
|
5
6
|
require 'set'
|
6
7
|
require "strscan"
|
@@ -124,21 +125,28 @@ module REXML
|
|
124
125
|
}
|
125
126
|
|
126
127
|
module Private
|
127
|
-
INSTRUCTION_END = /#{NAME}(\s+.*?)?\?>/um
|
128
128
|
TAG_PATTERN = /((?>#{QNAME_STR}))\s*/um
|
129
129
|
CLOSE_PATTERN = /(#{QNAME_STR})\s*>/um
|
130
130
|
ATTLISTDECL_END = /\s+#{NAME}(?:#{ATTDEF})*\s*>/um
|
131
|
-
NAME_PATTERN =
|
131
|
+
NAME_PATTERN = /#{NAME}/um
|
132
132
|
GEDECL_PATTERN = "\\s+#{NAME}\\s+#{ENTITYDEF}\\s*>"
|
133
133
|
PEDECL_PATTERN = "\\s+(%)\\s+#{NAME}\\s+#{PEDEF}\\s*>"
|
134
134
|
ENTITYDECL_PATTERN = /(?:#{GEDECL_PATTERN})|(?:#{PEDECL_PATTERN})/um
|
135
|
+
CARRIAGE_RETURN_NEWLINE_PATTERN = /\r\n?/
|
136
|
+
CHARACTER_REFERENCES = /&#0*((?:\d+)|(?:x[a-fA-F0-9]+));/
|
137
|
+
DEFAULT_ENTITIES_PATTERNS = {}
|
138
|
+
default_entities = ['gt', 'lt', 'quot', 'apos', 'amp']
|
139
|
+
default_entities.each do |term|
|
140
|
+
DEFAULT_ENTITIES_PATTERNS[term] = /&#{term};/
|
141
|
+
end
|
135
142
|
end
|
136
143
|
private_constant :Private
|
137
|
-
include Private
|
138
144
|
|
139
145
|
def initialize( source )
|
140
146
|
self.stream = source
|
141
147
|
@listeners = []
|
148
|
+
@prefixes = Set.new
|
149
|
+
@entity_expansion_count = 0
|
142
150
|
end
|
143
151
|
|
144
152
|
def add_listener( listener )
|
@@ -146,10 +154,12 @@ module REXML
|
|
146
154
|
end
|
147
155
|
|
148
156
|
attr_reader :source
|
157
|
+
attr_reader :entity_expansion_count
|
149
158
|
|
150
159
|
def stream=( source )
|
151
160
|
@source = SourceFactory.create_from( source )
|
152
161
|
@closed = nil
|
162
|
+
@have_root = false
|
153
163
|
@document_status = nil
|
154
164
|
@tags = []
|
155
165
|
@stack = []
|
@@ -204,6 +214,8 @@ module REXML
|
|
204
214
|
|
205
215
|
# Returns the next event. This is a +PullEvent+ object.
|
206
216
|
def pull
|
217
|
+
@source.drop_parsed_content
|
218
|
+
|
207
219
|
pull_event.tap do |event|
|
208
220
|
@listeners.each do |listener|
|
209
221
|
listener.receive event
|
@@ -216,7 +228,12 @@ module REXML
|
|
216
228
|
x, @closed = @closed, nil
|
217
229
|
return [ :end_element, x ]
|
218
230
|
end
|
219
|
-
|
231
|
+
if empty?
|
232
|
+
if @document_status == :in_doctype
|
233
|
+
raise ParseException.new("Malformed DOCTYPE: unclosed", @source)
|
234
|
+
end
|
235
|
+
return [ :end_document ]
|
236
|
+
end
|
220
237
|
return @stack.shift if @stack.size > 0
|
221
238
|
#STDERR.puts @source.encoding
|
222
239
|
#STDERR.puts "BUFFER = #{@source.buffer.inspect}"
|
@@ -225,10 +242,17 @@ module REXML
|
|
225
242
|
if @document_status == nil
|
226
243
|
start_position = @source.position
|
227
244
|
if @source.match("<?", true)
|
228
|
-
return process_instruction
|
245
|
+
return process_instruction
|
229
246
|
elsif @source.match("<!", true)
|
230
247
|
if @source.match("--", true)
|
231
|
-
|
248
|
+
md = @source.match(/(.*?)-->/um, true)
|
249
|
+
if md.nil?
|
250
|
+
raise REXML::ParseException.new("Unclosed comment", @source)
|
251
|
+
end
|
252
|
+
if /--|-\z/.match?(md[1])
|
253
|
+
raise REXML::ParseException.new("Malformed comment", @source)
|
254
|
+
end
|
255
|
+
return [ :comment, md[1] ]
|
232
256
|
elsif @source.match("DOCTYPE", true)
|
233
257
|
base_error_message = "Malformed DOCTYPE"
|
234
258
|
unless @source.match(/\s+/um, true)
|
@@ -240,7 +264,7 @@ module REXML
|
|
240
264
|
@source.position = start_position
|
241
265
|
raise REXML::ParseException.new(message, @source)
|
242
266
|
end
|
243
|
-
@nsstack.unshift(
|
267
|
+
@nsstack.unshift(Set.new)
|
244
268
|
name = parse_name(base_error_message)
|
245
269
|
if @source.match(/\s*\[/um, true)
|
246
270
|
id = [nil, nil, nil]
|
@@ -288,7 +312,11 @@ module REXML
|
|
288
312
|
raise REXML::ParseException.new( "Bad ELEMENT declaration!", @source ) if md.nil?
|
289
313
|
return [ :elementdecl, "<!ELEMENT" + md[1] ]
|
290
314
|
elsif @source.match("ENTITY", true)
|
291
|
-
|
315
|
+
match_data = @source.match(Private::ENTITYDECL_PATTERN, true)
|
316
|
+
unless match_data
|
317
|
+
raise REXML::ParseException.new("Malformed entity declaration", @source)
|
318
|
+
end
|
319
|
+
match = [:entitydecl, *match_data.captures.compact]
|
292
320
|
ref = false
|
293
321
|
if match[1] == '%'
|
294
322
|
ref = true
|
@@ -314,13 +342,13 @@ module REXML
|
|
314
342
|
match << '%' if ref
|
315
343
|
return match
|
316
344
|
elsif @source.match("ATTLIST", true)
|
317
|
-
md = @source.match(ATTLISTDECL_END, true)
|
345
|
+
md = @source.match(Private::ATTLISTDECL_END, true)
|
318
346
|
raise REXML::ParseException.new( "Bad ATTLIST declaration!", @source ) if md.nil?
|
319
347
|
element = md[1]
|
320
348
|
contents = md[0]
|
321
349
|
|
322
350
|
pairs = {}
|
323
|
-
values = md[0].scan( ATTDEF_RE )
|
351
|
+
values = md[0].strip.scan( ATTDEF_RE )
|
324
352
|
values.each do |attdef|
|
325
353
|
unless attdef[3] == "#IMPLIED"
|
326
354
|
attdef.compact!
|
@@ -366,6 +394,9 @@ module REXML
|
|
366
394
|
@document_status = :after_doctype
|
367
395
|
return [ :end_doctype ]
|
368
396
|
end
|
397
|
+
if @document_status == :in_doctype
|
398
|
+
raise ParseException.new("Malformed DOCTYPE: invalid declaration", @source)
|
399
|
+
end
|
369
400
|
end
|
370
401
|
if @document_status == :after_doctype
|
371
402
|
@source.match(/\s*/um, true)
|
@@ -380,7 +411,7 @@ module REXML
|
|
380
411
|
if @source.match("/", true)
|
381
412
|
@nsstack.shift
|
382
413
|
last_tag = @tags.pop
|
383
|
-
md = @source.match(CLOSE_PATTERN, true)
|
414
|
+
md = @source.match(Private::CLOSE_PATTERN, true)
|
384
415
|
if md and !last_tag
|
385
416
|
message = "Unexpected top-level end tag (got '#{md[1]}')"
|
386
417
|
raise REXML::ParseException.new(message, @source)
|
@@ -399,12 +430,11 @@ module REXML
|
|
399
430
|
if md[0][0] == ?-
|
400
431
|
md = @source.match(/--(.*?)-->/um, true)
|
401
432
|
|
402
|
-
|
403
|
-
when /--/, /-\z/
|
433
|
+
if md.nil? || /--|-\z/.match?(md[1])
|
404
434
|
raise REXML::ParseException.new("Malformed comment", @source)
|
405
435
|
end
|
406
436
|
|
407
|
-
return [ :comment, md[1] ]
|
437
|
+
return [ :comment, md[1] ]
|
408
438
|
else
|
409
439
|
md = @source.match(/\[CDATA\[(.*?)\]\]>/um, true)
|
410
440
|
return [ :cdata, md[1] ] if md
|
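The comment check in this hunk rejects a comment body that contains `--` or ends with `-`, both of which XML 1.0 disallows inside comments. A minimal sketch of that predicate in isolation follows; the helper name `malformed_comment_body?` is hypothetical and is not part of REXML.

```ruby
# Hypothetical helper (not part of REXML): mirrors the /--|-\z/ test
# that the hunk applies to the text between "<!--" and "-->".
def malformed_comment_body?(body)
  /--|-\z/.match?(body)
end

puts malformed_comment_body?(" ok ")        # false
puts malformed_comment_body?(" a -- b ")    # true
puts malformed_comment_body?(" trailing-")  # true
```

In the parser itself a true result leads to `REXML::ParseException.new("Malformed comment", @source)`; the sketch only isolates the pattern.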
@@ -412,22 +442,22 @@ module REXML
|
|
412
442
|
raise REXML::ParseException.new( "Declarations can only occur "+
|
413
443
|
"in the doctype declaration.", @source)
|
414
444
|
elsif @source.match("?", true)
|
415
|
-
return process_instruction
|
445
|
+
return process_instruction
|
416
446
|
else
|
417
447
|
# Get the next tag
|
418
|
-
md = @source.match(TAG_PATTERN, true)
|
448
|
+
md = @source.match(Private::TAG_PATTERN, true)
|
419
449
|
unless md
|
420
450
|
@source.position = start_position
|
421
451
|
raise REXML::ParseException.new("malformed XML: missing tag start", @source)
|
422
452
|
end
|
423
453
|
tag = md[1]
|
424
454
|
@document_status = :in_element
|
425
|
-
prefixes
|
426
|
-
prefixes << md[2] if md[2]
|
455
|
+
@prefixes.clear
|
456
|
+
@prefixes << md[2] if md[2]
|
427
457
|
@nsstack.unshift(curr_ns=Set.new)
|
428
|
-
attributes, closed = parse_attributes(prefixes, curr_ns)
|
458
|
+
attributes, closed = parse_attributes(@prefixes, curr_ns)
|
429
459
|
# Verify that all of the prefixes have been defined
|
430
|
-
for prefix in prefixes
|
460
|
+
for prefix in @prefixes
|
431
461
|
unless @nsstack.find{|k| k.member?(prefix)}
|
432
462
|
raise UndefinedNamespaceException.new(prefix,@source,self)
|
433
463
|
end
|
@@ -437,8 +467,12 @@ module REXML
|
|
437
467
|
@closed = tag
|
438
468
|
@nsstack.shift
|
439
469
|
else
|
470
|
+
if @tags.empty? and @have_root
|
471
|
+
raise ParseException.new("Malformed XML: Extra tag at the end of the document (got '<#{tag}')", @source)
|
472
|
+
end
|
440
473
|
@tags.push( tag )
|
441
474
|
end
|
475
|
+
@have_root = true
|
442
476
|
return [ :start_element, tag, attributes ]
|
443
477
|
end
|
444
478
|
else
|
@@ -446,6 +480,16 @@ module REXML
|
|
446
480
|
if text.chomp!("<")
|
447
481
|
@source.position -= "<".bytesize
|
448
482
|
end
|
483
|
+
if @tags.empty?
|
484
|
+
unless /\A\s*\z/.match?(text)
|
485
|
+
if @have_root
|
486
|
+
raise ParseException.new("Malformed XML: Extra content at the end of the document (got '#{text}')", @source)
|
487
|
+
else
|
488
|
+
raise ParseException.new("Malformed XML: Content at the start of the document (got '#{text}')", @source)
|
489
|
+
end
|
490
|
+
end
|
491
|
+
return pull_event if @have_root
|
492
|
+
end
|
449
493
|
return [ :text, text ]
|
450
494
|
end
|
451
495
|
rescue REXML::UndefinedNamespaceException
|
@@ -463,7 +507,9 @@ module REXML
|
|
463
507
|
def entity( reference, entities )
|
464
508
|
value = nil
|
465
509
|
value = entities[ reference ] if entities
|
466
|
-
if
|
510
|
+
if value
|
511
|
+
record_entity_expansion
|
512
|
+
else
|
467
513
|
value = DEFAULT_ENTITIES[ reference ]
|
468
514
|
value = value[2] if value
|
469
515
|
end
|
@@ -488,34 +534,51 @@ module REXML
|
|
488
534
|
|
489
535
|
# Unescapes all possible entities
|
490
536
|
def unnormalize( string, entities=nil, filter=nil )
|
491
|
-
|
537
|
+
if string.include?("\r")
|
538
|
+
rv = string.gsub( Private::CARRIAGE_RETURN_NEWLINE_PATTERN, "\n" )
|
539
|
+
else
|
540
|
+
rv = string.dup
|
541
|
+
end
|
492
542
|
matches = rv.scan( REFERENCE_RE )
|
493
543
|
return rv if matches.size == 0
|
494
|
-
rv.gsub!(
|
544
|
+
rv.gsub!( Private::CHARACTER_REFERENCES ) {
|
495
545
|
m=$1
|
496
546
|
m = "0#{m}" if m[0] == ?x
|
497
547
|
[Integer(m)].pack('U*')
|
498
548
|
}
|
499
549
|
matches.collect!{|x|x[0]}.compact!
|
500
550
|
if matches.size > 0
|
551
|
+
sum = 0
|
501
552
|
matches.each do |entity_reference|
|
502
553
|
unless filter and filter.include?(entity_reference)
|
503
554
|
entity_value = entity( entity_reference, entities )
|
504
555
|
if entity_value
|
505
|
-
re = /&#{entity_reference};/
|
556
|
+
re = Private::DEFAULT_ENTITIES_PATTERNS[entity_reference] || /&#{entity_reference};/
|
506
557
|
rv.gsub!( re, entity_value )
|
558
|
+
sum += rv.bytesize
|
559
|
+
if sum > Security.entity_expansion_text_limit
|
560
|
+
raise "entity expansion has grown too large"
|
561
|
+
end
|
507
562
|
else
|
508
563
|
er = DEFAULT_ENTITIES[entity_reference]
|
509
564
|
rv.gsub!( er[0], er[2] ) if er
|
510
565
|
end
|
511
566
|
end
|
512
567
|
end
|
513
|
-
rv.gsub!(
|
568
|
+
rv.gsub!( Private::DEFAULT_ENTITIES_PATTERNS['amp'], '&' )
|
514
569
|
end
|
515
570
|
rv
|
516
571
|
end
|
517
572
|
|
518
573
|
private
|
574
|
+
|
575
|
+
def record_entity_expansion
|
576
|
+
@entity_expansion_count += 1
|
577
|
+
if @entity_expansion_count > Security.entity_expansion_limit
|
578
|
+
raise "number of entity expansions exceeded, processing aborted."
|
579
|
+
end
|
580
|
+
end
|
581
|
+
|
519
582
|
def need_source_encoding_update?(xml_declaration_encoding)
|
520
583
|
return false if xml_declaration_encoding.nil?
|
521
584
|
return false if /\AUTF-16\z/i =~ xml_declaration_encoding
|
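The character-reference substitution inside `unnormalize` can be tried on its own. The sketch below assumes the `CHARACTER_REFERENCES` pattern matches `&#`, optional leading zeros, then a decimal or `x`-prefixed hexadecimal code point; `decode_character_references` is a hypothetical name, not a REXML method.

```ruby
# Hypothetical helper (not a REXML method) that applies the same
# substitution block as the hunk above.
def decode_character_references(string)
  # Assumed shape of Private::CHARACTER_REFERENCES.
  pattern = /&#0*((?:\d+)|(?:x[a-fA-F0-9]+));/
  string.gsub(pattern) do
    m = $1
    m = "0#{m}" if m[0] == "x"  # "x42" -> "0x42" so Integer() reads it as hex
    [Integer(m)].pack("U*")     # code point -> UTF-8 character
  end
end

puts decode_character_references("&#65;&#x42;&#0067;")  # ABC
```

Leading zeros are consumed outside the capture group, which matters because `Integer("0067")` would otherwise be parsed as octal.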
@@ -523,16 +586,16 @@ module REXML
|
|
523
586
|
end
|
524
587
|
|
525
588
|
def parse_name(base_error_message)
|
526
|
-
md = @source.match(NAME_PATTERN, true)
|
589
|
+
md = @source.match(Private::NAME_PATTERN, true)
|
527
590
|
unless md
|
528
|
-
if @source.match(/\
|
591
|
+
if @source.match(/\S/um)
|
529
592
|
message = "#{base_error_message}: invalid name"
|
530
593
|
else
|
531
594
|
message = "#{base_error_message}: name is missing"
|
532
595
|
end
|
533
596
|
raise REXML::ParseException.new(message, @source)
|
534
597
|
end
|
535
|
-
md[
|
598
|
+
md[0]
|
536
599
|
end
|
537
600
|
|
538
601
|
def parse_id(base_error_message,
|
@@ -601,15 +664,24 @@ module REXML
|
|
601
664
|
end
|
602
665
|
end
|
603
666
|
|
604
|
-
def process_instruction
|
605
|
-
|
606
|
-
|
607
|
-
|
608
|
-
|
609
|
-
|
667
|
+
def process_instruction
|
668
|
+
name = parse_name("Malformed XML: Invalid processing instruction node")
|
669
|
+
if @source.match(/\s+/um, true)
|
670
|
+
match_data = @source.match(/(.*?)\?>/um, true)
|
671
|
+
unless match_data
|
672
|
+
raise ParseException.new("Malformed XML: Unclosed processing instruction", @source)
|
673
|
+
end
|
674
|
+
content = match_data[1]
|
675
|
+
else
|
676
|
+
content = nil
|
677
|
+
unless @source.match("?>", true)
|
678
|
+
raise ParseException.new("Malformed XML: Unclosed processing instruction", @source)
|
679
|
+
end
|
610
680
|
end
|
611
|
-
if
|
612
|
-
|
681
|
+
if name == "xml"
|
682
|
+
if @document_status
|
683
|
+
raise ParseException.new("Malformed XML: XML declaration is not at the start", @source)
|
684
|
+
end
|
613
685
|
version = VERSION.match(content)
|
614
686
|
version = version[1] unless version.nil?
|
615
687
|
encoding = ENCODING.match(content)
|
@@ -624,7 +696,7 @@ module REXML
|
|
624
696
|
standalone = standalone[1] unless standalone.nil?
|
625
697
|
return [ :xmldecl, version, encoding, standalone ]
|
626
698
|
end
|
627
|
-
[:processing_instruction,
|
699
|
+
[:processing_instruction, name, content]
|
628
700
|
end
|
629
701
|
|
630
702
|
def parse_attributes(prefixes, curr_ns)
|
data/lib/rexml/parsers/sax2parser.rb
CHANGED
@@ -22,6 +22,10 @@ module REXML
|
|
22
22
|
@parser.source
|
23
23
|
end
|
24
24
|
|
25
|
+
def entity_expansion_count
|
26
|
+
@parser.entity_expansion_count
|
27
|
+
end
|
28
|
+
|
25
29
|
def add_listener( listener )
|
26
30
|
@parser.add_listener( listener )
|
27
31
|
end
|
@@ -157,25 +161,8 @@ module REXML
|
|
157
161
|
end
|
158
162
|
end
|
159
163
|
when :text
|
160
|
-
|
161
|
-
|
162
|
-
copy = event[1].clone
|
163
|
-
|
164
|
-
esub = proc { |match|
|
165
|
-
if @entities.has_key?($1)
|
166
|
-
@entities[$1].gsub(Text::REFERENCE, &esub)
|
167
|
-
else
|
168
|
-
match
|
169
|
-
end
|
170
|
-
}
|
171
|
-
|
172
|
-
copy.gsub!( Text::REFERENCE, &esub )
|
173
|
-
copy.gsub!( Text::NUMERICENTITY ) {|m|
|
174
|
-
m=$1
|
175
|
-
m = "0#{m}" if m[0] == ?x
|
176
|
-
[Integer(m)].pack('U*')
|
177
|
-
}
|
178
|
-
handle( :characters, copy )
|
164
|
+
unnormalized = @parser.unnormalize( event[1], @entities )
|
165
|
+
handle( :characters, unnormalized )
|
179
166
|
when :entitydecl
|
180
167
|
handle_entitydecl( event )
|
181
168
|
when :processing_instruction, :comment, :attlistdecl,
|
data/lib/rexml/parsers/streamparser.rb
CHANGED
@@ -36,8 +36,8 @@ module REXML
|
|
36
36
|
@listener.tag_end( event[1] )
|
37
37
|
@tag_stack.pop
|
38
38
|
when :text
|
39
|
-
|
40
|
-
@listener.text(
|
39
|
+
unnormalized = @parser.unnormalize( event[1] )
|
40
|
+
@listener.text( unnormalized )
|
41
41
|
when :processing_instruction
|
42
42
|
@listener.instruction( *event[1,2] )
|
43
43
|
when :start_doctype
|
data/lib/rexml/parsers/treeparser.rb
CHANGED
@@ -16,7 +16,6 @@ module REXML
|
|
16
16
|
|
17
17
|
def parse
|
18
18
|
tag_stack = []
|
19
|
-
in_doctype = false
|
20
19
|
entities = nil
|
21
20
|
begin
|
22
21
|
while true
|
@@ -39,17 +38,15 @@ module REXML
|
|
39
38
|
tag_stack.pop
|
40
39
|
@build_context = @build_context.parent
|
41
40
|
when :text
|
42
|
-
if
|
43
|
-
|
44
|
-
|
45
|
-
|
46
|
-
@build_context.
|
47
|
-
|
48
|
-
|
49
|
-
|
50
|
-
|
51
|
-
)
|
52
|
-
end
|
41
|
+
if @build_context[-1].instance_of? Text
|
42
|
+
@build_context[-1] << event[1]
|
43
|
+
else
|
44
|
+
@build_context.add(
|
45
|
+
Text.new(event[1], @build_context.whitespace, nil, true)
|
46
|
+
) unless (
|
47
|
+
@build_context.ignore_whitespace_nodes and
|
48
|
+
event[1].strip.size==0
|
49
|
+
)
|
53
50
|
end
|
54
51
|
when :comment
|
55
52
|
c = Comment.new( event[1] )
|
@@ -60,14 +57,12 @@ module REXML
|
|
60
57
|
when :processing_instruction
|
61
58
|
@build_context.add( Instruction.new( event[1], event[2] ) )
|
62
59
|
when :end_doctype
|
63
|
-
in_doctype = false
|
64
60
|
entities.each { |k,v| entities[k] = @build_context.entities[k].value }
|
65
61
|
@build_context = @build_context.parent
|
66
62
|
when :start_doctype
|
67
63
|
doctype = DocType.new( event[1..-1], @build_context )
|
68
64
|
@build_context = doctype
|
69
65
|
entities = {}
|
70
|
-
in_doctype = true
|
71
66
|
when :attlistdecl
|
72
67
|
n = AttlistDecl.new( event[1..-1] )
|
73
68
|
@build_context.add( n )
|
data/lib/rexml/rexml.rb
CHANGED
data/lib/rexml/source.rb
CHANGED
@@ -1,8 +1,28 @@
|
|
1
1
|
# coding: US-ASCII
|
2
2
|
# frozen_string_literal: false
|
3
|
+
|
4
|
+
require "strscan"
|
5
|
+
|
3
6
|
require_relative 'encoding'
|
4
7
|
|
5
8
|
module REXML
|
9
|
+
if StringScanner::Version < "1.0.0"
|
10
|
+
module StringScannerCheckScanString
|
11
|
+
refine StringScanner do
|
12
|
+
def check(pattern)
|
13
|
+
pattern = /#{Regexp.escape(pattern)}/ if pattern.is_a?(String)
|
14
|
+
super(pattern)
|
15
|
+
end
|
16
|
+
|
17
|
+
def scan(pattern)
|
18
|
+
pattern = /#{Regexp.escape(pattern)}/ if pattern.is_a?(String)
|
19
|
+
super(pattern)
|
20
|
+
end
|
21
|
+
end
|
22
|
+
end
|
23
|
+
using StringScannerCheckScanString
|
24
|
+
end
|
25
|
+
|
6
26
|
# Generates Source-s. USE THIS CLASS.
|
7
27
|
class SourceFactory
|
8
28
|
# Generates a Source object
|
@@ -35,6 +55,7 @@ module REXML
|
|
35
55
|
attr_reader :encoding
|
36
56
|
|
37
57
|
module Private
|
58
|
+
SCANNER_RESET_SIZE = 100000
|
38
59
|
PRE_DEFINED_TERM_PATTERNS = {}
|
39
60
|
pre_defined_terms = ["'", '"', "<"]
|
40
61
|
pre_defined_terms.each do |term|
|
@@ -42,7 +63,6 @@ module REXML
|
|
42
63
|
end
|
43
64
|
end
|
44
65
|
private_constant :Private
|
45
|
-
include Private
|
46
66
|
|
47
67
|
# Constructor
|
48
68
|
# @param arg must be a String, and should be a valid XML document
|
@@ -64,6 +84,12 @@ module REXML
|
|
64
84
|
@scanner.rest
|
65
85
|
end
|
66
86
|
|
87
|
+
def drop_parsed_content
|
88
|
+
if @scanner.pos > Private::SCANNER_RESET_SIZE
|
89
|
+
@scanner.string = @scanner.rest
|
90
|
+
end
|
91
|
+
end
|
92
|
+
|
67
93
|
def buffer_encoding=(encoding)
|
68
94
|
@scanner.string.force_encoding(encoding)
|
69
95
|
end
|
@@ -178,10 +204,20 @@ module REXML
|
|
178
204
|
end
|
179
205
|
end
|
180
206
|
|
181
|
-
def read(term = nil)
|
207
|
+
def read(term = nil, min_bytes = 1)
|
182
208
|
term = encode(term) if term
|
183
209
|
begin
|
184
|
-
|
210
|
+
str = readline(term)
|
211
|
+
@scanner << str
|
212
|
+
read_bytes = str.bytesize
|
213
|
+
begin
|
214
|
+
while read_bytes < min_bytes
|
215
|
+
str = readline(term)
|
216
|
+
@scanner << str
|
217
|
+
read_bytes += str.bytesize
|
218
|
+
end
|
219
|
+
rescue IOError
|
220
|
+
end
|
185
221
|
true
|
186
222
|
rescue Exception, NameError
|
187
223
|
@source = nil
|
@@ -211,10 +247,9 @@ module REXML
|
|
211
247
|
read if @scanner.eos? && @source
|
212
248
|
end
|
213
249
|
|
214
|
-
# Note: When specifying a string for 'pattern', it must not include '>' except in the following formats:
|
215
|
-
# - ">"
|
216
|
-
# - "XXX>" (X is any string excluding '>')
|
217
250
|
def match( pattern, cons=false )
|
251
|
+
# To avoid performance issue, we need to increase bytes to read per scan
|
252
|
+
min_bytes = 1
|
218
253
|
while true
|
219
254
|
if cons
|
220
255
|
md = @scanner.scan(pattern)
|
@@ -224,7 +259,8 @@ module REXML
|
|
224
259
|
break if md
|
225
260
|
return nil if pattern.is_a?(String)
|
226
261
|
return nil if @source.nil?
|
227
|
-
return nil unless read
|
262
|
+
return nil unless read(nil, min_bytes)
|
263
|
+
min_bytes *= 2
|
228
264
|
end
|
229
265
|
|
230
266
|
md.nil? ? nil : @scanner
|
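`drop_parsed_content` stops the scanner buffer from growing without bound: once the read position passes a threshold, the buffer is replaced by its unread remainder. The sketch below shows the same `StringScanner` idiom as a free function, with a deliberately small, hypothetical `reset_size` of 10 in place of the `SCANNER_RESET_SIZE = 100000` used in the diff.

```ruby
require "strscan"

# Hypothetical free function mirroring Source#drop_parsed_content.
def drop_parsed_content(scanner, reset_size: 10)
  # Assigning to #string swaps the buffer and resets the position to 0.
  scanner.string = scanner.rest if scanner.pos > reset_size
  scanner
end

scanner = StringScanner.new("<a>" + "x" * 20 + "</a>")
scanner.scan(/<a>x+/)         # consume the start tag and the text
drop_parsed_content(scanner)
puts scanner.pos              # 0
puts scanner.string           # </a>
```

Because the position restarts at zero after the swap, any absolute offset saved before the call no longer refers to the same place in the input.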
data/lib/rexml/text.rb
CHANGED
@@ -151,25 +151,45 @@ module REXML
|
|
151
151
|
end
|
152
152
|
end
|
153
153
|
|
154
|
-
|
155
|
-
string.
|
156
|
-
if
|
157
|
-
raise "Illegal character #{
|
158
|
-
|
159
|
-
|
160
|
-
|
161
|
-
|
154
|
+
pos = 0
|
155
|
+
while (index = string.index(/<|&/, pos))
|
156
|
+
if string[index] == "<"
|
157
|
+
raise "Illegal character \"#{string[index]}\" in raw string #{string.inspect}"
|
158
|
+
end
|
159
|
+
|
160
|
+
unless (end_index = string.index(/[^\s];/, index + 1))
|
161
|
+
raise "Illegal character \"#{string[index]}\" in raw string #{string.inspect}"
|
162
|
+
end
|
163
|
+
|
164
|
+
value = string[(index + 1)..end_index]
|
165
|
+
if /\s/.match?(value)
|
166
|
+
raise "Illegal character \"#{string[index]}\" in raw string #{string.inspect}"
|
167
|
+
end
|
168
|
+
|
169
|
+
if value[0] == "#"
|
170
|
+
character_reference = value[1..-1]
|
171
|
+
|
172
|
+
unless (/\A(\d+|x[0-9a-fA-F]+)\z/.match?(character_reference))
|
173
|
+
if character_reference[0] == "x" || character_reference[-1] == "x"
|
174
|
+
raise "Illegal character \"#{string[index]}\" in raw string #{string.inspect}"
|
162
175
|
else
|
163
|
-
raise "Illegal character #{
|
176
|
+
raise "Illegal character #{string.inspect} in raw string #{string.inspect}"
|
164
177
|
end
|
165
|
-
# FIXME: below can't work but this needs API change.
|
166
|
-
# elsif @parent and $3 and !SUBSTITUTES.include?($1)
|
167
|
-
# if !doctype or !doctype.entities.has_key?($3)
|
168
|
-
# raise "Undeclared entity '#{$1}' in raw string \"#{string}\""
|
169
|
-
# end
|
170
178
|
end
|
179
|
+
|
180
|
+
case (character_reference[0] == "x" ? character_reference[1..-1].to_i(16) : character_reference[0..-1].to_i)
|
181
|
+
when *VALID_CHAR
|
182
|
+
else
|
183
|
+
raise "Illegal character #{string.inspect} in raw string #{string.inspect}"
|
184
|
+
end
|
185
|
+
elsif !(/\A#{Entity::NAME}\z/um.match?(value))
|
186
|
+
raise "Illegal character \"#{string[index]}\" in raw string #{string.inspect}"
|
171
187
|
end
|
188
|
+
|
189
|
+
pos = end_index + 1
|
172
190
|
end
|
191
|
+
|
192
|
+
string
|
173
193
|
end
|
174
194
|
|
175
195
|
def node_type
|
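The rewritten check walks the string with `String#index` and validates each `&...;` reference as it is found, rather than relying on a single large regular expression. The sketch below is a simplified, hypothetical restatement of that loop: it only extracts and classifies references, and it does not validate code points or entity names. `scan_references` is not a REXML method.

```ruby
# Simplified sketch; does not reproduce REXML's full validation.
def scan_references(string)
  refs = []
  pos = 0
  while (index = string.index("&", pos))
    end_index = string.index(";", index + 1)
    break unless end_index                       # unterminated reference
    body = string[(index + 1)...end_index]
    kind = /\A#(\d+|x[0-9a-fA-F]+)\z/.match?(body) ? :character : :entity
    refs << [kind, body]
    pos = end_index + 1                          # resume after the ";"
  end
  refs
end

p scan_references("a &amp; b &#x41;")  # [[:entity, "amp"], [:character, "#x41"]]
```

Where this sketch stops silently at an unterminated `&`, the code in the diff raises an "Illegal character" error.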
metadata
CHANGED
@@ -1,13 +1,13 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: rexml
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 3.
|
4
|
+
version: 3.3.4
|
5
5
|
platform: ruby
|
6
6
|
authors:
|
7
7
|
- Kouhei Sutou
|
8
8
|
bindir: bin
|
9
9
|
cert_chain: []
|
10
|
-
date: 2024-
|
10
|
+
date: 2024-08-01 00:00:00.000000000 Z
|
11
11
|
dependencies:
|
12
12
|
- !ruby/object:Gem::Dependency
|
13
13
|
name: strscan
|
@@ -115,7 +115,8 @@ files:
|
|
115
115
|
homepage: https://github.com/ruby/rexml
|
116
116
|
licenses:
|
117
117
|
- BSD-2-Clause
|
118
|
-
metadata:
|
118
|
+
metadata:
|
119
|
+
changelog_uri: https://github.com/ruby/rexml/releases/tag/v3.3.4
|
119
120
|
rdoc_options:
|
120
121
|
- "--main"
|
121
122
|
- README.md
|