rexml 3.3.1 → 3.3.8
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Potentially problematic release.
This version of rexml might be problematic.
- checksums.yaml +4 -4
- data/NEWS.md +198 -0
- data/lib/rexml/attribute.rb +3 -2
- data/lib/rexml/document.rb +5 -1
- data/lib/rexml/element.rb +14 -16
- data/lib/rexml/entity.rb +9 -48
- data/lib/rexml/formatters/pretty.rb +1 -1
- data/lib/rexml/parsers/baseparser.rb +167 -45
- data/lib/rexml/parsers/pullparser.rb +12 -0
- data/lib/rexml/parsers/sax2parser.rb +16 -19
- data/lib/rexml/parsers/streamparser.rb +16 -10
- data/lib/rexml/parsers/treeparser.rb +0 -7
- data/lib/rexml/rexml.rb +1 -1
- data/lib/rexml/source.rb +16 -6
- data/lib/rexml/text.rb +39 -17
- metadata +4 -18
checksums.yaml
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
---
|
|
2
2
|
SHA256:
|
|
3
|
-
metadata.gz:
|
|
4
|
-
data.tar.gz:
|
|
3
|
+
metadata.gz: 84b42219a4278ab15e7ee7627951d0b94dddc707cbf9563799b3266d02ed32db
|
|
4
|
+
data.tar.gz: 4895e6f04d100a2affc8d5c6af4c6dfec5ec4d0d863f8d22de1c66da1d253c61
|
|
5
5
|
SHA512:
|
|
6
|
-
metadata.gz:
|
|
7
|
-
data.tar.gz:
|
|
6
|
+
metadata.gz: 7729c31da310e2fb7c96cc3a5bd5b981fefdcdae6fe545bf2d113d91af5862fbb51789e9289b91e4247963169900b0cdccc373ffeea6ca3f935b2e32bab1e2e4
|
|
7
|
+
data.tar.gz: 542f689b7cd27b5c71aeb6845e5af2ac28186e31a98af8c45e984ce6ca563192b2a74e50b6acd95f1fde49ed6289bf9024bfd6612608455038a22e66c6b3a75b
|
data/NEWS.md
CHANGED
|
@@ -1,5 +1,203 @@
|
|
|
1
1
|
# News
|
|
2
2
|
|
|
3
|
+
## 3.3.8 - 2024-09-29 {#version-3-3-8}
|
|
4
|
+
|
|
5
|
+
### Improvements
|
|
6
|
+
|
|
7
|
+
* SAX2: Improve parse performance.
|
|
8
|
+
* GH-207
|
|
9
|
+
* Patch by NAITOH Jun.
|
|
10
|
+
|
|
11
|
+
### Fixes
|
|
12
|
+
|
|
13
|
+
* Fixed a bug that unexpected attribute namespace conflict error for
|
|
14
|
+
the predefined "xml" namespace is reported.
|
|
15
|
+
* GH-208
|
|
16
|
+
* Patch by KITAITI Makoto
|
|
17
|
+
|
|
18
|
+
### Thanks
|
|
19
|
+
|
|
20
|
+
* NAITOH Jun
|
|
21
|
+
|
|
22
|
+
* KITAITI Makoto
|
|
23
|
+
|
|
24
|
+
## 3.3.7 - 2024-09-04 {#version-3-3-7}
|
|
25
|
+
|
|
26
|
+
### Improvements
|
|
27
|
+
|
|
28
|
+
* Added local entity expansion limit methods
|
|
29
|
+
* GH-192
|
|
30
|
+
* GH-202
|
|
31
|
+
* Reported by takuya kodama.
|
|
32
|
+
* Patch by NAITOH Jun.
|
|
33
|
+
|
|
34
|
+
* Removed explicit strscan dependency
|
|
35
|
+
* GH-204
|
|
36
|
+
* Patch by Bo Anderson.
|
|
37
|
+
|
|
38
|
+
### Thanks
|
|
39
|
+
|
|
40
|
+
* takuya kodama
|
|
41
|
+
|
|
42
|
+
* NAITOH Jun
|
|
43
|
+
|
|
44
|
+
* Bo Anderson
|
|
45
|
+
|
|
46
|
+
## 3.3.6 - 2024-08-22 {#version-3-3-6}
|
|
47
|
+
|
|
48
|
+
### Improvements
|
|
49
|
+
|
|
50
|
+
* Removed duplicated entity expansions for performance.
|
|
51
|
+
* GH-194
|
|
52
|
+
* Patch by Viktor Ivarsson.
|
|
53
|
+
|
|
54
|
+
* Improved namespace conflicted attribute check performance. It was
|
|
55
|
+
too slow for deep elements.
|
|
56
|
+
* Reported by l33thaxor.
|
|
57
|
+
|
|
58
|
+
### Fixes
|
|
59
|
+
|
|
60
|
+
* Fixed a bug that default entity expansions are counted for
|
|
61
|
+
security check. Default entity expansions should not be counted
|
|
62
|
+
because they don't have a security risk.
|
|
63
|
+
* GH-198
|
|
64
|
+
* GH-199
|
|
65
|
+
* Patch Viktor Ivarsson
|
|
66
|
+
|
|
67
|
+
* Fixed a parser bug that parameter entity references in internal
|
|
68
|
+
subsets are expanded. It's not allowed in the XML specification.
|
|
69
|
+
* GH-191
|
|
70
|
+
* Patch by NAITOH Jun.
|
|
71
|
+
|
|
72
|
+
* Fixed a stream parser bug that user-defined entity references in
|
|
73
|
+
text aren't expanded.
|
|
74
|
+
* GH-200
|
|
75
|
+
* Patch by NAITOH Jun.
|
|
76
|
+
|
|
77
|
+
### Thanks
|
|
78
|
+
|
|
79
|
+
* Viktor Ivarsson
|
|
80
|
+
|
|
81
|
+
* NAITOH Jun
|
|
82
|
+
|
|
83
|
+
* l33thaxor
|
|
84
|
+
|
|
85
|
+
## 3.3.5 - 2024-08-12 {#version-3-3-5}
|
|
86
|
+
|
|
87
|
+
### Fixes
|
|
88
|
+
|
|
89
|
+
* Fixed a bug that `REXML::Security.entity_expansion_text_limit`
|
|
90
|
+
check has wrong text size calculation in SAX and pull parsers.
|
|
91
|
+
* GH-193
|
|
92
|
+
* GH-195
|
|
93
|
+
* Reported by Viktor Ivarsson.
|
|
94
|
+
* Patch by NAITOH Jun.
|
|
95
|
+
|
|
96
|
+
### Thanks
|
|
97
|
+
|
|
98
|
+
* Viktor Ivarsson
|
|
99
|
+
|
|
100
|
+
* NAITOH Jun
|
|
101
|
+
|
|
102
|
+
## 3.3.4 - 2024-08-01 {#version-3-3-4}
|
|
103
|
+
|
|
104
|
+
### Fixes
|
|
105
|
+
|
|
106
|
+
* Fixed a bug that `REXML::Security` isn't defined when
|
|
107
|
+
`REXML::Parsers::StreamParser` is used and
|
|
108
|
+
`rexml/parsers/streamparser` is only required.
|
|
109
|
+
* GH-189
|
|
110
|
+
* Patch by takuya kodama.
|
|
111
|
+
|
|
112
|
+
### Thanks
|
|
113
|
+
|
|
114
|
+
* takuya kodama
|
|
115
|
+
|
|
116
|
+
## 3.3.3 - 2024-08-01 {#version-3-3-3}
|
|
117
|
+
|
|
118
|
+
### Improvements
|
|
119
|
+
|
|
120
|
+
* Added support for detecting invalid XML that has unsupported
|
|
121
|
+
content before root element
|
|
122
|
+
* GH-184
|
|
123
|
+
* Patch by NAITOH Jun.
|
|
124
|
+
|
|
125
|
+
* Added support for `REXML::Security.entity_expansion_limit=` and
|
|
126
|
+
`REXML::Security.entity_expansion_text_limit=` in SAX2 and pull
|
|
127
|
+
parsers
|
|
128
|
+
* GH-187
|
|
129
|
+
* Patch by NAITOH Jun.
|
|
130
|
+
|
|
131
|
+
* Added more tests for invalid XMLs.
|
|
132
|
+
* GH-183
|
|
133
|
+
* Patch by Watson.
|
|
134
|
+
|
|
135
|
+
* Added more performance tests.
|
|
136
|
+
* Patch by Watson.
|
|
137
|
+
|
|
138
|
+
* Improved parse performance.
|
|
139
|
+
* GH-186
|
|
140
|
+
* Patch by tomoya ishida.
|
|
141
|
+
|
|
142
|
+
### Thanks
|
|
143
|
+
|
|
144
|
+
* NAITOH Jun
|
|
145
|
+
|
|
146
|
+
* Watson
|
|
147
|
+
|
|
148
|
+
* tomoya ishida
|
|
149
|
+
|
|
150
|
+
## 3.3.2 - 2024-07-16 {#version-3-3-2}
|
|
151
|
+
|
|
152
|
+
### Improvements
|
|
153
|
+
|
|
154
|
+
* Improved parse performance.
|
|
155
|
+
* GH-160
|
|
156
|
+
* Patch by NAITOH Jun.
|
|
157
|
+
|
|
158
|
+
* Improved parse performance.
|
|
159
|
+
* GH-169
|
|
160
|
+
* GH-170
|
|
161
|
+
* GH-171
|
|
162
|
+
* GH-172
|
|
163
|
+
* GH-173
|
|
164
|
+
* GH-174
|
|
165
|
+
* GH-175
|
|
166
|
+
* GH-176
|
|
167
|
+
* GH-177
|
|
168
|
+
* Patch by Watson.
|
|
169
|
+
|
|
170
|
+
* Added support for raising a parse exception when an XML has extra
|
|
171
|
+
content after the root element.
|
|
172
|
+
* GH-161
|
|
173
|
+
* Patch by NAITOH Jun.
|
|
174
|
+
|
|
175
|
+
* Added support for raising a parse exception when an XML
|
|
176
|
+
declaration exists in wrong position.
|
|
177
|
+
* GH-162
|
|
178
|
+
* Patch by NAITOH Jun.
|
|
179
|
+
|
|
180
|
+
* Removed needless a space after XML declaration in pretty print mode.
|
|
181
|
+
* GH-164
|
|
182
|
+
* Patch by NAITOH Jun.
|
|
183
|
+
|
|
184
|
+
* Stopped to emit `:text` event after the root element.
|
|
185
|
+
* GH-167
|
|
186
|
+
* Patch by NAITOH Jun.
|
|
187
|
+
|
|
188
|
+
### Fixes
|
|
189
|
+
|
|
190
|
+
* Fixed a bug that SAX2 parser doesn't expand predefined entities for
|
|
191
|
+
`characters` callback.
|
|
192
|
+
* GH-168
|
|
193
|
+
* Patch by NAITOH Jun.
|
|
194
|
+
|
|
195
|
+
### Thanks
|
|
196
|
+
|
|
197
|
+
* NAITOH Jun
|
|
198
|
+
|
|
199
|
+
* Watson
|
|
200
|
+
|
|
3
201
|
## 3.3.1 - 2024-06-25 {#version-3-3-1}
|
|
4
202
|
|
|
5
203
|
### Improvements
|
data/lib/rexml/attribute.rb
CHANGED
|
@@ -148,8 +148,9 @@ module REXML
|
|
|
148
148
|
# have been expanded to their values
|
|
149
149
|
def value
|
|
150
150
|
return @unnormalized if @unnormalized
|
|
151
|
-
|
|
152
|
-
@unnormalized
|
|
151
|
+
|
|
152
|
+
@unnormalized = Text::unnormalize(@normalized, doctype,
|
|
153
|
+
entity_expansion_text_limit: @element&.document&.entity_expansion_text_limit)
|
|
153
154
|
end
|
|
154
155
|
|
|
155
156
|
# The normalized value of this attribute. That is, the attribute with
|
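The `value` change above memoizes the expanded text and looks up a per-document text limit through safe navigation. The following is a minimal standalone sketch of that pattern, not the REXML implementation: `SketchDocument`, `SketchElement`, `SketchAttribute` and the simplified `expand` helper are hypothetical names, and treating a missing limit as "no check" is an assumption of the sketch.

```ruby
# Hypothetical stand-ins for a document and an element; not REXML classes.
SketchDocument = Struct.new(:entity_expansion_text_limit)
SketchElement = Struct.new(:document)

class SketchAttribute
  attr_reader :expansions # how many times the value was actually expanded

  def initialize(normalized, element = nil)
    @normalized = normalized
    @element = element
    @unnormalized = nil
    @expansions = 0
  end

  def value
    # Memoized: a second call returns the cached string.
    return @unnormalized if @unnormalized

    # Safe navigation yields nil when the attribute is detached.
    limit = @element&.document&.entity_expansion_text_limit
    @expansions += 1
    @unnormalized = expand(@normalized, limit)
  end

  private

  # Simplified expansion: predefined entities only. A nil limit means
  # "no check" here, which is an assumption of this sketch.
  def expand(text, limit)
    expanded = text.gsub("&lt;", "<").gsub("&gt;", ">").gsub("&amp;", "&")
    if limit && expanded.bytesize > limit
      raise "entity expansion has grown too large"
    end
    expanded
  end
end
```

A detached attribute expands without a limit in this sketch, while one attached to an element picks up the limit of that element's document.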
data/lib/rexml/document.rb
CHANGED
|
@@ -91,6 +91,8 @@ module REXML
|
|
|
91
91
|
#
|
|
92
92
|
def initialize( source = nil, context = {} )
|
|
93
93
|
@entity_expansion_count = 0
|
|
94
|
+
@entity_expansion_limit = Security.entity_expansion_limit
|
|
95
|
+
@entity_expansion_text_limit = Security.entity_expansion_text_limit
|
|
94
96
|
super()
|
|
95
97
|
@context = context
|
|
96
98
|
return if source.nil?
|
|
@@ -431,10 +433,12 @@ module REXML
|
|
|
431
433
|
end
|
|
432
434
|
|
|
433
435
|
attr_reader :entity_expansion_count
|
|
436
|
+
attr_writer :entity_expansion_limit
|
|
437
|
+
attr_accessor :entity_expansion_text_limit
|
|
434
438
|
|
|
435
439
|
def record_entity_expansion
|
|
436
440
|
@entity_expansion_count += 1
|
|
437
|
-
if @entity_expansion_count >
|
|
441
|
+
if @entity_expansion_count > @entity_expansion_limit
|
|
438
442
|
raise "number of entity expansions exceeded, processing aborted."
|
|
439
443
|
end
|
|
440
444
|
end
|
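The document.rb hunks above copy the global limits into instance variables at construction time and compare the counter against the instance copy. A minimal sketch of that idea, assuming hypothetical names (`SketchDefaults`, `SketchCountingDocument`) rather than the REXML classes:

```ruby
# Hypothetical global defaults, standing in for a module-level setting.
module SketchDefaults
  class << self
    attr_accessor :entity_expansion_limit
  end
  self.entity_expansion_limit = 10_000
end

class SketchCountingDocument
  attr_reader :entity_expansion_count
  attr_writer :entity_expansion_limit

  def initialize
    @entity_expansion_count = 0
    # Snapshot the global default; later global changes do not affect
    # documents that already exist.
    @entity_expansion_limit = SketchDefaults.entity_expansion_limit
  end

  def record_entity_expansion
    @entity_expansion_count += 1
    if @entity_expansion_count > @entity_expansion_limit
      raise "number of entity expansions exceeded, processing aborted."
    end
  end
end
```

With this shape, one untrusted document can be given a stricter limit through its writer while other documents keep the default they were created with.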
data/lib/rexml/element.rb
CHANGED
|
@@ -441,9 +441,14 @@ module REXML
|
|
|
441
441
|
# Related: #root_node, #document.
|
|
442
442
|
#
|
|
443
443
|
def root
|
|
444
|
-
|
|
445
|
-
|
|
446
|
-
|
|
444
|
+
target = self
|
|
445
|
+
while target
|
|
446
|
+
return target.elements[1] if target.kind_of? Document
|
|
447
|
+
parent = target.parent
|
|
448
|
+
return target if parent.kind_of? Document or parent.nil?
|
|
449
|
+
target = parent
|
|
450
|
+
end
|
|
451
|
+
nil
|
|
447
452
|
end
|
|
448
453
|
|
|
449
454
|
# :call-seq:
|
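The new `root` body above walks parent links in a loop instead of recursing, which keeps the call stack flat for deeply nested elements. A standalone sketch of the same walk, using a hypothetical `SketchNode` class (the `document:` flag and `children.first` stand in for the `Document` check and `elements[1]`):

```ruby
# Hypothetical tree node; `document: true` marks the document node.
class SketchNode
  attr_reader :name, :children
  attr_accessor :parent

  def initialize(name, document: false)
    @name = name
    @document = document
    @parent = nil
    @children = []
  end

  def document?
    @document
  end

  def add(child)
    child.parent = self
    @children << child
    child
  end

  # Iterative walk toward the top of the tree.
  def root
    target = self
    while target
      # Called on the document itself: answer its first child.
      return target.children.first if target.document?
      parent = target.parent
      # Stop at the node directly under the document, or at a detached top.
      return target if parent.nil? || parent.document?
      target = parent
    end
    nil
  end
end
```

Because the loop uses constant stack space, a chain of many thousands of nested nodes resolves without a `SystemStackError`.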
|
@@ -619,8 +624,12 @@ module REXML
|
|
|
619
624
|
else
|
|
620
625
|
prefix = "xmlns:#{prefix}" unless prefix[0,5] == 'xmlns'
|
|
621
626
|
end
|
|
622
|
-
ns =
|
|
623
|
-
|
|
627
|
+
ns = nil
|
|
628
|
+
target = self
|
|
629
|
+
while ns.nil? and target
|
|
630
|
+
ns = target.attributes[prefix]
|
|
631
|
+
target = target.parent
|
|
632
|
+
end
|
|
624
633
|
ns = '' if ns.nil? and prefix == 'xmlns'
|
|
625
634
|
return ns
|
|
626
635
|
end
|
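The namespace lookup above now climbs the ancestor chain in a `while` loop until a declaration is found. A minimal sketch of that lookup, assuming a hypothetical `SketchScopedElement` whose attributes are a plain hash keyed by `xmlns` or `xmlns:prefix`:

```ruby
# Hypothetical element with a plain attribute hash and a parent link.
class SketchScopedElement
  attr_reader :attributes, :parent

  def initialize(attributes = {}, parent = nil)
    @attributes = attributes
    @parent = parent
  end

  # Resolve a prefix ("" means the default namespace) by walking up.
  def namespace(prefix = "")
    key = prefix.empty? ? "xmlns" : "xmlns:#{prefix}"
    ns = nil
    target = self
    while ns.nil? && target
      ns = target.attributes[key]
      target = target.parent
    end
    # An undeclared default namespace resolves to the empty string.
    ns = "" if ns.nil? && key == "xmlns"
    ns
  end
end
```

The nearest declaration wins, an undeclared prefix yields `nil`, and an undeclared default namespace yields `""`.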
|
@@ -2375,17 +2384,6 @@ module REXML
|
|
|
2375
2384
|
elsif old_attr.kind_of? Hash
|
|
2376
2385
|
old_attr[value.prefix] = value
|
|
2377
2386
|
elsif old_attr.prefix != value.prefix
|
|
2378
|
-
# Check for conflicting namespaces
|
|
2379
|
-
if value.prefix != "xmlns" and old_attr.prefix != "xmlns"
|
|
2380
|
-
old_namespace = old_attr.namespace
|
|
2381
|
-
new_namespace = value.namespace
|
|
2382
|
-
if old_namespace == new_namespace
|
|
2383
|
-
raise ParseException.new(
|
|
2384
|
-
"Namespace conflict in adding attribute \"#{value.name}\": "+
|
|
2385
|
-
"Prefix \"#{old_attr.prefix}\" = \"#{old_namespace}\" and "+
|
|
2386
|
-
"prefix \"#{value.prefix}\" = \"#{new_namespace}\"")
|
|
2387
|
-
end
|
|
2388
|
-
end
|
|
2389
2387
|
store value.name, {old_attr.prefix => old_attr,
|
|
2390
2388
|
value.prefix => value}
|
|
2391
2389
|
else
|
data/lib/rexml/entity.rb
CHANGED
|
@@ -12,6 +12,7 @@ module REXML
|
|
|
12
12
|
EXTERNALID = "(?:(?:(SYSTEM)\\s+#{SYSTEMLITERAL})|(?:(PUBLIC)\\s+#{PUBIDLITERAL}\\s+#{SYSTEMLITERAL}))"
|
|
13
13
|
NDATADECL = "\\s+NDATA\\s+#{NAME}"
|
|
14
14
|
PEREFERENCE = "%#{NAME};"
|
|
15
|
+
PEREFERENCE_RE = /#{PEREFERENCE}/um
|
|
15
16
|
ENTITYVALUE = %Q{((?:"(?:[^%&"]|#{PEREFERENCE}|#{REFERENCE})*")|(?:'([^%&']|#{PEREFERENCE}|#{REFERENCE})*'))}
|
|
16
17
|
PEDEF = "(?:#{ENTITYVALUE}|#{EXTERNALID})"
|
|
17
18
|
ENTITYDEF = "(?:#{ENTITYVALUE}|(?:#{EXTERNALID}(#{NDATADECL})?))"
|
|
@@ -19,7 +20,7 @@ module REXML
|
|
|
19
20
|
GEDECL = "<!ENTITY\\s+#{NAME}\\s+#{ENTITYDEF}\\s*>"
|
|
20
21
|
ENTITYDECL = /\s*(?:#{GEDECL})|(?:#{PEDECL})/um
|
|
21
22
|
|
|
22
|
-
attr_reader :name, :external, :ref, :ndata, :pubid
|
|
23
|
+
attr_reader :name, :external, :ref, :ndata, :pubid, :value
|
|
23
24
|
|
|
24
25
|
# Create a new entity. Simple entities can be constructed by passing a
|
|
25
26
|
# name, value to the constructor; this creates a generic, plain entity
|
|
@@ -68,14 +69,14 @@ module REXML
|
|
|
68
69
|
end
|
|
69
70
|
|
|
70
71
|
# Evaluates to the unnormalized value of this entity; that is, replacing
|
|
71
|
-
#
|
|
72
|
-
# +value()+ in that +value+ only replaces %ent; entities.
|
|
72
|
+
# &ent; entities.
|
|
73
73
|
def unnormalized
|
|
74
|
-
document
|
|
75
|
-
|
|
76
|
-
return nil if
|
|
77
|
-
|
|
78
|
-
@unnormalized
|
|
74
|
+
document&.record_entity_expansion
|
|
75
|
+
|
|
76
|
+
return nil if @value.nil?
|
|
77
|
+
|
|
78
|
+
@unnormalized = Text::unnormalize(@value, parent,
|
|
79
|
+
entity_expansion_text_limit: document&.entity_expansion_text_limit)
|
|
79
80
|
end
|
|
80
81
|
|
|
81
82
|
#once :unnormalized
|
|
@@ -121,46 +122,6 @@ module REXML
|
|
|
121
122
|
write rv
|
|
122
123
|
rv
|
|
123
124
|
end
|
|
124
|
-
|
|
125
|
-
PEREFERENCE_RE = /#{PEREFERENCE}/um
|
|
126
|
-
# Returns the value of this entity. At the moment, only internal entities
|
|
127
|
-
# are processed. If the value contains internal references (IE,
|
|
128
|
-
# %blah;), those are replaced with their values. IE, if the doctype
|
|
129
|
-
# contains:
|
|
130
|
-
# <!ENTITY % foo "bar">
|
|
131
|
-
# <!ENTITY yada "nanoo %foo; nanoo>
|
|
132
|
-
# then:
|
|
133
|
-
# doctype.entity('yada').value #-> "nanoo bar nanoo"
|
|
134
|
-
def value
|
|
135
|
-
@resolved_value ||= resolve_value
|
|
136
|
-
end
|
|
137
|
-
|
|
138
|
-
def parent=(other)
|
|
139
|
-
@resolved_value = nil
|
|
140
|
-
super
|
|
141
|
-
end
|
|
142
|
-
|
|
143
|
-
private
|
|
144
|
-
def resolve_value
|
|
145
|
-
return nil if @value.nil?
|
|
146
|
-
return @value unless @value.match?(PEREFERENCE_RE)
|
|
147
|
-
|
|
148
|
-
matches = @value.scan(PEREFERENCE_RE)
|
|
149
|
-
rv = @value.clone
|
|
150
|
-
if @parent
|
|
151
|
-
sum = 0
|
|
152
|
-
matches.each do |entity_reference|
|
|
153
|
-
entity_value = @parent.entity( entity_reference[0] )
|
|
154
|
-
if sum + entity_value.bytesize > Security.entity_expansion_text_limit
|
|
155
|
-
raise "entity expansion has grown too large"
|
|
156
|
-
else
|
|
157
|
-
sum += entity_value.bytesize
|
|
158
|
-
end
|
|
159
|
-
rv.gsub!( /%#{entity_reference.join};/um, entity_value )
|
|
160
|
-
end
|
|
161
|
-
end
|
|
162
|
-
rv
|
|
163
|
-
end
|
|
164
125
|
end
|
|
165
126
|
|
|
166
127
|
# This is a set of entity constants -- the ones defined in the XML
|
data/lib/rexml/formatters/pretty.rb
CHANGED
|
@@ -111,7 +111,7 @@ module REXML
|
|
|
111
111
|
# itself, then we don't need a carriage return... which makes this
|
|
112
112
|
# logic more complex.
|
|
113
113
|
node.children.each { |child|
|
|
114
|
-
next if child
|
|
114
|
+
next if child.instance_of?(Text)
|
|
115
115
|
unless child == node.children[0] or child.instance_of?(Text) or
|
|
116
116
|
(child == node.children[1] and !node.children[0].writethis)
|
|
117
117
|
output << "\n"
|
data/lib/rexml/parsers/baseparser.rb
CHANGED
|
@@ -1,12 +1,29 @@
|
|
|
1
1
|
# frozen_string_literal: true
|
|
2
2
|
require_relative '../parseexception'
|
|
3
3
|
require_relative '../undefinednamespaceexception'
|
|
4
|
+
require_relative '../security'
|
|
4
5
|
require_relative '../source'
|
|
5
6
|
require 'set'
|
|
6
7
|
require "strscan"
|
|
7
8
|
|
|
8
9
|
module REXML
|
|
9
10
|
module Parsers
|
|
11
|
+
unless [].respond_to?(:tally)
|
|
12
|
+
module EnumerableTally
|
|
13
|
+
refine Enumerable do
|
|
14
|
+
def tally
|
|
15
|
+
counts = {}
|
|
16
|
+
each do |item|
|
|
17
|
+
counts[item] ||= 0
|
|
18
|
+
counts[item] += 1
|
|
19
|
+
end
|
|
20
|
+
counts
|
|
21
|
+
end
|
|
22
|
+
end
|
|
23
|
+
end
|
|
24
|
+
using EnumerableTally
|
|
25
|
+
end
|
|
26
|
+
|
|
10
27
|
if StringScanner::Version < "3.0.8"
|
|
11
28
|
module StringScannerCaptures
|
|
12
29
|
refine StringScanner do
|
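The hunk above backfills `Enumerable#tally` through a refinement, so the method is only visible inside the code that opts in with `using`. A small sketch of the same technique follows; the method is named `tally_counts` (a hypothetical name) so that the sketch exercises the refinement on any Ruby version, including those that already ship `tally`.

```ruby
# A refinement that adds a counting helper to every Enumerable.
module SketchTally
  refine Enumerable do
    def tally_counts
      counts = {}
      each do |item|
        counts[item] ||= 0
        counts[item] += 1
      end
      counts
    end
  end
end

module SketchTallyDemo
  # The refinement is active only from here to the end of this module body.
  using SketchTally

  def self.count(items)
    items.tally_counts
  end
end
```

Outside `SketchTallyDemo`, arrays do not respond to `tally_counts`, which is the point of using a refinement instead of reopening `Enumerable`.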
|
@@ -124,11 +141,11 @@ module REXML
|
|
|
124
141
|
}
|
|
125
142
|
|
|
126
143
|
module Private
|
|
127
|
-
|
|
144
|
+
PEREFERENCE_PATTERN = /#{PEREFERENCE}/um
|
|
128
145
|
TAG_PATTERN = /((?>#{QNAME_STR}))\s*/um
|
|
129
146
|
CLOSE_PATTERN = /(#{QNAME_STR})\s*>/um
|
|
130
147
|
ATTLISTDECL_END = /\s+#{NAME}(?:#{ATTDEF})*\s*>/um
|
|
131
|
-
NAME_PATTERN =
|
|
148
|
+
NAME_PATTERN = /#{NAME}/um
|
|
132
149
|
GEDECL_PATTERN = "\\s+#{NAME}\\s+#{ENTITYDEF}\\s*>"
|
|
133
150
|
PEDECL_PATTERN = "\\s+(%)\\s+#{NAME}\\s+#{PEDEF}\\s*>"
|
|
134
151
|
ENTITYDECL_PATTERN = /(?:#{GEDECL_PATTERN})|(?:#{PEDECL_PATTERN})/um
|
|
@@ -139,6 +156,7 @@ module REXML
|
|
|
139
156
|
default_entities.each do |term|
|
|
140
157
|
DEFAULT_ENTITIES_PATTERNS[term] = /&#{term};/
|
|
141
158
|
end
|
|
159
|
+
XML_PREFIXED_NAMESPACE = "http://www.w3.org/XML/1998/namespace"
|
|
142
160
|
end
|
|
143
161
|
private_constant :Private
|
|
144
162
|
|
|
@@ -146,6 +164,9 @@ module REXML
|
|
|
146
164
|
self.stream = source
|
|
147
165
|
@listeners = []
|
|
148
166
|
@prefixes = Set.new
|
|
167
|
+
@entity_expansion_count = 0
|
|
168
|
+
@entity_expansion_limit = Security.entity_expansion_limit
|
|
169
|
+
@entity_expansion_text_limit = Security.entity_expansion_text_limit
|
|
149
170
|
end
|
|
150
171
|
|
|
151
172
|
def add_listener( listener )
|
|
@@ -153,15 +174,20 @@ module REXML
|
|
|
153
174
|
end
|
|
154
175
|
|
|
155
176
|
attr_reader :source
|
|
177
|
+
attr_reader :entity_expansion_count
|
|
178
|
+
attr_writer :entity_expansion_limit
|
|
179
|
+
attr_writer :entity_expansion_text_limit
|
|
156
180
|
|
|
157
181
|
def stream=( source )
|
|
158
182
|
@source = SourceFactory.create_from( source )
|
|
159
183
|
@closed = nil
|
|
184
|
+
@have_root = false
|
|
160
185
|
@document_status = nil
|
|
161
186
|
@tags = []
|
|
162
187
|
@stack = []
|
|
163
188
|
@entities = []
|
|
164
|
-
@
|
|
189
|
+
@namespaces = {"xml" => Private::XML_PREFIXED_NAMESPACE}
|
|
190
|
+
@namespaces_restore_stack = []
|
|
165
191
|
end
|
|
166
192
|
|
|
167
193
|
def position
|
|
@@ -229,6 +255,10 @@ module REXML
|
|
|
229
255
|
if @document_status == :in_doctype
|
|
230
256
|
raise ParseException.new("Malformed DOCTYPE: unclosed", @source)
|
|
231
257
|
end
|
|
258
|
+
unless @tags.empty?
|
|
259
|
+
path = "/" + @tags.join("/")
|
|
260
|
+
raise ParseException.new("Missing end tag for '#{path}'", @source)
|
|
261
|
+
end
|
|
232
262
|
return [ :end_document ]
|
|
233
263
|
end
|
|
234
264
|
return @stack.shift if @stack.size > 0
|
|
@@ -239,7 +269,7 @@ module REXML
|
|
|
239
269
|
if @document_status == nil
|
|
240
270
|
start_position = @source.position
|
|
241
271
|
if @source.match("<?", true)
|
|
242
|
-
return process_instruction
|
|
272
|
+
return process_instruction
|
|
243
273
|
elsif @source.match("<!", true)
|
|
244
274
|
if @source.match("--", true)
|
|
245
275
|
md = @source.match(/(.*?)-->/um, true)
|
|
@@ -261,7 +291,6 @@ module REXML
|
|
|
261
291
|
@source.position = start_position
|
|
262
292
|
raise REXML::ParseException.new(message, @source)
|
|
263
293
|
end
|
|
264
|
-
@nsstack.unshift(Set.new)
|
|
265
294
|
name = parse_name(base_error_message)
|
|
266
295
|
if @source.match(/\s*\[/um, true)
|
|
267
296
|
id = [nil, nil, nil]
|
|
@@ -309,7 +338,11 @@ module REXML
|
|
|
309
338
|
raise REXML::ParseException.new( "Bad ELEMENT declaration!", @source ) if md.nil?
|
|
310
339
|
return [ :elementdecl, "<!ELEMENT" + md[1] ]
|
|
311
340
|
elsif @source.match("ENTITY", true)
|
|
312
|
-
|
|
341
|
+
match_data = @source.match(Private::ENTITYDECL_PATTERN, true)
|
|
342
|
+
unless match_data
|
|
343
|
+
raise REXML::ParseException.new("Malformed entity declaration", @source)
|
|
344
|
+
end
|
|
345
|
+
match = [:entitydecl, *match_data.captures.compact]
|
|
313
346
|
ref = false
|
|
314
347
|
if match[1] == '%'
|
|
315
348
|
ref = true
|
|
@@ -327,6 +360,8 @@ module REXML
|
|
|
327
360
|
match[4] = match[4][1..-2] # HREF
|
|
328
361
|
match.delete_at(5) if match.size > 5 # Chop out NDATA decl
|
|
329
362
|
# match is [ :entity, name, PUBLIC, pubid, href(, ndata)? ]
|
|
363
|
+
elsif Private::PEREFERENCE_PATTERN.match?(match[2])
|
|
364
|
+
raise REXML::ParseException.new("Parameter entity references forbidden in internal subset: #{match[2]}", @source)
|
|
330
365
|
else
|
|
331
366
|
match[2] = match[2][1..-2]
|
|
332
367
|
match.pop if match.size == 4
|
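The entity-declaration hunk above rejects an entity value that contains a parameter entity reference (`%name;`). A rough sketch of such a check, assuming a simplified ASCII-only name pattern (`SKETCH_NAME`) in place of the parser's full XML `NAME` definition, and a hypothetical helper `check_entity_value!`:

```ruby
# Simplified name pattern; the real XML Name production is broader.
SKETCH_NAME = /[A-Za-z_:][A-Za-z0-9_:.\-]*/
SKETCH_PE_REFERENCE = /%#{SKETCH_NAME};/

# Takes a quoted entity value literal and returns it without the quotes,
# raising if it contains a parameter entity reference.
def check_entity_value!(literal)
  if SKETCH_PE_REFERENCE.match?(literal)
    raise ArgumentError,
          "Parameter entity references forbidden in internal subset: #{literal}"
  end
  literal[1..-2]
end
```

A bare percent sign such as `"50% off"` passes, because the pattern requires a name followed by a semicolon.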
|
@@ -341,7 +376,7 @@ module REXML
|
|
|
341
376
|
contents = md[0]
|
|
342
377
|
|
|
343
378
|
pairs = {}
|
|
344
|
-
values = md[0].scan( ATTDEF_RE )
|
|
379
|
+
values = md[0].strip.scan( ATTDEF_RE )
|
|
345
380
|
values.each do |attdef|
|
|
346
381
|
unless attdef[3] == "#IMPLIED"
|
|
347
382
|
attdef.compact!
|
|
@@ -349,7 +384,7 @@ module REXML
|
|
|
349
384
|
val = attdef[4] if val == "#FIXED "
|
|
350
385
|
pairs[attdef[0]] = val
|
|
351
386
|
if attdef[0] =~ /^xmlns:(.*)/
|
|
352
|
-
@
|
|
387
|
+
@namespaces[$1] = val
|
|
353
388
|
end
|
|
354
389
|
end
|
|
355
390
|
end
|
|
@@ -402,7 +437,7 @@ module REXML
|
|
|
402
437
|
# here explicitly.
|
|
403
438
|
@source.ensure_buffer
|
|
404
439
|
if @source.match("/", true)
|
|
405
|
-
@
|
|
440
|
+
@namespaces_restore_stack.pop
|
|
406
441
|
last_tag = @tags.pop
|
|
407
442
|
md = @source.match(Private::CLOSE_PATTERN, true)
|
|
408
443
|
if md and !last_tag
|
|
@@ -435,7 +470,7 @@ module REXML
|
|
|
435
470
|
raise REXML::ParseException.new( "Declarations can only occur "+
|
|
436
471
|
"in the doctype declaration.", @source)
|
|
437
472
|
elsif @source.match("?", true)
|
|
438
|
-
return process_instruction
|
|
473
|
+
return process_instruction
|
|
439
474
|
else
|
|
440
475
|
# Get the next tag
|
|
441
476
|
md = @source.match(Private::TAG_PATTERN, true)
|
|
@@ -447,21 +482,25 @@ module REXML
|
|
|
447
482
|
@document_status = :in_element
|
|
448
483
|
@prefixes.clear
|
|
449
484
|
@prefixes << md[2] if md[2]
|
|
450
|
-
|
|
451
|
-
attributes, closed = parse_attributes(@prefixes
|
|
485
|
+
push_namespaces_restore
|
|
486
|
+
attributes, closed = parse_attributes(@prefixes)
|
|
452
487
|
# Verify that all of the prefixes have been defined
|
|
453
488
|
for prefix in @prefixes
|
|
454
|
-
unless @
|
|
489
|
+
unless @namespaces.key?(prefix)
|
|
455
490
|
raise UndefinedNamespaceException.new(prefix,@source,self)
|
|
456
491
|
end
|
|
457
492
|
end
|
|
458
493
|
|
|
459
494
|
if closed
|
|
460
495
|
@closed = tag
|
|
461
|
-
|
|
496
|
+
pop_namespaces_restore
|
|
462
497
|
else
|
|
498
|
+
if @tags.empty? and @have_root
|
|
499
|
+
raise ParseException.new("Malformed XML: Extra tag at the end of the document (got '<#{tag}')", @source)
|
|
500
|
+
end
|
|
463
501
|
@tags.push( tag )
|
|
464
502
|
end
|
|
503
|
+
@have_root = true
|
|
465
504
|
return [ :start_element, tag, attributes ]
|
|
466
505
|
end
|
|
467
506
|
else
|
|
@@ -469,6 +508,16 @@ module REXML
|
|
|
469
508
|
if text.chomp!("<")
|
|
470
509
|
@source.position -= "<".bytesize
|
|
471
510
|
end
|
|
511
|
+
if @tags.empty?
|
|
512
|
+
unless /\A\s*\z/.match?(text)
|
|
513
|
+
if @have_root
|
|
514
|
+
raise ParseException.new("Malformed XML: Extra content at the end of the document (got '#{text}')", @source)
|
|
515
|
+
else
|
|
516
|
+
raise ParseException.new("Malformed XML: Content at the start of the document (got '#{text}')", @source)
|
|
517
|
+
end
|
|
518
|
+
end
|
|
519
|
+
return pull_event if @have_root
|
|
520
|
+
end
|
|
472
521
|
return [ :text, text ]
|
|
473
522
|
end
|
|
474
523
|
rescue REXML::UndefinedNamespaceException
|
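The `@have_root` and `@tags` checks added above reject content outside the single root element and unclosed tags at the end of input. The sketch below applies the same rules to a pre-tokenized event list; `validate_events`, `SketchMalformedError` and the event shapes are hypothetical, chosen only to keep the example self-contained.

```ruby
class SketchMalformedError < StandardError; end

# events: array of [type, value] pairs with types :start_element,
# :end_element and :text. Returns the events that would be emitted.
def validate_events(events)
  tags = []
  have_root = false
  kept = []
  events.each do |type, value|
    case type
    when :start_element
      if tags.empty? && have_root
        raise SketchMalformedError,
              "Extra tag at the end of the document (got '<#{value}')"
      end
      tags.push(value)
      have_root = true
      kept << [type, value]
    when :end_element
      tags.pop
      kept << [type, value]
    when :text
      if tags.empty?
        unless /\A\s*\z/.match?(value)
          where = have_root ? "Extra content at the end" : "Content at the start"
          raise SketchMalformedError, "#{where} of the document (got '#{value}')"
        end
        next if have_root # whitespace after the root is dropped
      end
      kept << [type, value]
    end
  end
  unless tags.empty?
    raise SketchMalformedError, "Missing end tag for '/#{tags.join("/")}'"
  end
  kept
end
```

Whitespace after the root is silently skipped, whitespace before it is still emitted, and any other stray text or a second top-level element raises.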
|
@@ -484,13 +533,13 @@ module REXML
|
|
|
484
533
|
private :pull_event
|
|
485
534
|
|
|
486
535
|
def entity( reference, entities )
|
|
487
|
-
|
|
488
|
-
|
|
489
|
-
|
|
490
|
-
|
|
491
|
-
|
|
492
|
-
|
|
493
|
-
unnormalize( value, entities )
|
|
536
|
+
return unless entities
|
|
537
|
+
|
|
538
|
+
value = entities[ reference ]
|
|
539
|
+
return if value.nil?
|
|
540
|
+
|
|
541
|
+
record_entity_expansion
|
|
542
|
+
unnormalize( value, entities )
|
|
494
543
|
end
|
|
495
544
|
|
|
496
545
|
# Escapes all possible entities
|
|
@@ -511,7 +560,11 @@ module REXML
|
|
|
511
560
|
|
|
512
561
|
# Unescapes all possible entities
|
|
513
562
|
def unnormalize( string, entities=nil, filter=nil )
|
|
514
|
-
|
|
563
|
+
if string.include?("\r")
|
|
564
|
+
rv = string.gsub( Private::CARRIAGE_RETURN_NEWLINE_PATTERN, "\n" )
|
|
565
|
+
else
|
|
566
|
+
rv = string.dup
|
|
567
|
+
end
|
|
515
568
|
matches = rv.scan( REFERENCE_RE )
|
|
516
569
|
return rv if matches.size == 0
|
|
517
570
|
rv.gsub!( Private::CHARACTER_REFERENCES ) {
|
|
@@ -520,17 +573,29 @@ module REXML
|
|
|
520
573
|
[Integer(m)].pack('U*')
|
|
521
574
|
}
|
|
522
575
|
matches.collect!{|x|x[0]}.compact!
|
|
576
|
+
if filter
|
|
577
|
+
matches.reject! do |entity_reference|
|
|
578
|
+
filter.include?(entity_reference)
|
|
579
|
+
end
|
|
580
|
+
end
|
|
523
581
|
if matches.size > 0
|
|
524
|
-
matches.each do |entity_reference|
|
|
525
|
-
|
|
526
|
-
|
|
527
|
-
|
|
528
|
-
|
|
529
|
-
|
|
530
|
-
|
|
531
|
-
|
|
532
|
-
rv.gsub!( er[0], er[2] ) if er
|
|
582
|
+
matches.tally.each do |entity_reference, n|
|
|
583
|
+
entity_expansion_count_before = @entity_expansion_count
|
|
584
|
+
entity_value = entity( entity_reference, entities )
|
|
585
|
+
if entity_value
|
|
586
|
+
if n > 1
|
|
587
|
+
entity_expansion_count_delta =
|
|
588
|
+
@entity_expansion_count - entity_expansion_count_before
|
|
589
|
+
record_entity_expansion(entity_expansion_count_delta * (n - 1))
|
|
533
590
|
end
|
|
591
|
+
re = Private::DEFAULT_ENTITIES_PATTERNS[entity_reference] || /&#{entity_reference};/
|
|
592
|
+
rv.gsub!( re, entity_value )
|
|
593
|
+
if rv.bytesize > @entity_expansion_text_limit
|
|
594
|
+
raise "entity expansion has grown too large"
|
|
595
|
+
end
|
|
596
|
+
else
|
|
597
|
+
er = DEFAULT_ENTITIES[entity_reference]
|
|
598
|
+
rv.gsub!( er[0], er[2] ) if er
|
|
534
599
|
end
|
|
535
600
|
end
|
|
536
601
|
rv.gsub!( Private::DEFAULT_ENTITIES_PATTERNS['amp'], '&' )
|
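The rewritten loop above expands each distinct entity reference once and then multiplies the recorded expansion count by the number of occurrences. A self-contained sketch of that accounting, assuming Ruby 2.7+ for `tally`; `SketchExpander`, its simplified reference pattern and its default limits are hypothetical choices for the example:

```ruby
class SketchExpander
  attr_reader :count

  def initialize(entities, limit: 10_000, text_limit: 10_240)
    @entities = entities
    @limit = limit
    @text_limit = text_limit
    @count = 0
  end

  def expand(text)
    out = text.dup
    names = text.scan(/&([A-Za-z_][\w.-]*);/).flatten
    names.tally.each do |name, occurrences|
      count_before = @count
      value = lookup(name)
      next if value.nil? # unknown reference: left untouched
      if occurrences > 1
        # Charge the cost of one expansion again for every extra occurrence.
        delta = @count - count_before
        record(delta * (occurrences - 1))
      end
      out.gsub!("&#{name};") { value }
      raise "entity expansion has grown too large" if out.bytesize > @text_limit
    end
    out
  end

  private

  # Expand one entity, counting it and any entities nested inside it.
  def lookup(name)
    raw = @entities[name]
    return nil if raw.nil?
    record(1)
    expand(raw)
  end

  def record(delta)
    @count += delta
    if @count > @limit
      raise "number of entity expansions exceeded, processing aborted."
    end
  end
end
```

For entities `a => "&b;&b;"` and `b => "x"`, expanding `"&a;&a;"` does the substitution work once per distinct name but still counts six expansions: two for `a` and four for the nested `b`.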
|
@@ -539,6 +604,39 @@ module REXML
|
|
|
539
604
|
end
|
|
540
605
|
|
|
541
606
|
private
|
|
607
|
+
def add_namespace(prefix, uri)
|
|
608
|
+
@namespaces_restore_stack.last[prefix] = @namespaces[prefix]
|
|
609
|
+
if uri.nil?
|
|
610
|
+
@namespaces.delete(prefix)
|
|
611
|
+
else
|
|
612
|
+
@namespaces[prefix] = uri
|
|
613
|
+
end
|
|
614
|
+
end
|
|
615
|
+
|
|
616
|
+
def push_namespaces_restore
|
|
617
|
+
namespaces_restore = {}
|
|
618
|
+
@namespaces_restore_stack.push(namespaces_restore)
|
|
619
|
+
namespaces_restore
|
|
620
|
+
end
|
|
621
|
+
|
|
622
|
+
def pop_namespaces_restore
|
|
623
|
+
namespaces_restore = @namespaces_restore_stack.pop
|
|
624
|
+
namespaces_restore.each do |prefix, uri|
|
|
625
|
+
if uri.nil?
|
|
626
|
+
@namespaces.delete(prefix)
|
|
627
|
+
else
|
|
628
|
+
@namespaces[prefix] = uri
|
|
629
|
+
end
|
|
630
|
+
end
|
|
631
|
+
end
|
|
632
|
+
|
|
633
|
+
def record_entity_expansion(delta=1)
|
|
634
|
+
@entity_expansion_count += delta
|
|
635
|
+
if @entity_expansion_count > @entity_expansion_limit
|
|
636
|
+
raise "number of entity expansions exceeded, processing aborted."
|
|
637
|
+
end
|
|
638
|
+
end
|
|
639
|
+
|
|
542
640
|
def need_source_encoding_update?(xml_declaration_encoding)
|
|
543
641
|
return false if xml_declaration_encoding.nil?
|
|
544
642
|
return false if /\AUTF-16\z/i =~ xml_declaration_encoding
|
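The private helpers above keep one flat prefix-to-URI hash and a stack of "what to restore" hashes, one per open element. The sketch below shows that save-and-restore idea in isolation; `SketchNamespaceScope` and its method names are hypothetical, and it always restores on `pop`, which is a simplification of how the parser uses its stack.

```ruby
class SketchNamespaceScope
  XML_NS = "http://www.w3.org/XML/1998/namespace"

  def initialize
    @namespaces = {"xml" => XML_NS}
    @restore_stack = []
  end

  def [](prefix)
    @namespaces[prefix]
  end

  def key?(prefix)
    @namespaces.key?(prefix)
  end

  # Call when an element opens.
  def push
    @restore_stack.push({})
  end

  # Declare a prefix inside the current element, remembering the old binding.
  def add(prefix, uri)
    @restore_stack.last[prefix] = @namespaces[prefix]
    if uri.nil?
      @namespaces.delete(prefix)
    else
      @namespaces[prefix] = uri
    end
  end

  # Call when the element closes: put back whatever it shadowed.
  def pop
    @restore_stack.pop.each do |prefix, uri|
      if uri.nil?
        @namespaces.delete(prefix)
      else
        @namespaces[prefix] = uri
      end
    end
  end
end
```

Lookups stay a single hash access regardless of nesting depth; only the prefixes an element actually declares are touched when it closes.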
|
@@ -548,14 +646,14 @@ module REXML
|
|
|
548
646
|
def parse_name(base_error_message)
|
|
549
647
|
md = @source.match(Private::NAME_PATTERN, true)
|
|
550
648
|
unless md
|
|
551
|
-
if @source.match(/\
|
|
649
|
+
if @source.match(/\S/um)
|
|
552
650
|
message = "#{base_error_message}: invalid name"
|
|
553
651
|
else
|
|
554
652
|
message = "#{base_error_message}: name is missing"
|
|
555
653
|
end
|
|
556
654
|
raise REXML::ParseException.new(message, @source)
|
|
557
655
|
end
|
|
558
|
-
md[
|
|
656
|
+
md[0]
|
|
559
657
|
end
|
|
560
658
|
|
|
561
659
|
def parse_id(base_error_message,
|
|
@@ -624,15 +722,24 @@ module REXML
|
|
|
624
722
|
end
|
|
625
723
|
end
|
|
626
724
|
|
|
627
|
-
def process_instruction
|
|
628
|
-
|
|
629
|
-
|
|
630
|
-
|
|
631
|
-
|
|
632
|
-
|
|
725
|
+
def process_instruction
|
|
726
|
+
name = parse_name("Malformed XML: Invalid processing instruction node")
|
|
727
|
+
if @source.match(/\s+/um, true)
|
|
728
|
+
match_data = @source.match(/(.*?)\?>/um, true)
|
|
729
|
+
unless match_data
|
|
730
|
+
raise ParseException.new("Malformed XML: Unclosed processing instruction", @source)
|
|
731
|
+
end
|
|
732
|
+
content = match_data[1]
|
|
733
|
+
else
|
|
734
|
+
content = nil
|
|
735
|
+
unless @source.match("?>", true)
|
|
736
|
+
raise ParseException.new("Malformed XML: Unclosed processing instruction", @source)
|
|
737
|
+
end
|
|
633
738
|
end
|
|
634
|
-
if
|
|
635
|
-
|
|
739
|
+
if name == "xml"
|
|
740
|
+
if @document_status
|
|
741
|
+
raise ParseException.new("Malformed XML: XML declaration is not at the start", @source)
|
|
742
|
+
end
|
|
636
743
|
version = VERSION.match(content)
|
|
637
744
|
version = version[1] unless version.nil?
|
|
638
745
|
encoding = ENCODING.match(content)
|
|
@@ -647,11 +754,12 @@ module REXML
|
|
|
647
754
|
standalone = standalone[1] unless standalone.nil?
|
|
648
755
|
return [ :xmldecl, version, encoding, standalone ]
|
|
649
756
|
end
|
|
650
|
-
[:processing_instruction,
|
|
757
|
+
[:processing_instruction, name, content]
|
|
651
758
|
end
|
|
652
759
|
|
|
653
|
-
def parse_attributes(prefixes
|
|
760
|
+
def parse_attributes(prefixes)
|
|
654
761
|
attributes = {}
|
|
762
|
+
expanded_names = {}
|
|
655
763
|
closed = false
|
|
656
764
|
while true
|
|
657
765
|
if @source.match(">", true)
|
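The new `process_instruction` above parses the target name first, then either whitespace plus content up to `?>` or an immediate `?>`, and rejects an XML declaration that is not at the start. A simplified sketch using `StringScanner` from the standard library; `parse_pi`, the `at_start:` flag and the ASCII-only name pattern are assumptions of this example, and it returns a plain `:processing_instruction` triple where the real parser builds an `:xmldecl` event for `xml`.

```ruby
require "strscan"

# Parses one processing instruction from the start of `input`.
def parse_pi(input, at_start: true)
  scanner = StringScanner.new(input)
  raise ArgumentError, "not a processing instruction" unless scanner.skip(/<\?/)

  name = scanner.scan(/[A-Za-z_:][A-Za-z0-9_:.\-]*/)
  raise ArgumentError, "Invalid processing instruction node" unless name

  if scanner.skip(/\s+/)
    # Content runs up to the first "?>".
    matched = scanner.scan_until(/\?>/)
    raise ArgumentError, "Unclosed processing instruction" unless matched
    content = matched[0...-2]
  else
    # No whitespace after the name: only "?>" may follow.
    content = nil
    unless scanner.skip(/\?>/)
      raise ArgumentError, "Unclosed processing instruction"
    end
  end

  if name == "xml" && !at_start
    raise ArgumentError, "XML declaration is not at the start"
  end
  [:processing_instruction, name, content]
end
```

Parsing the name separately is what lets a contentless instruction such as `<?target?>` produce `nil` content rather than being folded into the name.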
|
@@ -683,7 +791,7 @@ module REXML
|
|
|
683
791
|
@source.match(/\s*/um, true)
|
|
684
792
|
if prefix == "xmlns"
|
|
685
793
|
if local_part == "xml"
|
|
686
|
-
if value !=
|
|
794
|
+
if value != Private::XML_PREFIXED_NAMESPACE
|
|
687
795
|
msg = "The 'xml' prefix must not be bound to any other namespace "+
|
|
688
796
|
"(http://www.w3.org/TR/REC-xml-names/#ns-decl)"
|
|
689
797
|
raise REXML::ParseException.new( msg, @source, self )
|
|
@@ -693,7 +801,7 @@ module REXML
|
|
|
693
801
|
"(http://www.w3.org/TR/REC-xml-names/#ns-decl)"
|
|
694
802
|
raise REXML::ParseException.new( msg, @source, self)
|
|
695
803
|
end
|
|
696
|
-
|
|
804
|
+
add_namespace(local_part, value)
|
|
697
805
|
elsif prefix
|
|
698
806
|
prefixes << prefix unless prefix == "xml"
|
|
699
807
|
end
|
|
@@ -703,6 +811,20 @@ module REXML
|
|
|
703
811
|
raise REXML::ParseException.new(msg, @source, self)
|
|
704
812
|
end
|
|
705
813
|
|
|
814
|
+
unless prefix == "xmlns"
|
|
815
|
+
uri = @namespaces[prefix]
|
|
816
|
+
expanded_name = [uri, local_part]
|
|
817
|
+
existing_prefix = expanded_names[expanded_name]
|
|
818
|
+
if existing_prefix
|
|
819
|
+
message = "Namespace conflict in adding attribute " +
|
|
820
|
+
"\"#{local_part}\": " +
|
|
821
|
+
"Prefix \"#{existing_prefix}\" = \"#{uri}\" and " +
|
|
822
|
+
"prefix \"#{prefix}\" = \"#{uri}\""
|
|
823
|
+
raise REXML::ParseException.new(message, @source, self)
|
|
824
|
+
end
|
|
825
|
+
expanded_names[expanded_name] = prefix
|
|
826
|
+
end
|
|
827
|
+
|
|
706
828
|
attributes[name] = value
|
|
707
829
|
else
|
|
708
830
|
message = "Invalid attribute name: <#{@source.buffer.split(%r{[/>\s]}).first}>"
|
data/lib/rexml/parsers/pullparser.rb
CHANGED
|
@@ -47,6 +47,18 @@ module REXML
|
|
|
47
47
|
@listeners << listener
|
|
48
48
|
end
|
|
49
49
|
|
|
50
|
+
def entity_expansion_count
|
|
51
|
+
@parser.entity_expansion_count
|
|
52
|
+
end
|
|
53
|
+
|
|
54
|
+
def entity_expansion_limit=( limit )
|
|
55
|
+
@parser.entity_expansion_limit = limit
|
|
56
|
+
end
|
|
57
|
+
|
|
58
|
+
def entity_expansion_text_limit=( limit )
|
|
59
|
+
@parser.entity_expansion_text_limit = limit
|
|
60
|
+
end
|
|
61
|
+
|
|
50
62
|
def each
|
|
51
63
|
while has_next?
|
|
52
64
|
yield self.pull
|
data/lib/rexml/parsers/sax2parser.rb
CHANGED
|
@@ -22,6 +22,18 @@ module REXML
|
|
|
22
22
|
@parser.source
|
|
23
23
|
end
|
|
24
24
|
|
|
25
|
+
def entity_expansion_count
|
|
26
|
+
@parser.entity_expansion_count
|
|
27
|
+
end
|
|
28
|
+
|
|
29
|
+
def entity_expansion_limit=( limit )
|
|
30
|
+
@parser.entity_expansion_limit = limit
|
|
31
|
+
end
|
|
32
|
+
|
|
33
|
+
def entity_expansion_text_limit=( limit )
|
|
34
|
+
@parser.entity_expansion_text_limit = limit
|
|
35
|
+
end
|
|
36
|
+
|
|
25
37
|
def add_listener( listener )
|
|
26
38
|
@parser.add_listener( listener )
|
|
27
39
|
end
|
|
@@ -157,25 +169,8 @@ module REXML
|
|
|
157
169
|
end
|
|
158
170
|
end
|
|
159
171
|
when :text
|
|
160
|
-
|
|
161
|
-
|
|
162
|
-
copy = event[1].clone
|
|
163
|
-
|
|
164
|
-
esub = proc { |match|
|
|
165
|
-
if @entities.has_key?($1)
|
|
166
|
-
@entities[$1].gsub(Text::REFERENCE, &esub)
|
|
167
|
-
else
|
|
168
|
-
match
|
|
169
|
-
end
|
|
170
|
-
}
|
|
171
|
-
|
|
172
|
-
copy.gsub!( Text::REFERENCE, &esub )
|
|
173
|
-
copy.gsub!( Text::NUMERICENTITY ) {|m|
|
|
174
|
-
m=$1
|
|
175
|
-
m = "0#{m}" if m[0] == ?x
|
|
176
|
-
[Integer(m)].pack('U*')
|
|
177
|
-
}
|
|
178
|
-
handle( :characters, copy )
|
|
172
|
+
unnormalized = @parser.unnormalize( event[1], @entities )
|
|
173
|
+
handle( :characters, unnormalized )
|
|
179
174
|
when :entitydecl
|
|
180
175
|
handle_entitydecl( event )
|
|
181
176
|
when :processing_instruction, :comment, :attlistdecl,
|
|
@@ -264,6 +259,8 @@ module REXML
|
|
|
264
259
|
end
|
|
265
260
|
|
|
266
261
|
def get_namespace( prefix )
|
|
262
|
+
return nil if @namespace_stack.empty?
|
|
263
|
+
|
|
267
264
|
uris = (@namespace_stack.find_all { |ns| not ns[prefix].nil? }) ||
|
|
268
265
|
(@namespace_stack.find { |ns| not ns[nil].nil? })
|
|
269
266
|
uris[-1][prefix] unless uris.nil? or 0 == uris.size
|
data/lib/rexml/parsers/streamparser.rb
CHANGED
|
@@ -7,37 +7,42 @@ module REXML
|
|
|
7
7
|
def initialize source, listener
|
|
8
8
|
@listener = listener
|
|
9
9
|
@parser = BaseParser.new( source )
|
|
10
|
-
@
|
|
10
|
+
@entities = {}
|
|
11
11
|
end
|
|
12
12
|
|
|
13
13
|
def add_listener( listener )
|
|
14
14
|
@parser.add_listener( listener )
|
|
15
15
|
end
|
|
16
16
|
|
|
17
|
+
def entity_expansion_count
|
|
18
|
+
@parser.entity_expansion_count
|
|
19
|
+
end
|
|
20
|
+
|
|
21
|
+
def entity_expansion_limit=( limit )
|
|
22
|
+
@parser.entity_expansion_limit = limit
|
|
23
|
+
end
|
|
24
|
+
|
|
25
|
+
def entity_expansion_text_limit=( limit )
|
|
26
|
+
@parser.entity_expansion_text_limit = limit
|
|
27
|
+
end
|
|
28
|
+
|
|
17
29
|
def parse
|
|
18
30
|
# entity string
|
|
19
31
|
while true
|
|
20
32
|
event = @parser.pull
|
|
21
33
|
case event[0]
|
|
22
34
|
when :end_document
|
|
23
|
-
unless @tag_stack.empty?
|
|
24
|
-
tag_path = "/" + @tag_stack.join("/")
|
|
25
|
-
raise ParseException.new("Missing end tag for '#{tag_path}'",
|
|
26
|
-
@parser.source)
|
|
27
|
-
end
|
|
28
35
|
return
|
|
29
36
|
when :start_element
|
|
30
|
-
@tag_stack << event[1]
|
|
31
37
|
attrs = event[2].each do |n, v|
|
|
32
38
|
event[2][n] = @parser.unnormalize( v )
|
|
33
39
|
end
|
|
34
40
|
@listener.tag_start( event[1], attrs )
|
|
35
41
|
when :end_element
|
|
36
42
|
@listener.tag_end( event[1] )
|
|
37
|
-
@tag_stack.pop
|
|
38
43
|
when :text
|
|
39
|
-
|
|
40
|
-
@listener.text(
|
|
44
|
+
unnormalized = @parser.unnormalize( event[1], @entities )
|
|
45
|
+
@listener.text( unnormalized )
|
|
41
46
|
when :processing_instruction
|
|
42
47
|
@listener.instruction( *event[1,2] )
|
|
43
48
|
when :start_doctype
|
|
@@ -48,6 +53,7 @@ module REXML
|
|
|
48
53
|
when :comment, :attlistdecl, :cdata, :xmldecl, :elementdecl
|
|
49
54
|
@listener.send( event[0].to_s, *event[1..-1] )
|
|
50
55
|
when :entitydecl, :notationdecl
|
|
56
|
+
@entities[ event[1] ] = event[2] if event.size == 3
|
|
51
57
|
@listener.send( event[0].to_s, event[1..-1] )
|
|
52
58
|
when :externalentity
|
|
53
59
|
entity_reference = event[1]
|
data/lib/rexml/parsers/treeparser.rb
CHANGED
|
@@ -15,7 +15,6 @@ module REXML
|
|
|
15
15
|
end
|
|
16
16
|
|
|
17
17
|
def parse
|
|
18
|
-
tag_stack = []
|
|
19
18
|
entities = nil
|
|
20
19
|
begin
|
|
21
20
|
while true
|
|
@@ -23,19 +22,13 @@ module REXML
|
|
|
23
22
|
#STDERR.puts "TREEPARSER GOT #{event.inspect}"
|
|
24
23
|
case event[0]
|
|
25
24
|
when :end_document
|
|
26
|
-
unless tag_stack.empty?
|
|
27
|
-
raise ParseException.new("No close tag for #{@build_context.xpath}",
|
|
28
|
-
@parser.source, @parser)
|
|
29
|
-
end
|
|
30
25
|
return
|
|
31
26
|
when :start_element
|
|
32
|
-
tag_stack.push(event[1])
|
|
33
27
|
el = @build_context = @build_context.add_element( event[1] )
|
|
34
28
|
event[2].each do |key, value|
|
|
35
29
|
el.attributes[key]=Attribute.new(key,value,self)
|
|
36
30
|
end
|
|
37
31
|
when :end_element
|
|
38
|
-
tag_stack.pop
|
|
39
32
|
@build_context = @build_context.parent
|
|
40
33
|
when :text
|
|
41
34
|
if @build_context[-1].instance_of? Text
|
data/lib/rexml/rexml.rb
CHANGED
data/lib/rexml/source.rb
CHANGED
|
@@ -204,10 +204,20 @@ module REXML
|
|
|
204
204
|
end
|
|
205
205
|
end
|
|
206
206
|
|
|
207
|
-
def read(term = nil)
|
|
207
|
+
def read(term = nil, min_bytes = 1)
|
|
208
208
|
term = encode(term) if term
|
|
209
209
|
begin
|
|
210
|
-
|
|
210
|
+
str = readline(term)
|
|
211
|
+
@scanner << str
|
|
212
|
+
read_bytes = str.bytesize
|
|
213
|
+
begin
|
|
214
|
+
while read_bytes < min_bytes
|
|
215
|
+
str = readline(term)
|
|
216
|
+
@scanner << str
|
|
217
|
+
read_bytes += str.bytesize
|
|
218
|
+
end
|
|
219
|
+
rescue IOError
|
|
220
|
+
end
|
|
211
221
|
true
|
|
212
222
|
rescue Exception, NameError
|
|
213
223
|
@source = nil
|
|
@@ -237,10 +247,9 @@ module REXML
|
|
|
237
247
|
read if @scanner.eos? && @source
|
|
238
248
|
end
|
|
239
249
|
|
|
240
|
-
# Note: When specifying a string for 'pattern', it must not include '>' except in the following formats:
|
|
241
|
-
# - ">"
|
|
242
|
-
# - "XXX>" (X is any string excluding '>')
|
|
243
250
|
def match( pattern, cons=false )
|
|
251
|
+
# To avoid performance issue, we need to increase bytes to read per scan
|
|
252
|
+
min_bytes = 1
|
|
244
253
|
while true
|
|
245
254
|
if cons
|
|
246
255
|
md = @scanner.scan(pattern)
|
|
@@ -250,7 +259,8 @@ module REXML
|
|
|
250
259
|
break if md
|
|
251
260
|
return nil if pattern.is_a?(String)
|
|
252
261
|
return nil if @source.nil?
|
|
253
|
-
return nil unless read
|
|
262
|
+
return nil unless read(nil, min_bytes)
|
|
263
|
+
min_bytes *= 2
|
|
254
264
|
end
|
|
255
265
|
|
|
256
266
|
md.nil? ? nil : @scanner
|
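The `read`/`match` change above makes each failed scan pull at least twice as many bytes as the previous one, so a pattern that is far away in the input needs only a logarithmic number of refills. A rough standalone sketch of that idea follows. It uses `StringIO` and `StringScanner` directly rather than REXML's `IOSource`, and the helper names `fill` and `match_with_growing_reads` are assumptions made for illustration.

```ruby
require "stringio"
require "strscan"

# Hypothetical helper: appends lines from +io+ to +scanner+ until at least
# +min_bytes+ bytes were added or the input ends.
# Returns false when nothing could be read.
def fill(scanner, io, min_bytes)
  read_bytes = 0
  while read_bytes < min_bytes
    line = io.gets
    break if line.nil?
    scanner << line
    read_bytes += line.bytesize
  end
  read_bytes > 0
end

# Scans for +pattern+, doubling the minimum refill size after every failed
# attempt. Returns the matched text (or nil) and the number of refills.
def match_with_growing_reads(io, pattern)
  scanner = StringScanner.new(+"")
  min_bytes = 1
  refills = 0
  loop do
    matched = scanner.scan_until(pattern)
    return [matched, refills] if matched
    return [nil, refills] unless fill(scanner, io, min_bytes)
    refills += 1
    min_bytes *= 2
  end
end

io = StringIO.new("a\n" * 1000 + "<end>\n")
matched, refills = match_with_growing_reads(io, /<end>/)
puts "matched #{matched.bytesize} bytes after #{refills} refills"
```

With a fixed one-line refill, the same input would need roughly one refill (and one rescan) per line; the doubling keeps the number of rescans small at the cost of reading somewhat more than strictly required.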
data/lib/rexml/text.rb
CHANGED
|
@@ -151,25 +151,45 @@ module REXML
|
|
|
151
151
|
end
|
|
152
152
|
end
|
|
153
153
|
|
|
154
|
-
|
|
155
|
-
string.
|
|
156
|
-
if
|
|
157
|
-
raise "Illegal character #{
|
|
158
|
-
|
|
159
|
-
|
|
160
|
-
|
|
161
|
-
|
|
154
|
+
pos = 0
|
|
155
|
+
while (index = string.index(/<|&/, pos))
|
|
156
|
+
if string[index] == "<"
|
|
157
|
+
raise "Illegal character \"#{string[index]}\" in raw string #{string.inspect}"
|
|
158
|
+
end
|
|
159
|
+
|
|
160
|
+
unless (end_index = string.index(/[^\s];/, index + 1))
|
|
161
|
+
raise "Illegal character \"#{string[index]}\" in raw string #{string.inspect}"
|
|
162
|
+
end
|
|
163
|
+
|
|
164
|
+
value = string[(index + 1)..end_index]
|
|
165
|
+
if /\s/.match?(value)
|
|
166
|
+
raise "Illegal character \"#{string[index]}\" in raw string #{string.inspect}"
|
|
167
|
+
end
|
|
168
|
+
|
|
169
|
+
if value[0] == "#"
|
|
170
|
+
character_reference = value[1..-1]
|
|
171
|
+
|
|
172
|
+
unless (/\A(\d+|x[0-9a-fA-F]+)\z/.match?(character_reference))
|
|
173
|
+
if character_reference[0] == "x" || character_reference[-1] == "x"
|
|
174
|
+
raise "Illegal character \"#{string[index]}\" in raw string #{string.inspect}"
|
|
162
175
|
else
|
|
163
|
-
raise "Illegal character #{
|
|
176
|
+
raise "Illegal character #{string.inspect} in raw string #{string.inspect}"
|
|
164
177
|
end
|
|
165
|
-
# FIXME: below can't work but this needs API change.
|
|
166
|
-
# elsif @parent and $3 and !SUBSTITUTES.include?($1)
|
|
167
|
-
# if !doctype or !doctype.entities.has_key?($3)
|
|
168
|
-
# raise "Undeclared entity '#{$1}' in raw string \"#{string}\""
|
|
169
|
-
# end
|
|
170
178
|
end
|
|
179
|
+
|
|
180
|
+
case (character_reference[0] == "x" ? character_reference[1..-1].to_i(16) : character_reference[0..-1].to_i)
|
|
181
|
+
when *VALID_CHAR
|
|
182
|
+
else
|
|
183
|
+
raise "Illegal character #{string.inspect} in raw string #{string.inspect}"
|
|
184
|
+
end
|
|
185
|
+
elsif !(/\A#{Entity::NAME}\z/um.match?(value))
|
|
186
|
+
raise "Illegal character \"#{string[index]}\" in raw string #{string.inspect}"
|
|
171
187
|
end
|
|
188
|
+
|
|
189
|
+
pos = end_index + 1
|
|
172
190
|
end
|
|
191
|
+
|
|
192
|
+
string
|
|
173
193
|
end
|
|
174
194
|
|
|
175
195
|
def node_type
|
|
@@ -248,7 +268,8 @@ module REXML
|
|
|
248
268
|
# u = Text.new( "sean russell", false, nil, true )
|
|
249
269
|
# u.value #-> "sean russell"
|
|
250
270
|
def value
|
|
251
|
-
@unnormalized ||= Text::unnormalize(
|
|
271
|
+
@unnormalized ||= Text::unnormalize(@string, doctype,
|
|
272
|
+
entity_expansion_text_limit: document&.entity_expansion_text_limit)
|
|
252
273
|
end
|
|
253
274
|
|
|
254
275
|
# Sets the contents of this text node. This expects the text to be
|
|
@@ -391,11 +412,12 @@ module REXML
|
|
|
391
412
|
end
|
|
392
413
|
|
|
393
414
|
# Unescapes all possible entities
|
|
394
|
-
def Text::unnormalize( string, doctype=nil, filter=nil, illegal=nil )
|
|
415
|
+
def Text::unnormalize( string, doctype=nil, filter=nil, illegal=nil, entity_expansion_text_limit: nil )
|
|
416
|
+
entity_expansion_text_limit ||= Security.entity_expansion_text_limit
|
|
395
417
|
sum = 0
|
|
396
418
|
string.gsub( /\r\n?/, "\n" ).gsub( REFERENCE ) {
|
|
397
419
|
s = Text.expand($&, doctype, filter)
|
|
398
|
-
if sum + s.bytesize >
|
|
420
|
+
if sum + s.bytesize > entity_expansion_text_limit
|
|
399
421
|
raise "entity expansion has grown too large"
|
|
400
422
|
else
|
|
401
423
|
sum += s.bytesize
|
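The `unnormalize` hunk above keeps a running byte total of every expanded reference and raises once that total would exceed the configured text limit. A simplified standalone sketch of this accounting is shown below. It is not REXML's implementation: `expand_with_limit`, the plain `Hash` of entities, and the reference pattern are assumptions made for illustration.

```ruby
# Hypothetical helper, not REXML's API: expands "&name;" references from
# +entities+ and aborts once the expanded text would exceed +limit+ bytes.
# Unknown references are left untouched but still counted.
def expand_with_limit(string, entities, limit)
  sum = 0
  string.gsub(/&([A-Za-z_][\w.-]*);/) do
    match = Regexp.last_match
    replacement = entities.fetch(match[1], match[0])
    if sum + replacement.bytesize > limit
      raise "entity expansion has grown too large"
    end
    sum += replacement.bytesize
    replacement
  end
end

entities = { "a" => "xxxx" }
puts expand_with_limit("&a;-&a;", entities, 8)

begin
  expand_with_limit("&a;-&a;", entities, 7)
rescue RuntimeError => e
  puts e.message
end
```

Only the replacement text counts toward the limit, mirroring the `sum += s.bytesize` line in the hunk; literal text between references is not included.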
metadata
CHANGED
|
@@ -1,28 +1,14 @@
|
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
|
2
2
|
name: rexml
|
|
3
3
|
version: !ruby/object:Gem::Version
|
|
4
|
-
version: 3.3.
|
|
4
|
+
version: 3.3.8
|
|
5
5
|
platform: ruby
|
|
6
6
|
authors:
|
|
7
7
|
- Kouhei Sutou
|
|
8
8
|
bindir: bin
|
|
9
9
|
cert_chain: []
|
|
10
|
-
date: 2024-
|
|
11
|
-
dependencies:
|
|
12
|
-
- !ruby/object:Gem::Dependency
|
|
13
|
-
name: strscan
|
|
14
|
-
requirement: !ruby/object:Gem::Requirement
|
|
15
|
-
requirements:
|
|
16
|
-
- - ">="
|
|
17
|
-
- !ruby/object:Gem::Version
|
|
18
|
-
version: '0'
|
|
19
|
-
type: :runtime
|
|
20
|
-
prerelease: false
|
|
21
|
-
version_requirements: !ruby/object:Gem::Requirement
|
|
22
|
-
requirements:
|
|
23
|
-
- - ">="
|
|
24
|
-
- !ruby/object:Gem::Version
|
|
25
|
-
version: '0'
|
|
10
|
+
date: 2024-09-29 00:00:00.000000000 Z
|
|
11
|
+
dependencies: []
|
|
26
12
|
description: An XML toolkit for Ruby
|
|
27
13
|
email:
|
|
28
14
|
- kou@cozmixng.org
|
|
@@ -116,7 +102,7 @@ homepage: https://github.com/ruby/rexml
|
|
|
116
102
|
licenses:
|
|
117
103
|
- BSD-2-Clause
|
|
118
104
|
metadata:
|
|
119
|
-
changelog_uri: https://github.com/ruby/rexml/releases/tag/v3.3.
|
|
105
|
+
changelog_uri: https://github.com/ruby/rexml/releases/tag/v3.3.8
|
|
120
106
|
rdoc_options:
|
|
121
107
|
- "--main"
|
|
122
108
|
- README.md
|