loofah 2.12.0 → 2.19.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: c868e8e66e94839a6486619672b3aa05b3f1b2d59987290eded52829fbe5ee9e
4
- data.tar.gz: 7610aa28be173f2ad1e917fa01ef783a96189cc5ee98ce8878983f67ad46661d
3
+ metadata.gz: bd3edb0acdf2359d82564aca0bc13710d9f6c49157963d18953ff55bd7c14413
4
+ data.tar.gz: 3a6e11b7deb9cfb469aaf6ec919062687bd4215ef11980bded72ca298807610c
5
5
  SHA512:
6
- metadata.gz: 6676ba0100a6348670618e03015be589c207290880a91cd86c8502767cf7f6a9a91aa2d19d8cc9131a18537f74eb5440f273430f3cc27ebf4c6691259ddf3c7b
7
- data.tar.gz: 4a9f90a2bf23dcb52a3b4a1ee7d8ace37ee310c14495fb749e7d4defc8b4a6dc4542b53b8cf6ea7579c70f753478ed523b29fbb13b6f1509fa47d422f0074b75
6
+ metadata.gz: 4970a6aa72265f60556dd6fd254375c86d3f83be23f3bbcc8b04df00ce0e801e8ef9e67d0a77ca6a21915be89226131c16a7f3540f02538cc2b9a369950dfebf
7
+ data.tar.gz: 27e3a06cc391ec3d9e3c966efdb6b4ce58e98c397ec87490d418406c17757e5cb0193edabaced30a9f24320c729e6730308e346610859f9f7c6d5fcc6f72cd56
data/CHANGELOG.md CHANGED
@@ -1,5 +1,63 @@
1
1
  # Changelog
2
2
 
3
+ ## 2.19.1 / 2022-12-13
4
+
5
+ ### Security
6
+
7
+ * Address CVE-2022-23514, inefficient regular expression complexity. See [GHSA-486f-hjj9-9vhh](https://github.com/flavorjones/loofah/security/advisories/GHSA-486f-hjj9-9vhh) for more information.
8
+ * Address CVE-2022-23515, improper neutralization of data URIs. See [GHSA-228g-948r-83gx](https://github.com/flavorjones/loofah/security/advisories/GHSA-228g-948r-83gx) for more information.
9
+ * Address CVE-2022-23516, uncontrolled recursion. See [GHSA-3x8r-x6xp-q4vm](https://github.com/flavorjones/loofah/security/advisories/GHSA-3x8r-x6xp-q4vm) for more information.
10
+
11
+
12
+ ## 2.19.0 / 2022-09-14
13
+
14
+ ### Features
15
+
16
+ * Allow SVG 1.0 color keyword names in CSS attributes. These colors are part of the [CSS Color Module Level 3](https://www.w3.org/TR/css-color-3/#svg-color) recommendation released 2022-01-18. [[#243](https://github.com/flavorjones/loofah/issues/243)]
17
+
18
+
19
+ ## 2.18.0 / 2022-05-11
20
+
21
+ ### Features
22
+
23
+ * Allow CSS property `aspect-ratio`. [[#236](https://github.com/flavorjones/loofah/issues/236)] (Thanks, [@louim](https://github.com/louim)!)
24
+
25
+
26
+ ## 2.17.0 / 2022-04-28
27
+
28
+ ### Features
29
+
30
+ * Allow ARIA attributes. [[#232](https://github.com/flavorjones/loofah/issues/232), [#233](https://github.com/flavorjones/loofah/issues/233)] (Thanks, [@nick-desteffen](https://github.com/nick-desteffen)!)
31
+
32
+
33
+ ## 2.16.0 / 2022-04-01
34
+
35
+ ### Features
36
+
37
+ * Allow MathML elements `menclose` and `ms`, and MathML attributes `dir`, `href`, `lquote`, `mathsize`, `notation`, and `rquote`. [[#231](https://github.com/flavorjones/loofah/issues/231)] (Thanks, [@nick-desteffen](https://github.com/nick-desteffen)!)
38
+
39
+
40
+ ## 2.15.0 / 2022-03-14
41
+
42
+ ### Features
43
+
44
+ * Expand set of allowed protocols to include `sms:`. [[#228](https://github.com/flavorjones/loofah/issues/228)] (Thanks, [@brendon](https://github.com/brendon)!)
45
+
46
+
47
+ ## 2.14.0 / 2022-02-11
48
+
49
+ ### Features
50
+
51
+ * The `#to_text` method on `Loofah::HTML::{Document,DocumentFragment}` replaces `<br>` line break elements with a newline. [[#225](https://github.com/flavorjones/loofah/issues/225)]
52
+
53
+
54
+ ## 2.13.0 / 2021-12-10
55
+
56
+ ### Bug fixes
57
+
58
+ * Loofah::HTML::DocumentFragment#text no longer serializes top-level comment children. [[#221](https://github.com/flavorjones/loofah/issues/221)]
59
+
60
+
3
61
  ## 2.12.0 / 2021-08-11
4
62
 
5
63
  ### Features
data/README.md CHANGED
@@ -133,13 +133,12 @@ and `text` to return plain text:
133
133
  doc.text # => "ohai! div is safe "
134
134
  ```
135
135
 
136
- Also, `to_text` is available, which does the right thing with
137
- whitespace around block-level elements.
136
+ Also, `to_text` is available, which does the right thing with whitespace around block-level and line break elements.
138
137
 
139
138
  ``` ruby
140
- doc = Loofah.fragment("<h1>Title</h1><div>Content</div>")
141
- doc.text # => "TitleContent" # probably not what you want
142
- doc.to_text # => "\nTitle\n\nContent\n" # better
139
+ doc = Loofah.fragment("<h1>Title</h1><div>Content<br>Next line</div>")
140
+ doc.text # => "TitleContentNext line" # probably not what you want
141
+ doc.to_text # => "\nTitle\n\nContent\nNext line\n" # better
143
142
  ```
144
143
 
145
144
  ### Loofah::XML::Document and Loofah::XML::DocumentFragment
@@ -349,7 +348,7 @@ And a big shout-out to Corey Innis for the name, and feedback on the API.
349
348
 
350
349
  ## Thank You
351
350
 
352
- The following people have generously donated via the [Pledgie](http://pledgie.com) badge on the [Loofah github page](https://github.com/flavorjones/loofah):
351
+ The following people have generously funded Loofah:
353
352
 
354
353
  * Bill Harding
355
354
 
@@ -70,8 +70,6 @@ module Loofah
70
70
  video
71
71
  ]
72
72
 
73
- STRICT_BLOCK_LEVEL = STRICT_BLOCK_LEVEL_HTML4 + STRICT_BLOCK_LEVEL_HTML5
74
-
75
73
  # The following elements may also be considered block-level
76
74
  # elements since they may contain block-level elements
77
75
  LOOSE_BLOCK_LEVEL = Set.new %w[dd
@@ -86,7 +84,12 @@ module Loofah
86
84
  tr
87
85
  ]
88
86
 
87
+ # Elements that aren't block but should generate a newline in #to_text
88
+ INLINE_LINE_BREAK = Set.new(["br"])
89
+
90
+ STRICT_BLOCK_LEVEL = STRICT_BLOCK_LEVEL_HTML4 + STRICT_BLOCK_LEVEL_HTML5
89
91
  BLOCK_LEVEL = STRICT_BLOCK_LEVEL + LOOSE_BLOCK_LEVEL
92
+ LINEBREAKERS = BLOCK_LEVEL + INLINE_LINE_BREAK
90
93
  end
91
94
 
92
95
  ::Loofah::MetaHelpers.add_downcased_set_members_to_all_set_constants ::Loofah::Elements
@@ -148,6 +148,7 @@ module Loofah
148
148
  "annotation-xml",
149
149
  "maction",
150
150
  "math",
151
+ "menclose",
151
152
  "merror",
152
153
  "mfenced",
153
154
  "mfrac",
@@ -161,6 +162,7 @@ module Loofah
161
162
  "mprescripts",
162
163
  "mroot",
163
164
  "mrow",
165
+ "ms",
164
166
  "mspace",
165
167
  "msqrt",
166
168
  "mstyle",
@@ -313,6 +315,7 @@ module Loofah
313
315
  "columnspacing",
314
316
  "columnspan",
315
317
  "depth",
318
+ "dir",
316
319
  "display",
317
320
  "displaystyle",
318
321
  "encoding",
@@ -323,19 +326,24 @@ module Loofah
323
326
  "fontweight",
324
327
  "frame",
325
328
  "height",
329
+ "href",
326
330
  "linethickness",
331
+ "lquote",
327
332
  "lspace",
328
333
  "mathbackground",
329
334
  "mathcolor",
335
+ "mathsize",
330
336
  "mathvariant",
331
337
  "maxsize",
332
338
  "minsize",
339
+ "notation",
333
340
  "open",
334
341
  "other",
335
342
  "rowalign",
336
343
  "rowlines",
337
344
  "rowspacing",
338
345
  "rowspan",
346
+ "rquote",
339
347
  "rspace",
340
348
  "scriptlevel",
341
349
  "selection",
@@ -504,6 +512,62 @@ module Loofah
504
512
  "zoomAndPan",
505
513
  ])
506
514
 
515
+ ARIA_ATTRIBUTES = Set.new([
516
+ "aria-activedescendant",
517
+ "aria-atomic",
518
+ "aria-autocomplete",
519
+ "aria-braillelabel",
520
+ "aria-brailleroledescription",
521
+ "aria-busy",
522
+ "aria-checked",
523
+ "aria-colcount",
524
+ "aria-colindex",
525
+ "aria-colindextext",
526
+ "aria-colspan",
527
+ "aria-controls",
528
+ "aria-current",
529
+ "aria-describedby",
530
+ "aria-description",
531
+ "aria-details",
532
+ "aria-disabled",
533
+ "aria-dropeffect",
534
+ "aria-errormessage",
535
+ "aria-expanded",
536
+ "aria-flowto",
537
+ "aria-grabbed",
538
+ "aria-haspopup",
539
+ "aria-hidden",
540
+ "aria-invalid",
541
+ "aria-keyshortcuts",
542
+ "aria-label",
543
+ "aria-labelledby",
544
+ "aria-level",
545
+ "aria-live",
546
+ "aria-multiline",
547
+ "aria-multiselectable",
548
+ "aria-orientation",
549
+ "aria-owns",
550
+ "aria-placeholder",
551
+ "aria-posinset",
552
+ "aria-pressed",
553
+ "aria-readonly",
554
+ "aria-relevant",
555
+ "aria-required",
556
+ "aria-roledescription",
557
+ "aria-rowcount",
558
+ "aria-rowindex",
559
+ "aria-rowindextext",
560
+ "aria-rowspan",
561
+ "aria-selected",
562
+ "aria-setsize",
563
+ "aria-sort",
564
+ "aria-valuemax",
565
+ "aria-valuemin",
566
+ "aria-valuenow",
567
+ "aria-valuetext",
568
+ "role",
569
+ ])
570
+
507
571
  ATTR_VAL_IS_URI = Set.new([
508
572
  "action",
509
573
  "cite",
@@ -553,6 +617,7 @@ module Loofah
553
617
  "align-content",
554
618
  "align-items",
555
619
  "align-self",
620
+ "aspect-ratio",
556
621
  "background-color",
557
622
  "border-bottom-color",
558
623
  "border-collapse",
@@ -619,23 +684,16 @@ module Loofah
619
684
 
620
685
  ACCEPTABLE_CSS_KEYWORDS = Set.new([
621
686
  "!important",
622
- "aqua",
623
687
  "auto",
624
- "black",
625
688
  "block",
626
- "blue",
627
689
  "bold",
628
690
  "both",
629
691
  "bottom",
630
- "brown",
631
692
  "center",
632
693
  "collapse",
633
694
  "dashed",
634
695
  "dotted",
635
696
  "double",
636
- "fuchsia",
637
- "gray",
638
- "green",
639
697
  "groove",
640
698
  "hidden",
641
699
  "inherit",
@@ -643,35 +701,196 @@ module Loofah
643
701
  "inset",
644
702
  "italic",
645
703
  "left",
646
- "lime",
647
- "maroon",
648
704
  "medium",
649
- "navy",
650
705
  "none",
651
706
  "normal",
652
707
  "nowrap",
653
- "olive",
654
708
  "outset",
655
709
  "pointer",
656
- "purple",
657
- "red",
658
710
  "revert",
659
711
  "ridge",
660
712
  "right",
661
713
  "separate",
662
- "silver",
663
714
  "solid",
664
- "teal",
665
715
  "thick",
666
716
  "thin",
667
717
  "top",
668
718
  "transparent",
669
719
  "underline",
670
720
  "unset",
671
- "white",
672
- "yellow",
673
721
  ])
674
722
 
723
+ # https://www.w3.org/TR/css-color-3/#html4
724
+ ACCEPTABLE_CSS_COLORS = Set.new([
725
+ "aqua",
726
+ "black",
727
+ "blue",
728
+ "fuchsia",
729
+ "gray",
730
+ "green",
731
+ "lime",
732
+ "maroon",
733
+ "navy",
734
+ "olive",
735
+ "purple",
736
+ "red",
737
+ "silver",
738
+ "teal",
739
+ "white",
740
+ "yellow",
741
+ ])
742
+
743
+ # https://www.w3.org/TR/css-color-3/#svg-color
744
+ ACCEPTABLE_CSS_EXTENDED_COLORS = Set.new([
745
+ "aliceblue",
746
+ "antiquewhite",
747
+ "aqua",
748
+ "aquamarine",
749
+ "azure",
750
+ "beige",
751
+ "bisque",
752
+ "black",
753
+ "blanchedalmond",
754
+ "blue",
755
+ "blueviolet",
756
+ "brown",
757
+ "burlywood",
758
+ "cadetblue",
759
+ "chartreuse",
760
+ "chocolate",
761
+ "coral",
762
+ "cornflowerblue",
763
+ "cornsilk",
764
+ "crimson",
765
+ "cyan",
766
+ "darkblue",
767
+ "darkcyan",
768
+ "darkgoldenrod",
769
+ "darkgray",
770
+ "darkgreen",
771
+ "darkgrey",
772
+ "darkkhaki",
773
+ "darkmagenta",
774
+ "darkolivegreen",
775
+ "darkorange",
776
+ "darkorchid",
777
+ "darkred",
778
+ "darksalmon",
779
+ "darkseagreen",
780
+ "darkslateblue",
781
+ "darkslategray",
782
+ "darkslategrey",
783
+ "darkturquoise",
784
+ "darkviolet",
785
+ "deeppink",
786
+ "deepskyblue",
787
+ "dimgray",
788
+ "dimgrey",
789
+ "dodgerblue",
790
+ "firebrick",
791
+ "floralwhite",
792
+ "forestgreen",
793
+ "fuchsia",
794
+ "gainsboro",
795
+ "ghostwhite",
796
+ "gold",
797
+ "goldenrod",
798
+ "gray",
799
+ "green",
800
+ "greenyellow",
801
+ "grey",
802
+ "honeydew",
803
+ "hotpink",
804
+ "indianred",
805
+ "indigo",
806
+ "ivory",
807
+ "khaki",
808
+ "lavender",
809
+ "lavenderblush",
810
+ "lawngreen",
811
+ "lemonchiffon",
812
+ "lightblue",
813
+ "lightcoral",
814
+ "lightcyan",
815
+ "lightgoldenrodyellow",
816
+ "lightgray",
817
+ "lightgreen",
818
+ "lightgrey",
819
+ "lightpink",
820
+ "lightsalmon",
821
+ "lightseagreen",
822
+ "lightskyblue",
823
+ "lightslategray",
824
+ "lightslategrey",
825
+ "lightsteelblue",
826
+ "lightyellow",
827
+ "lime",
828
+ "limegreen",
829
+ "linen",
830
+ "magenta",
831
+ "maroon",
832
+ "mediumaquamarine",
833
+ "mediumblue",
834
+ "mediumorchid",
835
+ "mediumpurple",
836
+ "mediumseagreen",
837
+ "mediumslateblue",
838
+ "mediumspringgreen",
839
+ "mediumturquoise",
840
+ "mediumvioletred",
841
+ "midnightblue",
842
+ "mintcream",
843
+ "mistyrose",
844
+ "moccasin",
845
+ "navajowhite",
846
+ "navy",
847
+ "oldlace",
848
+ "olive",
849
+ "olivedrab",
850
+ "orange",
851
+ "orangered",
852
+ "orchid",
853
+ "palegoldenrod",
854
+ "palegreen",
855
+ "paleturquoise",
856
+ "palevioletred",
857
+ "papayawhip",
858
+ "peachpuff",
859
+ "peru",
860
+ "pink",
861
+ "plum",
862
+ "powderblue",
863
+ "purple",
864
+ "red",
865
+ "rosybrown",
866
+ "royalblue",
867
+ "saddlebrown",
868
+ "salmon",
869
+ "sandybrown",
870
+ "seagreen",
871
+ "seashell",
872
+ "sienna",
873
+ "silver",
874
+ "skyblue",
875
+ "slateblue",
876
+ "slategray",
877
+ "slategrey",
878
+ "snow",
879
+ "springgreen",
880
+ "steelblue",
881
+ "tan",
882
+ "teal",
883
+ "thistle",
884
+ "tomato",
885
+ "turquoise",
886
+ "violet",
887
+ "wheat",
888
+ "white",
889
+ "whitesmoke",
890
+ "yellow",
891
+ "yellowgreen",
892
+ ])
893
+
675
894
  # see https://www.quackit.com/css/functions/
676
895
  # omit `url` and `image` from that list
677
896
  ACCEPTABLE_CSS_FUNCTIONS = Set.new([
@@ -766,6 +985,7 @@ module Loofah
766
985
  "rsync",
767
986
  "rtsp",
768
987
  "sftp",
988
+ "sms",
769
989
  "ssh",
770
990
  "tag",
771
991
  "tel",
@@ -779,16 +999,15 @@ module Loofah
779
999
  "image/gif",
780
1000
  "image/jpeg",
781
1001
  "image/png",
782
- "image/svg+xml",
783
1002
  "text/css",
784
1003
  "text/plain",
785
1004
  ])
786
1005
 
787
1006
  # subclasses may define their own versions of these constants
788
1007
  ALLOWED_ELEMENTS = ACCEPTABLE_ELEMENTS + MATHML_ELEMENTS + SVG_ELEMENTS
789
- ALLOWED_ATTRIBUTES = ACCEPTABLE_ATTRIBUTES + MATHML_ATTRIBUTES + SVG_ATTRIBUTES
1008
+ ALLOWED_ATTRIBUTES = ACCEPTABLE_ATTRIBUTES + MATHML_ATTRIBUTES + SVG_ATTRIBUTES + ARIA_ATTRIBUTES
790
1009
  ALLOWED_CSS_PROPERTIES = ACCEPTABLE_CSS_PROPERTIES
791
- ALLOWED_CSS_KEYWORDS = ACCEPTABLE_CSS_KEYWORDS
1010
+ ALLOWED_CSS_KEYWORDS = ACCEPTABLE_CSS_KEYWORDS + ACCEPTABLE_CSS_COLORS + ACCEPTABLE_CSS_EXTENDED_COLORS
792
1011
  ALLOWED_CSS_FUNCTIONS = ACCEPTABLE_CSS_FUNCTIONS
793
1012
  ALLOWED_SVG_PROPERTIES = ACCEPTABLE_SVG_PROPERTIES
794
1013
  ALLOWED_PROTOCOLS = ACCEPTABLE_PROTOCOLS
@@ -36,24 +36,13 @@ module Loofah
36
36
  end
37
37
 
38
38
  if SafeList::ATTR_VAL_IS_URI.include?(attr_name)
39
- # this block lifted nearly verbatim from HTML5 sanitization
40
- val_unescaped = CGI.unescapeHTML(attr_node.value).gsub(CONTROL_CHARACTERS, "").downcase
41
- if val_unescaped =~ /^[a-z0-9][-+.a-z0-9]*:/ && !SafeList::ALLOWED_PROTOCOLS.include?(val_unescaped.split(SafeList::PROTOCOL_SEPARATOR)[0])
42
- attr_node.remove
43
- next
44
- elsif val_unescaped.split(SafeList::PROTOCOL_SEPARATOR)[0] == "data"
45
- # permit only allowed data mediatypes
46
- mediatype = val_unescaped.split(SafeList::PROTOCOL_SEPARATOR)[1]
47
- mediatype, _ = mediatype.split(";")[0..1] if mediatype
48
- if mediatype && !SafeList::ALLOWED_URI_DATA_MEDIATYPES.include?(mediatype)
49
- attr_node.remove
50
- next
51
- end
52
- end
39
+ next if scrub_uri_attribute(attr_node)
53
40
  end
41
+
54
42
  if SafeList::SVG_ATTR_VAL_ALLOWS_REF.include?(attr_name)
55
- attr_node.value = attr_node.value.gsub(/url\s*\(\s*[^#\s][^)]+?\)/m, " ") if attr_node.value
43
+ scrub_attribute_that_allows_local_ref(attr_node)
56
44
  end
45
+
57
46
  if SafeList::SVG_ALLOW_LOCAL_HREF.include?(node.name) && attr_name == "xlink:href" && attr_node.value =~ /^\s*[^#\s].*/m
58
47
  attr_node.remove
59
48
  next
@@ -127,6 +116,47 @@ module Loofah
127
116
  Crass::Parser.stringify(sanitized_tree)
128
117
  end
129
118
 
119
+ def scrub_attribute_that_allows_local_ref(attr_node)
120
+ return unless attr_node.value
121
+
122
+ nodes = Crass::Parser.new(attr_node.value).parse_component_values
123
+
124
+ values = nodes.map do |node|
125
+ case node[:node]
126
+ when :url
127
+ if node[:value].start_with?("#")
128
+ node[:raw]
129
+ else
130
+ nil
131
+ end
132
+ when :hash, :ident, :string
133
+ node[:raw]
134
+ else
135
+ nil
136
+ end
137
+ end.compact
138
+
139
+ attr_node.value = values.join(" ")
140
+ end
141
+
142
+ def scrub_uri_attribute(attr_node)
143
+ # this block lifted nearly verbatim from HTML5 sanitization
144
+ val_unescaped = CGI.unescapeHTML(attr_node.value).gsub(CONTROL_CHARACTERS, "").downcase
145
+ if val_unescaped =~ /^[a-z0-9][-+.a-z0-9]*:/ && !SafeList::ALLOWED_PROTOCOLS.include?(val_unescaped.split(SafeList::PROTOCOL_SEPARATOR)[0])
146
+ attr_node.remove
147
+ return true
148
+ elsif val_unescaped.split(SafeList::PROTOCOL_SEPARATOR)[0] == "data"
149
+ # permit only allowed data mediatypes
150
+ mediatype = val_unescaped.split(SafeList::PROTOCOL_SEPARATOR)[1]
151
+ mediatype, _ = mediatype.split(";")[0..1] if mediatype
152
+ if mediatype && !SafeList::ALLOWED_URI_DATA_MEDIATYPES.include?(mediatype)
153
+ attr_node.remove
154
+ return true
155
+ end
156
+ end
157
+ false
158
+ end
159
+
130
160
  #
131
161
  # libxml2 >= 2.9.2 fails to escape comments within some attributes.
132
162
  #
@@ -152,6 +182,46 @@ module Loofah
152
182
  end.force_encoding(encoding)
153
183
  end
154
184
  end
185
+
186
+ def cdata_needs_escaping?(node)
187
+ # Nokogiri's HTML4 parser on JRuby doesn't flag the child of a `style` or `script` tag as cdata, but it acts that way
188
+ node.cdata? || (Nokogiri.jruby? && node.text? && (node.parent.name == "style" || node.parent.name == "script"))
189
+ end
190
+
191
+ def cdata_escape(node)
192
+ escaped_text = escape_tags(node.text)
193
+ if Nokogiri.jruby?
194
+ node.document.create_text_node(escaped_text)
195
+ else
196
+ node.document.create_cdata(escaped_text)
197
+ end
198
+ end
199
+
200
+ TABLE_FOR_ESCAPE_HTML__ = {
201
+ '<' => '&lt;',
202
+ '>' => '&gt;',
203
+ '&' => '&amp;',
204
+ }
205
+
206
+ def escape_tags(string)
207
+ # modified version of CGI.escapeHTML from ruby 3.1
208
+ enc = string.encoding
209
+ unless enc.ascii_compatible?
210
+ if enc.dummy?
211
+ origenc = enc
212
+ enc = Encoding::Converter.asciicompat_encoding(enc)
213
+ string = enc ? string.encode(enc) : string.b
214
+ end
215
+ table = Hash[TABLE_FOR_ESCAPE_HTML__.map {|pair|pair.map {|s|s.encode(enc)}}]
216
+ string = string.gsub(/#{"[<>&]".encode(enc)}/, table)
217
+ string.encode!(origenc) if origenc
218
+ string
219
+ else
220
+ string = string.b
221
+ string.gsub!(/[<>&]/, TABLE_FOR_ESCAPE_HTML__)
222
+ string.force_encoding(enc)
223
+ end
224
+ end
155
225
  end
156
226
  end
157
227
  end
@@ -93,7 +93,11 @@ module Loofah
93
93
  # frag.text(:encode_special_chars => false) # => "<script>alert('EVIL');</script>"
94
94
  #
95
95
  def text(options = {})
96
- result = serialize_root.children.inner_text rescue ""
96
+ result = if serialize_root
97
+ serialize_root.children.reject(&:comment?).map(&:inner_text).join("")
98
+ else
99
+ ""
100
+ end
97
101
  if options[:encode_special_chars] == false
98
102
  result # possibly dangerous if rendered in a browser
99
103
  else
@@ -108,11 +112,11 @@ module Loofah
108
112
  # Returns a plain-text version of the markup contained by the
109
113
  # fragment, with HTML entities encoded.
110
114
  #
111
- # This method is slower than #to_text, but is clever about
112
- # whitespace around block elements.
115
+ # This method is slower than #text, but is clever about
116
+ # whitespace around block elements and line break elements.
113
117
  #
114
- # Loofah.document("<h1>Title</h1><div>Content</div>").to_text
115
- # # => "\nTitle\n\nContent\n"
118
+ # Loofah.document("<h1>Title</h1><div>Content<br>Next line</div>").to_text
119
+ # # => "\nTitle\n\nContent\nNext line\n"
116
120
  #
117
121
  def to_text(options = {})
118
122
  Loofah.remove_extraneous_whitespace self.dup.scrub!(:newline_block_elements).text(options)
@@ -108,6 +108,10 @@ module Loofah
108
108
  return Scrubber::CONTINUE
109
109
  end
110
110
  when Nokogiri::XML::Node::TEXT_NODE, Nokogiri::XML::Node::CDATA_SECTION_NODE
111
+ if HTML5::Scrub.cdata_needs_escaping?(node)
112
+ node.before(HTML5::Scrub.cdata_escape(node))
113
+ return Scrubber::STOP
114
+ end
111
115
  return Scrubber::CONTINUE
112
116
  end
113
117
  Scrubber::STOP
@@ -100,13 +100,9 @@ module Loofah
100
100
 
101
101
  def scrub(node)
102
102
  return CONTINUE if html5lib_sanitize(node) == CONTINUE
103
- if node.children.length == 1 && node.children.first.cdata?
104
- sanitized_text = Loofah.fragment(node.children.first.to_html).scrub!(:strip).to_html
105
- node.before Nokogiri::XML::Text.new(sanitized_text, node.document)
106
- else
107
- node.before node.children
108
- end
103
+ node.before(node.children)
109
104
  node.remove
105
+ return STOP
110
106
  end
111
107
  end
112
108
 
@@ -240,8 +236,13 @@ module Loofah
240
236
  end
241
237
 
242
238
  def scrub(node)
243
- return CONTINUE unless Loofah::Elements::BLOCK_LEVEL.include?(node.name)
244
- node.add_next_sibling Nokogiri::XML::Text.new("\n#{node.content}\n", node.document)
239
+ return CONTINUE unless Loofah::Elements::LINEBREAKERS.include?(node.name)
240
+ replacement = if Loofah::Elements::INLINE_LINE_BREAK.include?(node.name)
241
+ "\n"
242
+ else
243
+ "\n#{node.content}\n"
244
+ end
245
+ node.add_next_sibling Nokogiri::XML::Text.new(replacement, node.document)
245
246
  node.remove
246
247
  end
247
248
  end
@@ -1,5 +1,5 @@
1
1
  # frozen_string_literal: true
2
2
  module Loofah
3
3
  # The version of Loofah you are using
4
- VERSION = "2.12.0"
4
+ VERSION = "2.19.1"
5
5
  end
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: loofah
3
3
  version: !ruby/object:Gem::Version
4
- version: 2.12.0
4
+ version: 2.19.1
5
5
  platform: ruby
6
6
  authors:
7
7
  - Mike Dalessio
@@ -9,7 +9,7 @@ authors:
9
9
  autorequire:
10
10
  bindir: bin
11
11
  cert_chain: []
12
- date: 2021-08-11 00:00:00.000000000 Z
12
+ date: 2022-12-12 00:00:00.000000000 Z
13
13
  dependencies:
14
14
  - !ruby/object:Gem::Dependency
15
15
  name: crass
@@ -199,7 +199,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
199
199
  - !ruby/object:Gem::Version
200
200
  version: '0'
201
201
  requirements: []
202
- rubygems_version: 3.2.15
202
+ rubygems_version: 3.3.7
203
203
  signing_key:
204
204
  specification_version: 4
205
205
  summary: Loofah is a general library for manipulating and transforming HTML/XML documents