html-pipeline 3.0.0.pre1 → 3.0.0.pre2

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 4b8171e0970d8cbe50757505138bbea0f812c06583cfcdfd192686a8aff9cb53
- data.tar.gz: 19a40ef93871f3d2b6411c526d6375a21d494a301d7a4abb2e92b6062de81b1c
+ metadata.gz: 786f70829bb579116d522e399faeb85c5e47354255015a7ad4f302d062151c18
+ data.tar.gz: d8dfaeb90583a9644c9a1d31447122953ca34f6225e9720fb8d83cc7c7848faa
  SHA512:
- metadata.gz: 4c160dcf8463cc5009d397529087c9f035cb3ae5e998c12ace03810fd8146b031f5f28829e318456239cd410d41fbf3d1da7d0b0e8097ceb3206733a96484c5f
- data.tar.gz: 72fa1ec6b3a2097dd0ce6108f55a10ac9b617e5fc607e67818876c5b8ececbc7c2bbd58e53ce530ba39f11f6ad0c292ec956e9e42fbed370213abc5b71bcf257
+ metadata.gz: 0b04a86725107c550b252af0dee5b6647f7cce1db0ccd3349eae1b867e1b10910a256006457e8be4dab66e8e01dd7c891e7736b0c4e5294751fab45b0ed58a37
+ data.tar.gz: 468721f150845e39d69942259f02e0ff79e35730678c080ac9470ed92ae24e0b917bb14ba5bda82565e052f47057c0d1d3703069faaa9a0e319c939e27260751
data/CHANGELOG.md CHANGED
@@ -1,5 +1,27 @@
  # Changelog

+ ## [v3.0.0.pre2](https://github.com/gjtorikian/html-pipeline/tree/v3.0.0.pre2) (2023-01-26)
+
+ [Full Changelog](https://github.com/gjtorikian/html-pipeline/compare/v3.0.0.pre1...v3.0.0.pre2)
+
+ **Closed issues:**
+
+ - Indicate a version for activesupport that has support/receives security patches \(\>= 6?\) [\#367](https://github.com/gjtorikian/html-pipeline/issues/367)
+ - Add MathML elements to whitelist [\#336](https://github.com/gjtorikian/html-pipeline/issues/336)
+ - Feature request: add safe semantic HTML tags to default whitelist [\#312](https://github.com/gjtorikian/html-pipeline/issues/312)
+ - Allow float: left|right and clear: left|right|both in sanitation [\#302](https://github.com/gjtorikian/html-pipeline/issues/302)
+ - Open link in new tab option [\#266](https://github.com/gjtorikian/html-pipeline/issues/266)
+ - Consider allowing LINK and META elements in HTML [\#261](https://github.com/gjtorikian/html-pipeline/issues/261)
+ - Allow SVG elements in whitelist [\#251](https://github.com/gjtorikian/html-pipeline/issues/251)
+ - Allow RDFa 1.1 \(Lite\) attributes [\#249](https://github.com/gjtorikian/html-pipeline/issues/249)
+ - What’s the point of allowing the accept-charset attribute in the sanitization filter ? [\#218](https://github.com/gjtorikian/html-pipeline/issues/218)
+ - Use gemojione instead of gemoji [\#200](https://github.com/gjtorikian/html-pipeline/issues/200)
+ - Link schema not used with \<a\> markup : print them directly [\#194](https://github.com/gjtorikian/html-pipeline/issues/194)
+
+ **Merged pull requests:**
+
+ - Use emoji from commonmarker [\#373](https://github.com/gjtorikian/html-pipeline/pull/373) ([gjtorikian](https://github.com/gjtorikian))
+
  ## [v3.0.0.pre1](https://github.com/gjtorikian/html-pipeline/tree/v3.0.0.pre1) (2022-12-30)

  [Full Changelog](https://github.com/gjtorikian/html-pipeline/compare/v2.14.3...v3.0.0.pre1)
data/Gemfile CHANGED
@@ -27,9 +27,10 @@ group :development do
  end

  group :test do
- gem "commonmarker", "~> 1.0.0.pre4", require: false
+ gem "commonmarker", "~> 1.0.0.pre7", require: false
  gem "gemoji", "~> 3.0", require: false
  gem "gemojione", "~> 4.3", require: false
+
  gem "minitest"

  gem "minitest-bisect", "~> 1.6"
@@ -37,4 +38,5 @@ group :test do
  gem "nokogiri", "~> 1.13"

  gem "minitest-focus", "~> 1.1"
+ gem "rouge", "~> 3.1", require: false
  end
data/README.md CHANGED
@@ -230,19 +230,28 @@ end

  For more information on how to write effective `NodeFilter`s, refer to the provided filters, and see the underlying lib, [Selma](https://www.github.com/gjtorikian/selma) for more information.

- - `AbsoluteSourceFilter` - replace relative image urls with fully qualified versions
- - `EmojiFilter` - converts `:<emoji>:` to [emoji](http://www.emoji-cheat-sheet.com/)!
- - `HttpsFilter` - Replacing http urls with https versions
- - `ImageMaxWidthFilter` - link to full size image for large images
- - `MentionFilter` - replace `@user` mentions with links
- - `SanitizationFilter` - allow sanitize user markup
- - `TableOfContentsFilter` - anchor headings with name attributes and generate Table of Contents html unordered list linking headings
- - `TeamMentionFilter` - replace `@org/team` mentions with links
+ - `AbsoluteSourceFilter`: replace relative image urls with fully qualified versions
+ - `EmojiFilter`: converts `:<emoji>:` to [emoji](http://www.emoji-cheat-sheet.com/)
+ - (Note: the included `MarkdownFilter` will already convert emoji)
+ - `HttpsFilter`: Replacing http urls with https versions
+ - `ImageMaxWidthFilter`: link to full size image for large images
+ - `MentionFilter`: replace `@user` mentions with links
+ - `SanitizationFilter`: allow sanitize user markup
+ - `SyntaxHighlightFilter`: applies syntax highlighting to `pre` blocks
+ - (Note: the included `MarkdownFilter` will already apply highlighting)
+ - `TableOfContentsFilter`: anchor headings with name attributes and generate Table of Contents html unordered list linking headings
+ - `TeamMentionFilter`: replace `@org/team` mentions with links

  ## Dependencies

- Since filters can be customized to your heart's content, gem dependencies are _not_ bundled; this project doesn't know which of the default filters you might use, and as such, you must bundle each filter's gem
- dependencies yourself.
+ Since filters can be customized to your heart's content, gem dependencies are _not_ bundled; this project doesn't know which of the default filters you might use, and as such, you must bundle each filter's gem dependencies yourself.
+
+ For example, `SyntaxHighlightFilter` uses [rouge](https://github.com/jneen/rouge)
+ to detect and highlight languages; to use the `SyntaxHighlightFilter`, you must add the following to your Gemfile:
+
+ ```ruby
+ gem "rouge"
+ ```

  > **Note**
  > See the [Gemfile](/Gemfile) `:test` group for any version requirements.
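Because dependencies are not bundled, a filter has to cope with its gem being absent. One way this could be surfaced is to wrap `require` and re-raise with a hint. The helper name `require_optional_dependency` below is hypothetical; this diff does not show how `HTMLPipeline.require_dependency` is implemented, so treat this as a sketch of the general pattern only.

```ruby
# Hypothetical stand-in for a dependency guard; NOT the gem's actual code.
# It tries to load a library and, on failure, raises a LoadError that names
# the filter which needs it.
def require_optional_dependency(name, requirer)
  require name
rescue LoadError
  raise LoadError,
    "Missing dependency '#{name}' for #{requirer}; add `gem \"#{name}\"` to your Gemfile."
end

# A library that ships with Ruby loads without complaint.
require_optional_dependency("json", "ExampleFilter")

# A library that is not installed produces the friendlier message.
begin
  require_optional_dependency("surely_not_installed_gem_xyz", "SyntaxHighlightFilter")
rescue LoadError => e
  puts e.message
end
```

Failing at load time with the filter's name in the message points the user at the missing Gemfile entry rather than at an unrelated `NameError` later in the pipeline.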
data/UPGRADING.md CHANGED
@@ -13,7 +13,6 @@ This project is now under a module called `HTMLPipeline`, not `HTML::Pipeline`.
  The following filters were removed:

  - `AutolinkFilter`: this is handled by [Commonmarker](https://www.github.com/gjtorikian/commonmarker) and can be disabled/enabled through the `MarkdownFilter`'s `context` hash
- - `SyntaxHighlightFilter`: this is handled by [Commonmarker](https://www.github.com/gjtorikian/commonmarker) and can be disabled/enabled through the `MarkdownFilter`'s `context` hash
  - `SanitizationFilter`: this is handled by [Selma](https://www.github.com/gjtorikian/selma); configuration can be done through the `sanitization_config` hash

  - `EmailReplyFilter`
data/lib/html_pipeline/node_filter/syntax_highlight_filter.rb ADDED
@@ -0,0 +1,62 @@
+ # frozen_string_literal: true
+
+ HTMLPipeline.require_dependency("rouge", "SyntaxHighlightFilter")
+
+ class HTMLPipeline
+   class NodeFilter
+     # HTML Filter that syntax highlights text inside code blocks.
+     #
+     # Context options:
+     #
+     # :highlight => String represents the language to pick lexer. Defaults to empty string.
+     # :scope => String represents the class attribute adds to pre element after.
+     # Defaults to "highlight highlight-css" if highlights a css code block.
+     #
+     # This filter does not write any additional information to the context hash.
+     class SyntaxHighlightFilter < NodeFilter
+       def initialize(context: {}, result: {})
+         super(context: context, result: result)
+         # TODO: test the optionality of this
+         @formatter = context[:formatter] || Rouge::Formatters::HTML.new
+       end
+
+       SELECTOR = Selma::Selector.new(match_element: "pre", match_text_within: "pre")
+
+       def selector
+         SELECTOR
+       end
+
+       def handle_element(element)
+         default = context[:highlight]&.to_s
+         @lang = element["lang"] || default
+
+         scope = context.fetch(:scope, "highlight")
+
+         element["class"] = "#{scope} #{scope}-#{@lang}" if include_lang?
+       end
+
+       def handle_text_chunk(text)
+         return if @lang.nil?
+         return if (lexer = lexer_for(@lang)).nil?
+
+         content = text.to_s
+
+         text.replace(highlight_with_timeout_handling(content, lexer), as: :html)
+       end
+
+       def highlight_with_timeout_handling(text, lexer)
+         Rouge.highlight(text, lexer, @formatter)
+       rescue Timeout::Error => _e
+         text
+       end
+
+       def lexer_for(lang)
+         Rouge::Lexer.find(lang)
+       end
+
+       def include_lang?
+         !@lang.nil? && !@lang.empty?
+       end
+     end
+   end
+ end
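The class-attribute and timeout-fallback logic of the new filter can be sketched in plain Ruby without rouge or Selma. The helper names `highlight_class` and `highlight_with_fallback` are hypothetical, and a lambda stands in for the `Rouge.highlight` call; this is a sketch under those assumptions, not the gem's code.

```ruby
require "timeout"

# Hypothetical helper mirroring handle_element: the element's own lang wins,
# otherwise the context's :highlight value is used. Returns nil when no
# language is known, in which case no class would be set.
def highlight_class(element_lang, context)
  lang = element_lang || context[:highlight]&.to_s
  return nil if lang.nil? || lang.empty?

  scope = context.fetch(:scope, "highlight")
  "#{scope} #{scope}-#{lang}"
end

# Hypothetical helper mirroring highlight_with_timeout_handling: any callable
# stands in for the highlighter; on Timeout::Error the raw text is returned.
def highlight_with_fallback(text, highlighter)
  highlighter.call(text)
rescue Timeout::Error
  text
end

puts highlight_class("ruby", {})                # highlight highlight-ruby
puts highlight_class(nil, { highlight: :css })  # highlight highlight-css
puts highlight_class(nil, {}).inspect           # nil

slow = ->(_text) { raise Timeout::Error }
puts highlight_with_fallback("puts 1", slow)    # puts 1
```

Returning the untouched text on timeout means a slow lexer degrades to unhighlighted output instead of failing the whole pipeline.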
data/lib/html_pipeline/node_filter/table_of_contents_filter.rb CHANGED
@@ -24,8 +24,10 @@ class HTMLPipeline
  # result[:output].to_s
  # # => "<h1>\n<a id=\"ice-cube\" class=\"anchor\" href=\"#ice-cube\">..."
  class TableOfContentsFilter < NodeFilter
- SELECTOR = Selma::Selector.new(match_element: "h1 a[href], h2 a[href], h3 a[href], h4 a[href], h5 a[href], h6 a[href]",
- match_text_within: "h1, h2, h3, h4, h5, h6")
+ SELECTOR = Selma::Selector.new(
+ match_element: "h1 a[href], h2 a[href], h3 a[href], h4 a[href], h5 a[href], h6 a[href]",
+ match_text_within: "h1, h2, h3, h4, h5, h6",
+ )

  def selector
  SELECTOR
data/lib/html_pipeline/sanitization_filter.rb CHANGED
@@ -16,11 +16,70 @@ class HTMLPipeline
  # The main sanitization allowlist. Only these elements and attributes are
  # allowed through by default.
  DEFAULT_CONFIG = Selma::Sanitizer::Config.freeze_config({
- elements: ["h1", "h2", "h3", "h4", "h5", "h6", "br", "b", "i", "strong", "em", "a", "pre", "code",
- "img", "tt", "div", "ins", "del", "sup", "sub", "p", "picture", "ol", "ul", "table", "thead", "tbody", "tfoot",
- "blockquote", "dl", "dt", "dd", "kbd", "q", "samp", "var", "hr", "ruby", "rt", "rp", "li", "tr", "td", "th",
- "s", "strike", "summary", "details", "caption", "figure", "figcaption", "abbr", "bdo", "cite",
- "dfn", "mark", "small", "source", "span", "time", "wbr",],
+ elements: [
+ "h1",
+ "h2",
+ "h3",
+ "h4",
+ "h5",
+ "h6",
+ "br",
+ "b",
+ "i",
+ "strong",
+ "em",
+ "a",
+ "pre",
+ "code",
+ "img",
+ "tt",
+ "div",
+ "ins",
+ "del",
+ "sup",
+ "sub",
+ "p",
+ "picture",
+ "ol",
+ "ul",
+ "table",
+ "thead",
+ "tbody",
+ "tfoot",
+ "blockquote",
+ "dl",
+ "dt",
+ "dd",
+ "kbd",
+ "q",
+ "samp",
+ "var",
+ "hr",
+ "ruby",
+ "rt",
+ "rp",
+ "li",
+ "tr",
+ "td",
+ "th",
+ "s",
+ "strike",
+ "summary",
+ "details",
+ "caption",
+ "figure",
+ "figcaption",
+ "abbr",
+ "bdo",
+ "cite",
+ "dfn",
+ "mark",
+ "small",
+ "source",
+ "span",
+ "time",
+ "wbr",
+ ],

  attributes: {
  "a" => ["href"],
@@ -31,13 +90,77 @@ class HTMLPipeline
  "ins" => ["cite"],
  "q" => ["cite"],
  "source" => ["srcset"],
- all: ["abbr", "accept", "accept-charset", "accesskey", "action", "align", "alt", "aria-describedby",
- "aria-hidden", "aria-label", "aria-labelledby", "axis", "border", "char",
- "charoff", "charset", "checked", "clear", "cols", "colspan", "compact", "coords", "datetime", "dir",
- "disabled", "enctype", "for", "frame", "headers", "height", "hreflang", "hspace", "id", "ismap", "label", "lang",
- "maxlength", "media", "method", "multiple", "name", "nohref", "noshade", "nowrap", "open", "progress",
- "prompt", "readonly", "rel", "rev", "role", "rows", "rowspan", "rules", "scope", "selected", "shape",
- "size", "span", "start", "summary", "tabindex", "title", "type", "usemap", "valign", "value", "width", "itemprop",],
+ all: [
+ "abbr",
+ "accept",
+ "accept-charset",
+ "accesskey",
+ "action",
+ "align",
+ "alt",
+ "aria-describedby",
+ "aria-hidden",
+ "aria-label",
+ "aria-labelledby",
+ "axis",
+ "border",
+ "char",
+ "charoff",
+ "charset",
+ "checked",
+ "clear",
+ "cols",
+ "colspan",
+ "compact",
+ "coords",
+ "datetime",
+ "dir",
+ "disabled",
+ "enctype",
+ "for",
+ "frame",
+ "headers",
+ "height",
+ "hreflang",
+ "hspace",
+ "id",
+ "ismap",
+ "label",
+ "lang",
+ "maxlength",
+ "media",
+ "method",
+ "multiple",
+ "name",
+ "nohref",
+ "noshade",
+ "nowrap",
+ "open",
+ "progress",
+ "prompt",
+ "readonly",
+ "rel",
+ "rev",
+ "role",
+ "rows",
+ "rowspan",
+ "rules",
+ "scope",
+ "selected",
+ "shape",
+ "size",
+ "span",
+ "start",
+ "summary",
+ "tabindex",
+ "title",
+ "type",
+ "usemap",
+ "valign",
+ "value",
+ "width",
+ "itemprop",
+ ],
  },
  protocols: {
  "a" => { "href" => Selma::Sanitizer::Config::VALID_PROTOCOLS }.freeze,
data/lib/html_pipeline/version.rb CHANGED
@@ -1,5 +1,5 @@
  # frozen_string_literal: true

  class HTMLPipeline
- VERSION = "3.0.0.pre1"
+ VERSION = "3.0.0.pre2"
  end
data/lib/html_pipeline.rb CHANGED
@@ -145,8 +145,11 @@ end
  context = context.freeze
  result ||= {}

- payload = default_payload({ text_filters: @text_filters.map(&:name),
- context: context, result: result, })
+ payload = default_payload({
+ text_filters: @text_filters.map(&:name),
+ context: context,
+ result: result,
+ })
  instrument("call_text_filters.html_pipeline", payload) do
  result[:output] =
  @text_filters.inject(text) do |doc, filter|
@@ -159,8 +162,11 @@ end
  html = @convert_filter.call(text) unless @convert_filter.nil?

  unless @node_filters.empty?
- payload = default_payload({ node_filters: @node_filters.map { |f| f.class.name },
- context: context, result: result, })
+ payload = default_payload({
+ node_filters: @node_filters.map { |f| f.class.name },
+ context: context,
+ result: result,
+ })
  instrument("call_node_filters.html_pipeline", payload) do
  result[:output] = Selma::Rewriter.new(sanitizer: @sanitization_config, handlers: @node_filters).rewrite(html)
  end
@@ -178,8 +184,11 @@ end
  #
  # Returns the result of the filter.
  def perform_filter(filter, doc, context: {}, result: {})
- payload = default_payload({ filter: filter.name,
- context: context, result: result, })
+ payload = default_payload({
+ filter: filter.name,
+ context: context,
+ result: result,
+ })
  instrument("call_filter.html_pipeline", payload) do
  filter.call(doc, context: context, result: result)
  end
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: html-pipeline
  version: !ruby/object:Gem::Version
- version: 3.0.0.pre1
+ version: 3.0.0.pre2
  platform: ruby
  authors:
  - Garen J. Torikian
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2022-12-30 00:00:00.000000000 Z
+ date: 2023-01-26 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: selma
@@ -71,6 +71,7 @@ files:
  - lib/html_pipeline/node_filter/https_filter.rb
  - lib/html_pipeline/node_filter/image_max_width_filter.rb
  - lib/html_pipeline/node_filter/mention_filter.rb
+ - lib/html_pipeline/node_filter/syntax_highlight_filter.rb
  - lib/html_pipeline/node_filter/table_of_contents_filter.rb
  - lib/html_pipeline/node_filter/team_mention_filter.rb
  - lib/html_pipeline/sanitization_filter.rb