html-pipeline 3.0.0.pre1 → 3.0.0.pre2

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 4b8171e0970d8cbe50757505138bbea0f812c06583cfcdfd192686a8aff9cb53
- data.tar.gz: 19a40ef93871f3d2b6411c526d6375a21d494a301d7a4abb2e92b6062de81b1c
+ metadata.gz: 786f70829bb579116d522e399faeb85c5e47354255015a7ad4f302d062151c18
+ data.tar.gz: d8dfaeb90583a9644c9a1d31447122953ca34f6225e9720fb8d83cc7c7848faa
  SHA512:
- metadata.gz: 4c160dcf8463cc5009d397529087c9f035cb3ae5e998c12ace03810fd8146b031f5f28829e318456239cd410d41fbf3d1da7d0b0e8097ceb3206733a96484c5f
- data.tar.gz: 72fa1ec6b3a2097dd0ce6108f55a10ac9b617e5fc607e67818876c5b8ececbc7c2bbd58e53ce530ba39f11f6ad0c292ec956e9e42fbed370213abc5b71bcf257
+ metadata.gz: 0b04a86725107c550b252af0dee5b6647f7cce1db0ccd3349eae1b867e1b10910a256006457e8be4dab66e8e01dd7c891e7736b0c4e5294751fab45b0ed58a37
+ data.tar.gz: 468721f150845e39d69942259f02e0ff79e35730678c080ac9470ed92ae24e0b917bb14ba5bda82565e052f47057c0d1d3703069faaa9a0e319c939e27260751
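Each value in `checksums.yaml` is a digest of one of the two archives packed inside the `.gem` file (`metadata.gz` and `data.tar.gz`). Below is a minimal sketch of checking an extracted archive against such a value. It uses only Ruby's standard `digest` library. The helper name `checksum_matches?` and the unpacking step in the comment are illustrative assumptions and are not part of RubyGems' API.

```ruby
require "digest"

# Hypothetical helper: true when the file's hex digest equals the expected
# value from checksums.yaml. `algorithm` selects SHA256 or SHA512.
def checksum_matches?(path, expected_hex, algorithm: :sha256)
  digest_class = { sha256: Digest::SHA256, sha512: Digest::SHA512 }.fetch(algorithm)
  # Digest::Class.file reads the file in chunks rather than all at once.
  digest_class.file(path).hexdigest == expected_hex.downcase
end

# Example, after unpacking the gem with `tar -xf html-pipeline-3.0.0.pre2.gem`:
# checksum_matches?("data.tar.gz",
#   "d8dfaeb90583a9644c9a1d31447122953ca34f6225e9720fb8d83cc7c7848faa")
```

A mismatch against the published value means the archive on disk is not the one the registry recorded.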
data/CHANGELOG.md CHANGED
@@ -1,5 +1,27 @@
  # Changelog
 
+ ## [v3.0.0.pre2](https://github.com/gjtorikian/html-pipeline/tree/v3.0.0.pre2) (2023-01-26)
+
+ [Full Changelog](https://github.com/gjtorikian/html-pipeline/compare/v3.0.0.pre1...v3.0.0.pre2)
+
+ **Closed issues:**
+
+ - Indicate a version for activesupport that has support/receives security patches \(\>= 6?\) [\#367](https://github.com/gjtorikian/html-pipeline/issues/367)
+ - Add MathML elements to whitelist [\#336](https://github.com/gjtorikian/html-pipeline/issues/336)
+ - Feature request: add safe semantic HTML tags to default whitelist [\#312](https://github.com/gjtorikian/html-pipeline/issues/312)
+ - Allow float: left|right and clear: left|right|both in sanitation [\#302](https://github.com/gjtorikian/html-pipeline/issues/302)
+ - Open link in new tab option [\#266](https://github.com/gjtorikian/html-pipeline/issues/266)
+ - Consider allowing LINK and META elements in HTML [\#261](https://github.com/gjtorikian/html-pipeline/issues/261)
+ - Allow SVG elements in whitelist [\#251](https://github.com/gjtorikian/html-pipeline/issues/251)
+ - Allow RDFa 1.1 \(Lite\) attributes [\#249](https://github.com/gjtorikian/html-pipeline/issues/249)
+ - What’s the point of allowing the accept-charset attribute in the sanitization filter ? [\#218](https://github.com/gjtorikian/html-pipeline/issues/218)
+ - Use gemojione instead of gemoji [\#200](https://github.com/gjtorikian/html-pipeline/issues/200)
+ - Link schema not used with \<a\> markup : print them directly [\#194](https://github.com/gjtorikian/html-pipeline/issues/194)
+
+ **Merged pull requests:**
+
+ - Use emoji from commonmarker [\#373](https://github.com/gjtorikian/html-pipeline/pull/373) ([gjtorikian](https://github.com/gjtorikian))
+
  ## [v3.0.0.pre1](https://github.com/gjtorikian/html-pipeline/tree/v3.0.0.pre1) (2022-12-30)
 
  [Full Changelog](https://github.com/gjtorikian/html-pipeline/compare/v2.14.3...v3.0.0.pre1)
data/Gemfile CHANGED
@@ -27,9 +27,10 @@ group :development do
  end
 
  group :test do
- gem "commonmarker", "~> 1.0.0.pre4", require: false
+ gem "commonmarker", "~> 1.0.0.pre7", require: false
  gem "gemoji", "~> 3.0", require: false
  gem "gemojione", "~> 4.3", require: false
+
  gem "minitest"
 
  gem "minitest-bisect", "~> 1.6"
@@ -37,4 +38,5 @@ group :test do
  gem "nokogiri", "~> 1.13"
 
  gem "minitest-focus", "~> 1.1"
+ gem "rouge", "~> 3.1", require: false
  end
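The Gemfile now pins `commonmarker` with the pessimistic operator against a prerelease (`~> 1.0.0.pre7`). It also adds `rouge` as `~> 3.1` with `require: false`, so Bundler installs it but `Bundler.require` does not load it. The short sketch below shows which versions those constraints accept. It uses RubyGems' own `Gem::Requirement` and `Gem::Version` classes. The helper `accepted` and the sample version numbers are illustrative only.

```ruby
# RubyGems is loaded by default in a standard Ruby process.
require "rubygems"

# Returns the subset of `versions` that satisfy the requirement string.
def accepted(requirement, versions)
  req = Gem::Requirement.new(requirement)
  versions.select { |v| req.satisfied_by?(Gem::Version.new(v)) }
end

# "~> 1.0.0.pre7" means ">= 1.0.0.pre7" and "< 1.1" (prerelease segments
# are dropped before the last remaining segment is bumped).
p accepted("~> 1.0.0.pre7", %w[1.0.0.pre4 1.0.0.pre7 1.0.0 1.0.5 1.1.0])
# => ["1.0.0.pre7", "1.0.0", "1.0.5"]

# "~> 3.1" means ">= 3.1" and "< 4.0".
p accepted("~> 3.1", %w[3.0.0 3.1.0 3.30.0 4.0.0])
# => ["3.1.0", "3.30.0"]

# Prereleases sort below the final release, so 3.0.0.pre2 sits between
# 3.0.0.pre1 and 3.0.0.
p Gem::Version.new("3.0.0.pre1") < Gem::Version.new("3.0.0.pre2") # => true
p Gem::Version.new("3.0.0.pre2") < Gem::Version.new("3.0.0")      # => true
```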
data/README.md CHANGED
@@ -230,19 +230,28 @@ end
 
  For more information on how to write effective `NodeFilter`s, refer to the provided filters, and see the underlying lib, [Selma](https://www.github.com/gjtorikian/selma) for more information.
 
- - `AbsoluteSourceFilter` - replace relative image urls with fully qualified versions
- - `EmojiFilter` - converts `:<emoji>:` to [emoji](http://www.emoji-cheat-sheet.com/)!
- - `HttpsFilter` - Replacing http urls with https versions
- - `ImageMaxWidthFilter` - link to full size image for large images
- - `MentionFilter` - replace `@user` mentions with links
- - `SanitizationFilter` - allow sanitize user markup
- - `TableOfContentsFilter` - anchor headings with name attributes and generate Table of Contents html unordered list linking headings
- - `TeamMentionFilter` - replace `@org/team` mentions with links
+ - `AbsoluteSourceFilter`: replace relative image urls with fully qualified versions
+ - `EmojiFilter`: converts `:<emoji>:` to [emoji](http://www.emoji-cheat-sheet.com/)
+ - (Note: the included `MarkdownFilter` will already convert emoji)
+ - `HttpsFilter`: Replacing http urls with https versions
+ - `ImageMaxWidthFilter`: link to full size image for large images
+ - `MentionFilter`: replace `@user` mentions with links
+ - `SanitizationFilter`: allow sanitize user markup
+ - `SyntaxHighlightFilter`: applies syntax highlighting to `pre` blocks
+ - (Note: the included `MarkdownFilter` will already apply highlighting)
+ - `TableOfContentsFilter`: anchor headings with name attributes and generate Table of Contents html unordered list linking headings
+ - `TeamMentionFilter`: replace `@org/team` mentions with links
 
  ## Dependencies
 
- Since filters can be customized to your heart's content, gem dependencies are _not_ bundled; this project doesn't know which of the default filters you might use, and as such, you must bundle each filter's gem
- dependencies yourself.
+ Since filters can be customized to your heart's content, gem dependencies are _not_ bundled; this project doesn't know which of the default filters you might use, and as such, you must bundle each filter's gem dependencies yourself.
+
+ For example, `SyntaxHighlightFilter` uses [rouge](https://github.com/jneen/rouge)
+ to detect and highlight languages; to use the `SyntaxHighlightFilter`, you must add the following to your Gemfile:
+
+ ```ruby
+ gem "rouge"
+ ```
 
  > **Note**
  > See the [Gemfile](/Gemfile) `:test` group for any version requirements.
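`rouge` is not a runtime dependency of the gem, so the new filter file opens with `HTMLPipeline.require_dependency("rouge", "SyntaxHighlightFilter")`. That helper's body is not part of this diff. The sketch below is a hypothetical stand-in: `load_filter_dependency` and `MissingFilterDependency` are invented names. It shows one common pattern, which turns a bare `LoadError` into a message that names both the filter and the gem to add.

```ruby
# Hypothetical error class; not html-pipeline's own.
class MissingFilterDependency < StandardError; end

# Hypothetical stand-in for HTMLPipeline.require_dependency: try to load the
# gem and, if it is missing, explain which filter needs it.
def load_filter_dependency(gem_name, filter_name)
  require gem_name
rescue LoadError => e
  raise MissingFilterDependency,
        "#{filter_name} requires the '#{gem_name}' gem; add `gem \"#{gem_name}\"` " \
        "to your Gemfile (#{e.message})"
end

# json ships with Ruby, so this call simply loads it.
load_filter_dependency("json", "SomeFilter")

begin
  load_filter_dependency("no_such_gem_for_demo", "SyntaxHighlightFilter")
rescue MissingFilterDependency => e
  puts e.message
end
```

`LoadError` is a `ScriptError`, not a `StandardError`, so it has to be rescued by name as above.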
data/UPGRADING.md CHANGED
@@ -13,7 +13,6 @@ This project is now under a module called `HTMLPipeline`, not `HTML::Pipeline`.
  The following filters were removed:
 
  - `AutolinkFilter`: this is handled by [Commonmarker](https://www.github.com/gjtorikian/commonmarker) and can be disabled/enabled through the `MarkdownFilter`'s `context` hash
- - `SyntaxHighlightFilter`: this is handled by [Commonmarker](https://www.github.com/gjtorikian/commonmarker) and can be disabled/enabled through the `MarkdownFilter`'s `context` hash
  - `SanitizationFilter`: this is handled by [Selma](https://www.github.com/gjtorikian/selma); configuration can be done through the `sanitization_config` hash
 
  - `EmailReplyFilter`
@@ -0,0 +1,62 @@
+ # frozen_string_literal: true
+
+ HTMLPipeline.require_dependency("rouge", "SyntaxHighlightFilter")
+
+ class HTMLPipeline
+ class NodeFilter
+ # HTML Filter that syntax highlights text inside code blocks.
+ #
+ # Context options:
+ #
+ # :highlight => String represents the language to pick lexer. Defaults to empty string.
+ # :scope => String represents the class attribute adds to pre element after.
+ # Defaults to "highlight highlight-css" if highlights a css code block.
+ #
+ # This filter does not write any additional information to the context hash.
+ class SyntaxHighlightFilter < NodeFilter
+ def initialize(context: {}, result: {})
+ super(context: context, result: result)
+ # TODO: test the optionality of this
+ @formatter = context[:formatter] || Rouge::Formatters::HTML.new
+ end
+
+ SELECTOR = Selma::Selector.new(match_element: "pre", match_text_within: "pre")
+
+ def selector
+ SELECTOR
+ end
+
+ def handle_element(element)
+ default = context[:highlight]&.to_s
+ @lang = element["lang"] || default
+
+ scope = context.fetch(:scope, "highlight")
+
+ element["class"] = "#{scope} #{scope}-#{@lang}" if include_lang?
+ end
+
+ def handle_text_chunk(text)
+ return if @lang.nil?
+ return if (lexer = lexer_for(@lang)).nil?
+
+ content = text.to_s
+
+ text.replace(highlight_with_timeout_handling(content, lexer), as: :html)
+ end
+
+ def highlight_with_timeout_handling(text, lexer)
+ Rouge.highlight(text, lexer, @formatter)
+ rescue Timeout::Error => _e
+ text
+ end
+
+ def lexer_for(lang)
+ Rouge::Lexer.find(lang)
+ end
+
+ def include_lang?
+ !@lang.nil? && !@lang.empty?
+ end
+ end
+ end
+ end
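You can exercise the language lookup and class naming in `handle_element` without Selma or Rouge. The sketch below re-expresses that logic as two plain functions, `resolve_lang` and `pre_class`; both names are hypothetical. It also mirrors `highlight_with_timeout_handling` with a stand-in highlighter wrapped in Ruby's `Timeout.timeout`. Nothing in this file sets a timeout itself, so the time limit here is an assumption. One detail is visible in the code: when `:highlight` is absent from the context, `context[:highlight]&.to_s` yields `nil`, not the empty string the comment mentions, and so unlabelled `pre` blocks are left alone.

```ruby
require "timeout"

# Mirrors handle_element: a `lang` attribute on the <pre> wins; otherwise the
# context's :highlight value is used. With no :highlight key this returns nil.
def resolve_lang(element_lang, context)
  element_lang || context[:highlight]&.to_s
end

# Mirrors the class assignment guarded by include_lang?: "#{scope} #{scope}-#{lang}"
# for a non-empty language, nil (attribute left untouched) otherwise.
def pre_class(lang, context)
  return nil if lang.nil? || lang.empty?

  scope = context.fetch(:scope, "highlight")
  "#{scope} #{scope}-#{lang}"
end

# Mirrors highlight_with_timeout_handling: if highlighting raises Timeout::Error,
# return the original text unchanged. The block is a hypothetical stand-in for
# Rouge.highlight(text, lexer, formatter); the time limit is an assumption.
def highlight_or_plain(text, seconds: 1.0)
  Timeout.timeout(seconds) { yield text }
rescue Timeout::Error
  text
end

ctx = { highlight: :ruby }
lang = resolve_lang(nil, ctx)              # => "ruby"
p pre_class(lang, ctx)                     # => "highlight highlight-ruby"
p pre_class("css", { scope: "hl" })        # => "hl hl-css"
p highlight_or_plain("x", seconds: 0.05) { |t| sleep 1; "<b>#{t}</b>" } # => "x"
```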
@@ -24,8 +24,10 @@ class HTMLPipeline
  # result[:output].to_s
  # # => "<h1>\n<a id=\"ice-cube\" class=\"anchor\" href=\"#ice-cube\">..."
  class TableOfContentsFilter < NodeFilter
- SELECTOR = Selma::Selector.new(match_element: "h1 a[href], h2 a[href], h3 a[href], h4 a[href], h5 a[href], h6 a[href]",
- match_text_within: "h1, h2, h3, h4, h5, h6")
+ SELECTOR = Selma::Selector.new(
+ match_element: "h1 a[href], h2 a[href], h3 a[href], h4 a[href], h5 a[href], h6 a[href]",
+ match_text_within: "h1, h2, h3, h4, h5, h6",
+ )
 
  def selector
  SELECTOR
@@ -16,11 +16,70 @@ class HTMLPipeline
  # The main sanitization allowlist. Only these elements and attributes are
  # allowed through by default.
  DEFAULT_CONFIG = Selma::Sanitizer::Config.freeze_config({
- elements: ["h1", "h2", "h3", "h4", "h5", "h6", "br", "b", "i", "strong", "em", "a", "pre", "code",
- "img", "tt", "div", "ins", "del", "sup", "sub", "p", "picture", "ol", "ul", "table", "thead", "tbody", "tfoot",
- "blockquote", "dl", "dt", "dd", "kbd", "q", "samp", "var", "hr", "ruby", "rt", "rp", "li", "tr", "td", "th",
- "s", "strike", "summary", "details", "caption", "figure", "figcaption", "abbr", "bdo", "cite",
- "dfn", "mark", "small", "source", "span", "time", "wbr",],
+ elements: [
+ "h1",
+ "h2",
+ "h3",
+ "h4",
+ "h5",
+ "h6",
+ "br",
+ "b",
+ "i",
+ "strong",
+ "em",
+ "a",
+ "pre",
+ "code",
+ "img",
+ "tt",
+ "div",
+ "ins",
+ "del",
+ "sup",
+ "sub",
+ "p",
+ "picture",
+ "ol",
+ "ul",
+ "table",
+ "thead",
+ "tbody",
+ "tfoot",
+ "blockquote",
+ "dl",
+ "dt",
+ "dd",
+ "kbd",
+ "q",
+ "samp",
+ "var",
+ "hr",
+ "ruby",
+ "rt",
+ "rp",
+ "li",
+ "tr",
+ "td",
+ "th",
+ "s",
+ "strike",
+ "summary",
+ "details",
+ "caption",
+ "figure",
+ "figcaption",
+ "abbr",
+ "bdo",
+ "cite",
+ "dfn",
+ "mark",
+ "small",
+ "source",
+ "span",
+ "time",
+ "wbr",
+ ],
 
  attributes: {
  "a" => ["href"],
@@ -31,13 +90,77 @@ class HTMLPipeline
  "ins" => ["cite"],
  "q" => ["cite"],
  "source" => ["srcset"],
- all: ["abbr", "accept", "accept-charset", "accesskey", "action", "align", "alt", "aria-describedby",
- "aria-hidden", "aria-label", "aria-labelledby", "axis", "border", "char",
- "charoff", "charset", "checked", "clear", "cols", "colspan", "compact", "coords", "datetime", "dir",
- "disabled", "enctype", "for", "frame", "headers", "height", "hreflang", "hspace", "id", "ismap", "label", "lang",
- "maxlength", "media", "method", "multiple", "name", "nohref", "noshade", "nowrap", "open", "progress",
- "prompt", "readonly", "rel", "rev", "role", "rows", "rowspan", "rules", "scope", "selected", "shape",
- "size", "span", "start", "summary", "tabindex", "title", "type", "usemap", "valign", "value", "width", "itemprop",],
+ all: [
+ "abbr",
+ "accept",
+ "accept-charset",
+ "accesskey",
+ "action",
+ "align",
+ "alt",
+ "aria-describedby",
+ "aria-hidden",
+ "aria-label",
+ "aria-labelledby",
+ "axis",
+ "border",
+ "char",
+ "charoff",
+ "charset",
+ "checked",
+ "clear",
+ "cols",
+ "colspan",
+ "compact",
+ "coords",
+ "datetime",
+ "dir",
+ "disabled",
+ "enctype",
+ "for",
+ "frame",
+ "headers",
+ "height",
+ "hreflang",
+ "hspace",
+ "id",
+ "ismap",
+ "label",
+ "lang",
+ "maxlength",
+ "media",
+ "method",
+ "multiple",
+ "name",
+ "nohref",
+ "noshade",
+ "nowrap",
+ "open",
+ "progress",
+ "prompt",
+ "readonly",
+ "rel",
+ "rev",
+ "role",
+ "rows",
+ "rowspan",
+ "rules",
+ "scope",
+ "selected",
+ "shape",
+ "size",
+ "span",
+ "start",
+ "summary",
+ "tabindex",
+ "title",
+ "type",
+ "usemap",
+ "valign",
+ "value",
+ "width",
+ "itemprop",
+ ],
  },
  protocols: {
  "a" => { "href" => Selma::Sanitizer::Config::VALID_PROTOCOLS }.freeze,
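`DEFAULT_CONFIG` combines three parts: an element allowlist, per-element attribute lists, and a shared `all:` attribute list. The plain-Ruby sketch below models the check this describes. The `CONFIG` hash is a trimmed, hypothetical excerpt that uses only entries visible in this hunk. `allowed_attribute?` is an illustrative helper and not Selma's API; Selma performs the real sanitization while rewriting. The sketch also assumes that Selma, like the Sanitize gem, applies `all:` to every allowed element.

```ruby
# Hypothetical, trimmed excerpt of DEFAULT_CONFIG: only entries visible in the
# hunk are used, and the lists are shortened for illustration.
CONFIG = {
  elements: %w[a div pre code ins q source],
  attributes: {
    "a" => ["href"],
    "ins" => ["cite"],
    "q" => ["cite"],
    "source" => ["srcset"],
    all: %w[abbr align alt id lang title itemprop],
  },
}.freeze

def allowed_element?(name)
  CONFIG[:elements].include?(name)
end

# Assumption: an attribute passes when its element is allowed and the
# attribute is listed for that element or under the shared :all key.
def allowed_attribute?(element, attribute)
  return false unless allowed_element?(element)

  per_element = CONFIG[:attributes].fetch(element, [])
  per_element.include?(attribute) || CONFIG[:attributes][:all].include?(attribute)
end

p allowed_attribute?("a", "href")      # => true
p allowed_attribute?("div", "title")   # => true (via :all)
p allowed_attribute?("a", "onclick")   # => false
p allowed_attribute?("script", "id")   # => false (element not allowed)
```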
@@ -1,5 +1,5 @@
  # frozen_string_literal: true
 
  class HTMLPipeline
- VERSION = "3.0.0.pre1"
+ VERSION = "3.0.0.pre2"
  end
data/lib/html_pipeline.rb CHANGED
@@ -145,8 +145,11 @@ end
  context = context.freeze
  result ||= {}
 
- payload = default_payload({ text_filters: @text_filters.map(&:name),
- context: context, result: result, })
+ payload = default_payload({
+ text_filters: @text_filters.map(&:name),
+ context: context,
+ result: result,
+ })
  instrument("call_text_filters.html_pipeline", payload) do
  result[:output] =
  @text_filters.inject(text) do |doc, filter|
@@ -159,8 +162,11 @@ end
  html = @convert_filter.call(text) unless @convert_filter.nil?
 
  unless @node_filters.empty?
- payload = default_payload({ node_filters: @node_filters.map { |f| f.class.name },
- context: context, result: result, })
+ payload = default_payload({
+ node_filters: @node_filters.map { |f| f.class.name },
+ context: context,
+ result: result,
+ })
  instrument("call_node_filters.html_pipeline", payload) do
  result[:output] = Selma::Rewriter.new(sanitizer: @sanitization_config, handlers: @node_filters).rewrite(html)
  end
@@ -178,8 +184,11 @@ end
  #
  # Returns the result of the filter.
  def perform_filter(filter, doc, context: {}, result: {})
- payload = default_payload({ filter: filter.name,
- context: context, result: result, })
+ payload = default_payload({
+ filter: filter.name,
+ context: context,
+ result: result,
+ })
  instrument("call_filter.html_pipeline", payload) do
  filter.call(doc, context: context, result: result)
  end
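Each `default_payload({...})` hash above is passed to `instrument(name, payload) { ... }`, which wraps one pipeline stage under an event name such as `call_filter.html_pipeline`. This diff does not show the bodies of `default_payload` or `instrument`. The sketch below is a hypothetical recorder with the same `instrument(name, payload) { ... }` shape, similar in spirit to ActiveSupport::Notifications. It shows what a subscriber could capture for each stage. `Recorder` and its timing fields are assumptions made for illustration.

```ruby
# Hypothetical instrumentation service with an
# instrument(name, payload) { ... } method, for illustration only.
class Recorder
  Event = Struct.new(:name, :payload, :seconds)

  attr_reader :events

  def initialize
    @events = []
  end

  # Times the block, records the event, and returns the block's value.
  def instrument(name, payload = {})
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield payload
  ensure
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    @events << Event.new(name, payload, elapsed)
  end
end

recorder = Recorder.new
payload = { filter: "ExampleFilter", context: {}, result: {} } # keys as in perform_filter
html = recorder.instrument("call_filter.html_pipeline", payload) { "<p>filtered</p>" }

event = recorder.events.first
puts "#{event.name} #{event.payload[:filter]} #{html}"
```

An `ensure` clause does not change the method's return value, so the caller still receives the wrapped stage's output.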
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: html-pipeline
  version: !ruby/object:Gem::Version
- version: 3.0.0.pre1
+ version: 3.0.0.pre2
  platform: ruby
  authors:
  - Garen J. Torikian
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2022-12-30 00:00:00.000000000 Z
+ date: 2023-01-26 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: selma
@@ -71,6 +71,7 @@ files:
  - lib/html_pipeline/node_filter/https_filter.rb
  - lib/html_pipeline/node_filter/image_max_width_filter.rb
  - lib/html_pipeline/node_filter/mention_filter.rb
+ - lib/html_pipeline/node_filter/syntax_highlight_filter.rb
  - lib/html_pipeline/node_filter/table_of_contents_filter.rb
  - lib/html_pipeline/node_filter/team_mention_filter.rb
  - lib/html_pipeline/sanitization_filter.rb