html-pipeline 3.0.0.pre1 → 3.0.0.pre3

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 4b8171e0970d8cbe50757505138bbea0f812c06583cfcdfd192686a8aff9cb53
4
- data.tar.gz: 19a40ef93871f3d2b6411c526d6375a21d494a301d7a4abb2e92b6062de81b1c
3
+ metadata.gz: 613a099c0290d87932d2bbaf78052d8c472518bb911d82fe651ad78a3cc32ee9
4
+ data.tar.gz: d03826dc83afb689e87395d3a365a24fd5b3608e3a29b632a79a5928acb84a2b
5
5
  SHA512:
6
- metadata.gz: 4c160dcf8463cc5009d397529087c9f035cb3ae5e998c12ace03810fd8146b031f5f28829e318456239cd410d41fbf3d1da7d0b0e8097ceb3206733a96484c5f
7
- data.tar.gz: 72fa1ec6b3a2097dd0ce6108f55a10ac9b617e5fc607e67818876c5b8ececbc7c2bbd58e53ce530ba39f11f6ad0c292ec956e9e42fbed370213abc5b71bcf257
6
+ metadata.gz: 35ce289f0ca10929fba016a5dd0d7c0a6bf0d8f8d4390385590c0937190df67605b96b474a477a54feb5192a54fc5b299940d264483700a9c5deac04cff1b9e1
7
+ data.tar.gz: dbddce89afcb5f0291d28c9b24b4a2e4f25ea847d8960dc06a11f0c6cfd6cf2feb169f406fd69ca5f25a3eabc4338ce0cf7f2b94fa2147c682135bb6dee70728
data/CHANGELOG.md CHANGED
@@ -1,5 +1,43 @@
1
1
  # Changelog
2
2
 
3
+ ## [Unreleased](https://github.com/gjtorikian/html-pipeline/tree/HEAD)
4
+
5
+ [Full Changelog](https://github.com/gjtorikian/html-pipeline/compare/v3.0.0.pre3...HEAD)
6
+
7
+ **Closed issues:**
8
+
9
+ - v3: Question regarding requiring a ConvertFilter if there are NodeFilters [\#374](https://github.com/gjtorikian/html-pipeline/issues/374)
10
+
11
+ ## [v3.0.0.pre3](https://github.com/gjtorikian/html-pipeline/tree/v3.0.0.pre3) (2023-02-15)
12
+
13
+ [Full Changelog](https://github.com/gjtorikian/html-pipeline/compare/v3.0.0.pre2...v3.0.0.pre3)
14
+
15
+ **Merged pull requests:**
16
+
17
+ - req convert\_filter if `text/node`filter present [\#375](https://github.com/gjtorikian/html-pipeline/pull/375) ([gjtorikian](https://github.com/gjtorikian))
18
+
19
+ ## [v3.0.0.pre2](https://github.com/gjtorikian/html-pipeline/tree/v3.0.0.pre2) (2023-01-26)
20
+
21
+ [Full Changelog](https://github.com/gjtorikian/html-pipeline/compare/v3.0.0.pre1...v3.0.0.pre2)
22
+
23
+ **Closed issues:**
24
+
25
+ - Indicate a version for activesupport that has support/receives security patches \(\>= 6?\) [\#367](https://github.com/gjtorikian/html-pipeline/issues/367)
26
+ - Add MathML elements to whitelist [\#336](https://github.com/gjtorikian/html-pipeline/issues/336)
27
+ - Feature request: add safe semantic HTML tags to default whitelist [\#312](https://github.com/gjtorikian/html-pipeline/issues/312)
28
+ - Allow float: left|right and clear: left|right|both in sanitation [\#302](https://github.com/gjtorikian/html-pipeline/issues/302)
29
+ - Open link in new tab option [\#266](https://github.com/gjtorikian/html-pipeline/issues/266)
30
+ - Consider allowing LINK and META elements in HTML [\#261](https://github.com/gjtorikian/html-pipeline/issues/261)
31
+ - Allow SVG elements in whitelist [\#251](https://github.com/gjtorikian/html-pipeline/issues/251)
32
+ - Allow RDFa 1.1 \(Lite\) attributes [\#249](https://github.com/gjtorikian/html-pipeline/issues/249)
33
+ - What’s the point of allowing the accept-charset attribute in the sanitization filter ? [\#218](https://github.com/gjtorikian/html-pipeline/issues/218)
34
+ - Use gemojione instead of gemoji [\#200](https://github.com/gjtorikian/html-pipeline/issues/200)
35
+ - Link schema not used with \<a\> markup : print them directly [\#194](https://github.com/gjtorikian/html-pipeline/issues/194)
36
+
37
+ **Merged pull requests:**
38
+
39
+ - Use emoji from commonmarker [\#373](https://github.com/gjtorikian/html-pipeline/pull/373) ([gjtorikian](https://github.com/gjtorikian))
40
+
3
41
  ## [v3.0.0.pre1](https://github.com/gjtorikian/html-pipeline/tree/v3.0.0.pre1) (2022-12-30)
4
42
 
5
43
  [Full Changelog](https://github.com/gjtorikian/html-pipeline/compare/v2.14.3...v3.0.0.pre1)
data/Gemfile CHANGED
@@ -27,9 +27,10 @@ group :development do
27
27
  end
28
28
 
29
29
  group :test do
30
- gem "commonmarker", "~> 1.0.0.pre4", require: false
30
+ gem "commonmarker", "~> 1.0.0.pre7", require: false
31
31
  gem "gemoji", "~> 3.0", require: false
32
32
  gem "gemojione", "~> 4.3", require: false
33
+
33
34
  gem "minitest"
34
35
 
35
36
  gem "minitest-bisect", "~> 1.6"
@@ -37,4 +38,5 @@ group :test do
37
38
  gem "nokogiri", "~> 1.13"
38
39
 
39
40
  gem "minitest-focus", "~> 1.1"
41
+ gem "rouge", "~> 3.1", require: false
40
42
  end
data/README.md CHANGED
@@ -230,19 +230,28 @@ end
230
230
 
231
231
  For more information on how to write effective `NodeFilter`s, refer to the provided filters, and see the underlying lib, [Selma](https://www.github.com/gjtorikian/selma) for more information.
232
232
 
233
- - `AbsoluteSourceFilter` - replace relative image urls with fully qualified versions
234
- - `EmojiFilter` - converts `:<emoji>:` to [emoji](http://www.emoji-cheat-sheet.com/)!
235
- - `HttpsFilter` - Replacing http urls with https versions
236
- - `ImageMaxWidthFilter` - link to full size image for large images
237
- - `MentionFilter` - replace `@user` mentions with links
238
- - `SanitizationFilter` - allow sanitize user markup
239
- - `TableOfContentsFilter` - anchor headings with name attributes and generate Table of Contents html unordered list linking headings
240
- - `TeamMentionFilter` - replace `@org/team` mentions with links
233
+ - `AbsoluteSourceFilter`: replace relative image urls with fully qualified versions
234
+ - `EmojiFilter`: converts `:<emoji>:` to [emoji](http://www.emoji-cheat-sheet.com/)
235
+ - (Note: the included `MarkdownFilter` will already convert emoji)
236
+ - `HttpsFilter`: Replacing http urls with https versions
237
+ - `ImageMaxWidthFilter`: link to full size image for large images
238
+ - `MentionFilter`: replace `@user` mentions with links
239
+ - `SanitizationFilter`: allow sanitize user markup
240
+ - `SyntaxHighlightFilter`: applies syntax highlighting to `pre` blocks
241
+ - (Note: the included `MarkdownFilter` will already apply highlighting)
242
+ - `TableOfContentsFilter`: anchor headings with name attributes and generate Table of Contents html unordered list linking headings
243
+ - `TeamMentionFilter`: replace `@org/team` mentions with links
241
244
 
242
245
  ## Dependencies
243
246
 
244
- Since filters can be customized to your heart's content, gem dependencies are _not_ bundled; this project doesn't know which of the default filters you might use, and as such, you must bundle each filter's gem
245
- dependencies yourself.
247
+ Since filters can be customized to your heart's content, gem dependencies are _not_ bundled; this project doesn't know which of the default filters you might use, and as such, you must bundle each filter's gem dependencies yourself.
248
+
249
+ For example, `SyntaxHighlightFilter` uses [rouge](https://github.com/jneen/rouge)
250
+ to detect and highlight languages; to use the `SyntaxHighlightFilter`, you must add the following to your Gemfile:
251
+
252
+ ```ruby
253
+ gem "rouge"
254
+ ```
246
255
 
247
256
  > **Note**
248
257
  > See the [Gemfile](/Gemfile) `:test` group for any version requirements.
data/UPGRADING.md CHANGED
@@ -13,7 +13,6 @@ This project is now under a module called `HTMLPipeline`, not `HTML::Pipeline`.
13
13
  The following filters were removed:
14
14
 
15
15
  - `AutolinkFilter`: this is handled by [Commonmarker](https://www.github.com/gjtorikian/commonmarker) and can be disabled/enabled through the `MarkdownFilter`'s `context` hash
16
- - `SyntaxHighlightFilter`: this is handled by [Commonmarker](https://www.github.com/gjtorikian/commonmarker) and can be disabled/enabled through the `MarkdownFilter`'s `context` hash
17
16
  - `SanitizationFilter`: this is handled by [Selma](https://www.github.com/gjtorikian/selma); configuration can be done through the `sanitization_config` hash
18
17
 
19
18
  - `EmailReplyFilter`
@@ -80,7 +80,7 @@ class HTMLPipeline
      def needs(*keys)
        missing = keys.reject { |key| context.include?(key) }

-     return unless missing.any?
+     return if missing.none?

        raise ArgumentError,
          "Missing context keys for #{self.class.name}: #{missing.map(&:inspect).join(", ")}"
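The guard rewrite in this hunk does not change behavior: for a block-less call on any array, `missing.none?` is true exactly when `missing.any?` is false. A minimal standalone sketch, assuming a hypothetical `check_needs` helper that takes the context hash as an argument (the real `needs` reads `context` from the filter instance and names the filter class in its message):

```ruby
# Hypothetical standalone version of the `needs` check, for illustration only.
def check_needs(context, *keys)
  missing = keys.reject { |key| context.include?(key) }

  # Same exit condition as the previous `return unless missing.any?`.
  return if missing.none?

  raise ArgumentError,
    "Missing context keys: #{missing.map(&:inspect).join(", ")}"
end

# All required keys are present, so the check returns nil.
check_needs({ base_url: "https://example.com" }, :base_url)

begin
  check_needs({}, :base_url, :image_base_url)
rescue ArgumentError => e
  puts e.message # prints "Missing context keys: :base_url, :image_base_url"
end
```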
@@ -0,0 +1,62 @@
+ # frozen_string_literal: true
+
+ HTMLPipeline.require_dependency("rouge", "SyntaxHighlightFilter")
+
+ class HTMLPipeline
+   class NodeFilter
+     # HTML Filter that syntax highlights text inside code blocks.
+     #
+     # Context options:
+     #
+     # :highlight => String represents the language to pick lexer. Defaults to empty string.
+     # :scope => String represents the class attribute adds to pre element after.
+     # Defaults to "highlight highlight-css" if highlights a css code block.
+     #
+     # This filter does not write any additional information to the context hash.
+     class SyntaxHighlightFilter < NodeFilter
+       def initialize(context: {}, result: {})
+         super(context: context, result: result)
+         # TODO: test the optionality of this
+         @formatter = context[:formatter] || Rouge::Formatters::HTML.new
+       end
+
+       SELECTOR = Selma::Selector.new(match_element: "pre", match_text_within: "pre")
+
+       def selector
+         SELECTOR
+       end
+
+       def handle_element(element)
+         default = context[:highlight]&.to_s
+         @lang = element["lang"] || default
+
+         scope = context.fetch(:scope, "highlight")
+
+         element["class"] = "#{scope} #{scope}-#{@lang}" if include_lang?
+       end
+
+       def handle_text_chunk(text)
+         return if @lang.nil?
+         return if (lexer = lexer_for(@lang)).nil?
+
+         content = text.to_s
+
+         text.replace(highlight_with_timeout_handling(content, lexer), as: :html)
+       end
+
+       def highlight_with_timeout_handling(text, lexer)
+         Rouge.highlight(text, lexer, @formatter)
+       rescue Timeout::Error => _e
+         text
+       end
+
+       def lexer_for(lang)
+         Rouge::Lexer.find(lang)
+       end
+
+       def include_lang?
+         !@lang.nil? && !@lang.empty?
+       end
+     end
+   end
+ end
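The class attribute that `handle_element` writes to a `pre` element can be traced without Rouge or Selma. The sketch below is an illustration, not the gem's code: `highlight_class` is a hypothetical helper, `lang_attr` stands in for `element["lang"]`, and `context` is a plain Hash.

```ruby
# Hypothetical helper mirroring the class-attribute logic in `handle_element`.
# Returns the class string, or nil when no class would be set.
def highlight_class(lang_attr, context = {})
  # The element's own `lang` attribute wins over the :highlight context default.
  lang = lang_attr || context[:highlight]&.to_s
  scope = context.fetch(:scope, "highlight")

  # Mirrors `include_lang?`: nothing is written for a nil or empty language.
  return nil if lang.nil? || lang.empty?

  "#{scope} #{scope}-#{lang}"
end

puts highlight_class("css")                     # prints "highlight highlight-css"
puts highlight_class(nil, { highlight: :ruby }) # prints "highlight highlight-ruby"
puts highlight_class("ruby", { scope: "code" }) # prints "code code-ruby"
p highlight_class(nil)                          # prints nil
```

In the filter itself, a missing language also causes `handle_text_chunk` to return early, so the text inside such a `pre` block is left unhighlighted.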
@@ -24,8 +24,10 @@ class HTMLPipeline
24
24
  # result[:output].to_s
25
25
  # # => "<h1>\n<a id=\"ice-cube\" class=\"anchor\" href=\"#ice-cube\">..."
26
26
  class TableOfContentsFilter < NodeFilter
27
- SELECTOR = Selma::Selector.new(match_element: "h1 a[href], h2 a[href], h3 a[href], h4 a[href], h5 a[href], h6 a[href]",
28
- match_text_within: "h1, h2, h3, h4, h5, h6")
27
+ SELECTOR = Selma::Selector.new(
28
+ match_element: "h1 a[href], h2 a[href], h3 a[href], h4 a[href], h5 a[href], h6 a[href]",
29
+ match_text_within: "h1, h2, h3, h4, h5, h6",
30
+ )
29
31
 
30
32
  def selector
31
33
  SELECTOR
@@ -16,11 +16,70 @@ class HTMLPipeline
16
16
  # The main sanitization allowlist. Only these elements and attributes are
17
17
  # allowed through by default.
18
18
  DEFAULT_CONFIG = Selma::Sanitizer::Config.freeze_config({
19
- elements: ["h1", "h2", "h3", "h4", "h5", "h6", "br", "b", "i", "strong", "em", "a", "pre", "code",
20
- "img", "tt", "div", "ins", "del", "sup", "sub", "p", "picture", "ol", "ul", "table", "thead", "tbody", "tfoot",
21
- "blockquote", "dl", "dt", "dd", "kbd", "q", "samp", "var", "hr", "ruby", "rt", "rp", "li", "tr", "td", "th",
22
- "s", "strike", "summary", "details", "caption", "figure", "figcaption", "abbr", "bdo", "cite",
23
- "dfn", "mark", "small", "source", "span", "time", "wbr",],
19
+ elements: [
20
+ "h1",
21
+ "h2",
22
+ "h3",
23
+ "h4",
24
+ "h5",
25
+ "h6",
26
+ "br",
27
+ "b",
28
+ "i",
29
+ "strong",
30
+ "em",
31
+ "a",
32
+ "pre",
33
+ "code",
34
+ "img",
35
+ "tt",
36
+ "div",
37
+ "ins",
38
+ "del",
39
+ "sup",
40
+ "sub",
41
+ "p",
42
+ "picture",
43
+ "ol",
44
+ "ul",
45
+ "table",
46
+ "thead",
47
+ "tbody",
48
+ "tfoot",
49
+ "blockquote",
50
+ "dl",
51
+ "dt",
52
+ "dd",
53
+ "kbd",
54
+ "q",
55
+ "samp",
56
+ "var",
57
+ "hr",
58
+ "ruby",
59
+ "rt",
60
+ "rp",
61
+ "li",
62
+ "tr",
63
+ "td",
64
+ "th",
65
+ "s",
66
+ "strike",
67
+ "summary",
68
+ "details",
69
+ "caption",
70
+ "figure",
71
+ "figcaption",
72
+ "abbr",
73
+ "bdo",
74
+ "cite",
75
+ "dfn",
76
+ "mark",
77
+ "small",
78
+ "source",
79
+ "span",
80
+ "time",
81
+ "wbr",
82
+ ],
24
83
 
25
84
  attributes: {
26
85
  "a" => ["href"],
@@ -31,13 +90,77 @@ class HTMLPipeline
31
90
  "ins" => ["cite"],
32
91
  "q" => ["cite"],
33
92
  "source" => ["srcset"],
34
- all: ["abbr", "accept", "accept-charset", "accesskey", "action", "align", "alt", "aria-describedby",
35
- "aria-hidden", "aria-label", "aria-labelledby", "axis", "border", "char",
36
- "charoff", "charset", "checked", "clear", "cols", "colspan", "compact", "coords", "datetime", "dir",
37
- "disabled", "enctype", "for", "frame", "headers", "height", "hreflang", "hspace", "id", "ismap", "label", "lang",
38
- "maxlength", "media", "method", "multiple", "name", "nohref", "noshade", "nowrap", "open", "progress",
39
- "prompt", "readonly", "rel", "rev", "role", "rows", "rowspan", "rules", "scope", "selected", "shape",
40
- "size", "span", "start", "summary", "tabindex", "title", "type", "usemap", "valign", "value", "width", "itemprop",],
93
+ all: [
94
+ "abbr",
95
+ "accept",
96
+ "accept-charset",
97
+ "accesskey",
98
+ "action",
99
+ "align",
100
+ "alt",
101
+ "aria-describedby",
102
+ "aria-hidden",
103
+ "aria-label",
104
+ "aria-labelledby",
105
+ "axis",
106
+ "border",
107
+ "char",
108
+ "charoff",
109
+ "charset",
110
+ "checked",
111
+ "clear",
112
+ "cols",
113
+ "colspan",
114
+ "compact",
115
+ "coords",
116
+ "datetime",
117
+ "dir",
118
+ "disabled",
119
+ "enctype",
120
+ "for",
121
+ "frame",
122
+ "headers",
123
+ "height",
124
+ "hreflang",
125
+ "hspace",
126
+ "id",
127
+ "ismap",
128
+ "label",
129
+ "lang",
130
+ "maxlength",
131
+ "media",
132
+ "method",
133
+ "multiple",
134
+ "name",
135
+ "nohref",
136
+ "noshade",
137
+ "nowrap",
138
+ "open",
139
+ "progress",
140
+ "prompt",
141
+ "readonly",
142
+ "rel",
143
+ "rev",
144
+ "role",
145
+ "rows",
146
+ "rowspan",
147
+ "rules",
148
+ "scope",
149
+ "selected",
150
+ "shape",
151
+ "size",
152
+ "span",
153
+ "start",
154
+ "summary",
155
+ "tabindex",
156
+ "title",
157
+ "type",
158
+ "usemap",
159
+ "valign",
160
+ "value",
161
+ "width",
162
+ "itemprop",
163
+ ],
41
164
  },
42
165
  protocols: {
43
166
  "a" => { "href" => Selma::Sanitizer::Config::VALID_PROTOCOLS }.freeze,
@@ -1,5 +1,5 @@
1
1
  # frozen_string_literal: true
2
2
 
3
3
  class HTMLPipeline
4
- VERSION = "3.0.0.pre1"
4
+ VERSION = "3.0.0.pre3"
5
5
  end
data/lib/html_pipeline.rb CHANGED
@@ -116,8 +116,9 @@ end
      validate_filters(@node_filters, HTMLPipeline::NodeFilter)

      @convert_filter = convert_filter
-     if @convert_filter.nil? && !@node_filters.empty?
-       raise InvalidFilterError, "Must provide `convert_filter` if `node_filter`s is also provided"
+
+     if @convert_filter.nil? && (!@text_filters.empty? && !@node_filters.empty?)
+       raise InvalidFilterError, "Must provide `convert_filter` if `text_filters` and `node_filters` are also provided"
      elsif !@convert_filter.nil?
        validate_filter(@convert_filter, HTMLPipeline::ConvertFilter)
      end
  end
@@ -145,8 +146,11 @@ end
145
146
  context = context.freeze
146
147
  result ||= {}
147
148
 
148
- payload = default_payload({ text_filters: @text_filters.map(&:name),
149
- context: context, result: result, })
149
+ payload = default_payload({
150
+ text_filters: @text_filters.map(&:name),
151
+ context: context,
152
+ result: result,
153
+ })
150
154
  instrument("call_text_filters.html_pipeline", payload) do
151
155
  result[:output] =
152
156
  @text_filters.inject(text) do |doc, filter|
@@ -159,8 +163,11 @@ end
159
163
  html = @convert_filter.call(text) unless @convert_filter.nil?
160
164
 
161
165
  unless @node_filters.empty?
162
- payload = default_payload({ node_filters: @node_filters.map { |f| f.class.name },
163
- context: context, result: result, })
166
+ payload = default_payload({
167
+ node_filters: @node_filters.map { |f| f.class.name },
168
+ context: context,
169
+ result: result,
170
+ })
164
171
  instrument("call_node_filters.html_pipeline", payload) do
165
172
  result[:output] = Selma::Rewriter.new(sanitizer: @sanitization_config, handlers: @node_filters).rewrite(html)
166
173
  end
@@ -178,8 +185,11 @@ end
178
185
  #
179
186
  # Returns the result of the filter.
180
187
  def perform_filter(filter, doc, context: {}, result: {})
181
- payload = default_payload({ filter: filter.name,
182
- context: context, result: result, })
188
+ payload = default_payload({
189
+ filter: filter.name,
190
+ context: context,
191
+ result: result,
192
+ })
183
193
  instrument("call_filter.html_pipeline", payload) do
184
194
  filter.call(doc, context: context, result: result)
185
195
  end
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: html-pipeline
3
3
  version: !ruby/object:Gem::Version
4
- version: 3.0.0.pre1
4
+ version: 3.0.0.pre3
5
5
  platform: ruby
6
6
  authors:
7
7
  - Garen J. Torikian
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2022-12-30 00:00:00.000000000 Z
11
+ date: 2023-02-15 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: selma
@@ -71,6 +71,7 @@ files:
71
71
  - lib/html_pipeline/node_filter/https_filter.rb
72
72
  - lib/html_pipeline/node_filter/image_max_width_filter.rb
73
73
  - lib/html_pipeline/node_filter/mention_filter.rb
74
+ - lib/html_pipeline/node_filter/syntax_highlight_filter.rb
74
75
  - lib/html_pipeline/node_filter/table_of_contents_filter.rb
75
76
  - lib/html_pipeline/node_filter/team_mention_filter.rb
76
77
  - lib/html_pipeline/sanitization_filter.rb