html-pipeline 0.0.4 → 0.0.5
- data/CHANGELOG.md +12 -0
- data/README.md +100 -7
- data/html-pipeline.gemspec +10 -9
- data/lib/html/pipeline.rb +3 -3
- data/lib/html/pipeline/camo_filter.rb +10 -4
- data/lib/html/pipeline/emoji_filter.rb +8 -2
- data/lib/html/pipeline/filter.rb +21 -1
- data/lib/html/pipeline/image_max_width_filter.rb +2 -2
- data/lib/html/pipeline/sanitization_filter.rb +6 -8
- data/lib/html/pipeline/version.rb +1 -1
- data/test/html/pipeline/camo_filter_test.rb +8 -0
- data/test/html/pipeline/emoji_filter_test.rb +9 -7
- data/test/html/pipeline/sanitization_filter_test.rb +1 -1
- metadata +7 -5
data/CHANGELOG.md
ADDED
@@ -0,0 +1,12 @@
+# CHANGELOG
+
+## 0.0.5 (unreleased)
+
+* fix li xss vulnerability in sanitization filter: vmg #31
+* gemspec cleanup: nbibler #23, jbarnette #24
+* doc updates: jch #16, pborreli #17, wickedshimmy #18, benubois #19, blackerby #21
+* loosen gemoji dependency: josh #15
+
+## 0.0.4
+
+* initial public release
data/README.md
CHANGED
@@ -53,7 +53,9 @@ pipeline = HTML::Pipeline.new [
 result = pipeline.call <<CODE
 This is *great*:
 
-
+``` ruby
+some_code(:first)
+```
 
 CODE
 result[:output].to_s
@@ -83,19 +85,108 @@ filter.call
 ## Filters
 
 * `MentionFilter` - replace `@user` mentions with links
-* `
-* `CamoFilter` - replace http image urls with [camo-fied](https://github.com/
+* `AutolinkFilter` - auto_linking urls in HTML
+* `CamoFilter` - replace http image urls with [camo-fied](https://github.com/atmos/camo) https versions
 * `EmailReplyFilter` - util filter for working with emails
 * `EmojiFilter` - everyone loves [emoji](http://www.emoji-cheat-sheet.com/)!
+* `HttpsFilter` - HTML Filter for replacing http github urls with https versions.
 * `ImageMaxWidthFilter` - link to full size image for large images
 * `MarkdownFilter` - convert markdown to html
 * `PlainTextInputFilter` - html escape text and wrap the result in a div
-* `SanitizationFilter` - whitelist
+* `SanitizationFilter` - whitelist sanitize user markup
 * `SyntaxHighlightFilter` - code syntax highlighter with [linguist](https://github.com/github/linguist)
 * `TextileFilter` - convert textile to html
 * `TableOfContentsFilter` - anchor headings with name attributes
 
-##
+## Examples
+
+We define different pipelines for different parts of our app. Here are a few
+paraphrased snippets to get you started:
+
+```ruby
+# The context hash is how you pass options between different filters.
+# See individual filter source for explanation of options.
+context = {
+  :asset_root => "http://your-domain.com/where/your/images/live/icons",
+  :base_url => "http://your-domain.com"
+}
+
+# Pipeline providing sanitization and image hijacking but no mention
+# related features.
+SimplePipeline = Pipeline.new [
+  SanitizationFilter,
+  TableOfContentsFilter, # add 'name' anchors to all headers
+  CamoFilter,
+  ImageMaxWidthFilter,
+  SyntaxHighlightFilter,
+  EmojiFilter,
+  AutolinkFilter
+], context, {}
+
+# Pipeline used for user provided content on the web
+MarkdownPipeline = Pipeline.new [
+  MarkdownFilter,
+  SanitizationFilter,
+  CamoFilter,
+  ImageMaxWidthFilter,
+  HttpsFilter,
+  MentionFilter,
+  EmojiFilter,
+  SyntaxHighlightFilter
+], context.merge(:gfm => true), {} # enable github formatted markdown
+
+
+# Define a pipeline based on another pipeline's filters
+NonGFMMarkdownPipeline = Pipeline.new(MarkdownPipeline.filters,
+  context.merge(:gfm => false), {})
+
+# Pipelines aren't limited to the web. You can use them for email
+# processing also.
+HtmlEmailPipeline = Pipeline.new [
+  ImageMaxWidthFilter
+], {}, {}
+
+# Just emoji.
+EmojiPipeline = Pipeline.new [
+  HTMLInputFilter,
+  EmojiFilter
+], context, {}
+```
+
+## Extending
+To write a custom filter, you need a class with a `call` method that inherits
+from `HTML::Pipeline::Filter`.
+
+For example this filter adds a base url to images that are root relative:
+
+```ruby
+require 'uri'
+
+class RootRelativeFilter < HTML::Pipeline::Filter
+
+  def call
+    doc.search("img").each do |img|
+      next if img['src'].nil?
+      src = img['src'].strip
+      if src.start_with? '/'
+        img["src"] = URI.join(context[:base_url], src).to_s
+      end
+    end
+    doc
+  end
+
+end
+```
+
+Now this filter can be used in a pipeline:
+
+```ruby
+Pipeline.new [ RootRelativeFilter ], { :base_url => 'http://somehost.com' }
+```
+
+## Development
+
+To see what has changed in recent versions, see the [CHANGELOG](https://github.com/jch/html-pipeline/blob/master/CHANGELOG.md).
 
 ```sh
 bundle
@@ -104,11 +195,11 @@ rake test
 
 ## Contributing
 
-1. Fork it
+1. [Fork it](https://help.github.com/articles/fork-a-repo)
 2. Create your feature branch (`git checkout -b my-new-feature`)
 3. Commit your changes (`git commit -am 'Added some feature'`)
 4. Push to the branch (`git push origin my-new-feature`)
-5. Create new Pull Request
+5. Create new [Pull Request](https://help.github.com/articles/using-pull-requests)
 
 
 ## TODO
@@ -126,3 +217,5 @@ rake test
 * [Simon Rozet](mailto:simon@rozet.name)
 * [Vicent Martí](mailto:tanoku@gmail.com)
 * [Risk :danger: Olson](mailto:technoweenie@gmail.com)
+
+Project is a member of the [OSS Manifesto](http://ossmanifesto.org/).
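Note on the README's `RootRelativeFilter` example: it depends on `URI.join` resolving a root-relative `src` against `context[:base_url]`. The following stdlib-only check illustrates that behavior; the base URL and image paths are made up for the example.

```ruby
require "uri"

base = "http://somehost.com/blog/"

# A root-relative path replaces the whole path of the base URL.
puts URI.join(base, "/images/logo.png")
# prints http://somehost.com/images/logo.png

# For contrast, a path without a leading slash resolves against the
# base URL's directory, which is why the filter only rewrites "/..." paths.
puts URI.join(base, "images/logo.png")
# prints http://somehost.com/blog/images/logo.png
```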
data/html-pipeline.gemspec
CHANGED
@@ -1,9 +1,10 @@
 # -*- encoding: utf-8 -*-
-require File.expand_path(
+require File.expand_path("../lib/html/pipeline/version", __FILE__)
 
 Gem::Specification.new do |gem|
   gem.name = "html-pipeline"
   gem.version = HTML::Pipeline::VERSION
+  gem.license = "MIT"
   gem.authors = ["Ryan Tomayko", "Jerry Cheung"]
   gem.email = ["ryan@github.com", "jerry@github.com"]
   gem.description = %q{GitHub HTML processing filters and utilities}
@@ -14,12 +15,12 @@ Gem::Specification.new do |gem|
   gem.test_files = gem.files.grep(%r{^test})
   gem.require_paths = ["lib"]
 
-  gem.add_dependency
-  gem.add_dependency
-  gem.add_dependency
-  gem.add_dependency
-  gem.add_dependency
-  gem.add_dependency
-  gem.add_dependency
-  gem.add_dependency
+  gem.add_dependency "gemoji", "~> 1.0"
+  gem.add_dependency "nokogiri", "~> 1.4"
+  gem.add_dependency "github-markdown", "~> 0.5"
+  gem.add_dependency "sanitize", "~> 2.0"
+  gem.add_dependency "github-linguist", "~> 2.1"
+  gem.add_dependency "rinku", "~> 1.7"
+  gem.add_dependency "escape_utils", "~> 0.2"
+  gem.add_dependency "activesupport", ">= 2"
 end
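Note on the new constraints: most of them use RubyGems' pessimistic operator `~>`. As a quick illustration, `Gem::Requirement` can report which versions a constraint admits. The candidate version numbers below are arbitrary and chosen only for the example.

```ruby
require "rubygems"

# "~> 1.0" admits any 1.x release at or above 1.0, but not 2.0.
gemoji = Gem::Requirement.new("~> 1.0")
puts gemoji.satisfied_by?(Gem::Version.new("1.0"))   # true
puts gemoji.satisfied_by?(Gem::Version.new("1.9.3")) # true
puts gemoji.satisfied_by?(Gem::Version.new("2.0"))   # false

# ">= 2" (used for activesupport) has no upper bound.
activesupport = Gem::Requirement.new(">= 2")
puts activesupport.satisfied_by?(Gem::Version.new("3.2.9")) # true
puts activesupport.satisfied_by?(Gem::Version.new("1.4"))   # false
```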
data/lib/html/pipeline.rb
CHANGED
@@ -9,8 +9,8 @@ module HTML
   #
   # See HTML::Pipeline::Filter for information on building filters.
   #
-  #
-  # with one to many filters, and
+  # Construct a Pipeline for running multiple HTML filters. A pipeline is created once
+  # with one to many filters, and it then can be `call`ed many times over the course
   # of its lifetime with input.
   #
   # filters - Array of Filter objects. Each must respond to call(doc,
@@ -22,7 +22,7 @@ module HTML
   # nil. Default: empty Hash.
   # result_class - The default Class of the result object for individual
   # calls. Default: Hash. Protip: Pass in a Struct to get
-  # some
+  # some semblance of type safety.
   class Pipeline
     autoload :VERSION, 'html/pipeline/version'
     autoload :Pipeline, 'html/pipeline/pipeline'
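Note on the comment above: it describes a pipeline that is built once and called many times, and suggests passing a Struct as `result_class`. Here is a minimal, self-contained sketch of that idea. It is not the gem's implementation: `ToyPipeline`, `Upcase`, `Exclaim`, and `Result` are hypothetical names, and the filters are plain callables taking `(text, context, result)`.

```ruby
# Hypothetical stand-in for HTML::Pipeline: runs each filter in order,
# feeding the output of one filter into the next.
class ToyPipeline
  def initialize(filters, default_context = {}, result_class = Hash)
    @filters = filters
    @default_context = default_context
    @result_class = result_class
  end

  def call(text, context = {})
    context = @default_context.merge(context)
    result = @result_class.new
    result[:output] = @filters.inject(text) do |doc, filter|
      filter.call(doc, context, result)
    end
    result
  end
end

Upcase  = lambda { |doc, _context, _result| doc.upcase }
Exclaim = lambda { |doc, context, _result| doc + context.fetch(:suffix, "!") }

# A Struct only accepts the members it was declared with, so a misspelled
# result key raises NameError instead of silently creating a new entry.
Result = Struct.new(:output)

pipeline = ToyPipeline.new([Upcase, Exclaim], { :suffix => "!!" }, Result)
puts pipeline.call("hello").output                     # HELLO!!
puts pipeline.call("again", { :suffix => "?" }).output # AGAIN?
```

With the default `Hash` result class, `result[:outptu] = ...` would go unnoticed; with `Result`, the same typo fails immediately.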
data/lib/html/pipeline/camo_filter.rb
CHANGED
@@ -11,8 +11,8 @@ module HTML
     # in browser clients.
     #
     # Context options:
-    # :asset_proxy - Base URL for constructed asset proxy URLs.
-    # :asset_proxy_secret_key - The shared secret used to encode URLs.
+    # :asset_proxy (required) - Base URL for constructed asset proxy URLs.
+    # :asset_proxy_secret_key (required) - The shared secret used to encode URLs.
     #
     # This filter does not write additional information to the context.
     class CamoFilter < Filter
@@ -33,6 +33,12 @@ module HTML
         end
         doc
       end
+
+      # Implementation of validate hook.
+      # Errors should raise exceptions or use an existing validator.
+      def validate
+        needs :asset_proxy, :asset_proxy_secret_key
+      end
 
       # The camouflaged URL for a given image URL.
       def asset_proxy_url(url)
@@ -47,11 +53,11 @@ module HTML
 
       # Private: the hostname to use for generated asset proxied URLs.
       def asset_proxy_host
-        context[:asset_proxy]
+        context[:asset_proxy]
       end
 
       def asset_proxy_secret_key
-        context[:asset_proxy_secret_key]
+        context[:asset_proxy_secret_key]
       end
 
       # Private: helper to hexencode a string. Each byte ends up encoded into
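Note on how these pieces might fit together: the diff only shows that the filter needs a proxy base URL and a shared secret, and that it has a hex-encoding helper. One plausible arrangement is an HMAC over the image URL. The sketch below is an illustration under assumptions, not the filter's code: the SHA-1 digest, the URL layout, and the `proxy_url` helper name are not taken from this diff, and the host and secret are placeholders.

```ruby
require "openssl"

# Encode each byte of the string as two lowercase hex characters.
def hexencode(str)
  str.unpack("H*").first
end

# Assumed URL layout: <proxy base>/<hmac digest>/<hex-encoded image url>.
def proxy_url(host, secret_key, url)
  digest = OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new("sha1"), secret_key, url)
  "#{host}/#{digest}/#{hexencode(url)}"
end

puts hexencode("ab") # 6162
puts proxy_url("https://assets.example.com", "not-a-real-secret",
               "http://example.org/a.png")
```

Because the digest depends on the shared secret, a proxy holding the same key can reject URLs it did not issue, which would explain why both context keys are now required.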
data/lib/html/pipeline/emoji_filter.rb
CHANGED
@@ -5,7 +5,7 @@ module HTML
     # HTML filter that replaces :emoji: with images.
     #
     # Context:
-    # :asset_root - base url to link to emoji sprite
+    # :asset_root (required) - base url to link to emoji sprite
     class EmojiFilter < Filter
       # Build a regexp that matches all valid :emoji: names.
       EmojiPattern = /:(#{Emoji.names.map { |name| Regexp.escape(name) }.join('|')}):/
@@ -21,6 +21,12 @@ module HTML
         end
         doc
       end
+
+      # Implementation of validate hook.
+      # Errors should raise exceptions or use an existing validator.
+      def validate
+        needs :asset_root
+      end
 
       # Replace :emoji: with corresponding images.
       #
@@ -41,7 +47,7 @@ module HTML
       # Raises ArgumentError if context option has not been provided.
       # Returns the context's asset_root.
       def asset_root
-        context[:asset_root]
+        context[:asset_root]
       end
     end
   end
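Note on `EmojiPattern`: it is built by escaping every known emoji name and joining the names into one alternation. The sketch below reproduces that construction with a hypothetical three-name list in place of `Emoji.names`. The `emojify` helper and its `<img>` markup are illustrative only; just the `/emoji/shipit.png` path shape comes from the test suite.

```ruby
# Hypothetical sample; the gem takes its names from Emoji.names.
NAMES = %w(shipit +1 smile)

# Regexp.escape keeps names such as "+1" from being read as regex syntax.
PATTERN = /:(#{NAMES.map { |name| Regexp.escape(name) }.join('|')}):/

# Replace each known :name: with an image tag rooted at asset_root.
def emojify(text, asset_root)
  text.gsub(PATTERN) do
    name = $1
    %(<img src="#{asset_root}/emoji/#{name}.png" alt=":#{name}:">)
  end
end

puts emojify("<p>:shipit:</p>", "https://foo.com")
puts emojify(":+1: but not :unknown:", "https://foo.com")
```

Names missing from the list, such as `:unknown:`, are left as plain text because the alternation only matches known names.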
data/lib/html/pipeline/filter.rb
CHANGED
@@ -39,8 +39,9 @@ module HTML
         end
         @context = context || {}
         @result = result || {}
+        validate
       end
-
+
       # Public: Returns a simple Hash used to pass extra information into filters
       # and also to allow filters to make extracted information available to the
       # caller.
@@ -73,6 +74,10 @@ module HTML
       def call
         raise NotImplementedError
       end
+
+      # Make sure the context has everything we need. Noop: Subclasses can override.
+      def validate
+      end
 
       # The Repository object provided in the context hash, or nil when no
       # :repository was specified.
@@ -153,6 +158,21 @@ module HTML
           output.to_s
         end
       end
+
+      # Validator for required context. This will check that anything passed in
+      # contexts exists in @contexts
+      #
+      # If any errors are found an ArgumentError will be raised with a
+      # message listing all the missing contexts and the filters that
+      # require them.
+      def needs(*keys)
+        missing = keys.reject { |key| context.include? key }
+
+        if missing.any?
+          raise ArgumentError,
+            "Missing context keys for #{self.class.name}: #{missing.map(&:inspect).join ', '}"
+        end
+      end
     end
   end
 end
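Note on the new hook: taken together, `validate` and `needs` make a filter fail at construction time when a required context key is missing. Below is a self-contained sketch of the same pattern outside the gem. `BaseFilter` and `IconFilter` are hypothetical names, while the body of `needs` mirrors the diff above.

```ruby
class BaseFilter
  attr_reader :context

  def initialize(context = nil)
    @context = context || {}
    validate # runs before the filter does any work
  end

  # No-op by default; subclasses override it to declare requirements.
  def validate
  end

  # Raise ArgumentError naming every required key absent from the context.
  def needs(*keys)
    missing = keys.reject { |key| context.include?(key) }
    if missing.any?
      raise ArgumentError,
        "Missing context keys for #{self.class.name}: #{missing.map(&:inspect).join(', ')}"
    end
  end
end

class IconFilter < BaseFilter
  def validate
    needs :asset_root, :size
  end
end

begin
  IconFilter.new({ :size => 16 })
rescue ArgumentError => e
  puts e.message # Missing context keys for IconFilter: :asset_root
end
```

Because `needs` checks key presence with `include?`, a key that is present but set to `nil` still passes validation.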
data/lib/html/pipeline/image_max_width_filter.rb
CHANGED
@@ -9,11 +9,11 @@ module HTML
     class ImageMaxWidthFilter < Filter
       def call
         doc.search('img').each do |element|
-          # Skip if
+          # Skip if there's already a style attribute. Not sure how this
           # would happen but we can reconsider it in the future.
           next if element['style']
 
-          # Bail out if src doesn't look like a valid http url.
+          # Bail out if src doesn't look like a valid http url. trying to avoid weird
           # js injection via javascript: urls.
           next if element['src'].to_s.strip =~ /\Ajavascript/i
 
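Note on the `javascript:` guard: it is a case-insensitive prefix match on the stripped `src`. A standalone sketch of the same check follows; the helper name `script_url?` is hypothetical.

```ruby
# True when the value starts with "javascript", ignoring case and
# surrounding whitespace. nil is handled through to_s.
def script_url?(src)
  !!(src.to_s.strip =~ /\Ajavascript/i)
end

puts script_url?("  JavaScript:alert(1)")    # true
puts script_url?("http://example.com/a.png") # false
puts script_url?(nil)                        # false
```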
data/lib/html/pipeline/sanitization_filter.rb
CHANGED
@@ -33,7 +33,7 @@ module HTML
         :elements => %w(
           h1 h2 h3 h4 h5 h6 h7 h8 br b i strong em a pre code img tt
           div ins del sup sub p ol ul table blockquote dl dt dd
-          kbd q samp var hr ruby rt rp
+          kbd q samp var hr ruby rt rp li tr td th
         ),
         :attributes => {
           'a' => ['href'],
@@ -62,22 +62,20 @@ module HTML
           'img' => {'src' => ['http', 'https', :relative]}
         },
         :transformers => [
-          #
-          # top-level <li> elements are removed because they can break out of
+          # Top-level <li> elements are removed because they can break out of
           # containing markup.
           lambda { |env|
             name, node = env[:node_name], env[:node]
-            if name == LIST_ITEM && node.ancestors.any?{ |n| LISTS.include?(n.name) }
-
+            if name == LIST_ITEM && !node.ancestors.any?{ |n| LISTS.include?(n.name) }
+              node.replace(node.children)
             end
           },
 
-          # Whitelist only table child elements that are descended from a <table>.
           # Table child elements that are not contained by a <table> are removed.
           lambda { |env|
             name, node = env[:node_name], env[:node]
-            if TABLE_ITEMS.include?(name) && node.ancestors.any? { |n| n.name == TABLE }
-
+            if TABLE_ITEMS.include?(name) && !node.ancestors.any? { |n| n.name == TABLE }
+              node.replace(node.children)
             end
           }
         ]
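Note on the new transformers: each one replaces an orphaned element with its children whenever no suitable ancestor is found. The sketch below shows that unwrapping step on a toy tree so it runs without Nokogiri or Sanitize. `Node` and `unwrap_orphan_items` are hypothetical stand-ins, not the libraries' APIs.

```ruby
# Toy DOM node; a hypothetical stand-in for a Nokogiri node.
class Node
  attr_reader :name, :children
  attr_accessor :parent

  def initialize(name, children = [])
    @name = name
    @children = children
    children.each { |c| c.parent = self if c.is_a?(Node) }
  end

  # All enclosing nodes, nearest first.
  def ancestors
    parent ? [parent] + parent.ancestors : []
  end

  def to_s
    inner = children.map(&:to_s).join
    name == :fragment ? inner : "<#{name}>#{inner}</#{name}>"
  end
end

LISTS = %w(ul ol)

# Replace every <li> that has no <ul>/<ol> ancestor with its children.
def unwrap_orphan_items(node)
  node.children.replace(node.children.flat_map { |child|
    next [child] unless child.is_a?(Node)
    unwrap_orphan_items(child)
    orphan = child.name == "li" &&
             child.ancestors.none? { |a| LISTS.include?(a.name) }
    if orphan
      child.children.each { |c| c.parent = node if c.is_a?(Node) }
      child.children
    else
      [child]
    end
  })
  node
end

fragment = Node.new(:fragment, ["a\n", Node.new("li", ["b"]), "\nc"])
puts unwrap_orphan_items(fragment).to_s.inspect # "a\nb\nc"
```

The input and expected output match `test_sanitizes_li_elements_not_contained_in_ul_or_ol` in the test suite; an `<li>` nested inside a `<ul>` is left untouched.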
data/test/html/pipeline/camo_filter_test.rb
CHANGED
@@ -36,4 +36,12 @@ class HTML::Pipeline::CamoFilterTest < Test::Unit::TestCase
       CamoFilter.call(orig, @options).to_s
     end
   end
+
+  def test_required_context_validation
+    exception = assert_raise(ArgumentError) {
+      CamoFilter.call("", {})
+    }
+    assert_match /:asset_proxy[^_]/, exception.message
+    assert_match /:asset_proxy_secret_key/, exception.message
+  end
 end
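Note on the first assertion: it uses `/:asset_proxy[^_]/` rather than `/:asset_proxy/` because the shorter pattern would also match inside `:asset_proxy_secret_key`. The check below demonstrates the difference with two sample messages written in the format that `needs` produces.

```ruby
both     = "Missing context keys for HTML::Pipeline::CamoFilter: :asset_proxy, :asset_proxy_secret_key"
key_only = "Missing context keys for HTML::Pipeline::CamoFilter: :asset_proxy_secret_key"

# The loose pattern cannot tell the two keys apart.
puts both.match?(/:asset_proxy/)     # true
puts key_only.match?(/:asset_proxy/) # true

# Requiring a non-underscore character afterwards singles out :asset_proxy.
puts both.match?(/:asset_proxy[^_]/)     # true
puts key_only.match?(/:asset_proxy[^_]/) # false
```

One caveat: `[^_]` needs a character to follow the key, so the pattern relies on `:asset_proxy` not being the last thing in the message.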
data/test/html/pipeline/emoji_filter_test.rb
CHANGED
@@ -1,16 +1,18 @@
 require 'test_helper'
 
 class HTML::Pipeline::EmojiFilterTest < Test::Unit::TestCase
+  EmojiFilter = HTML::Pipeline::EmojiFilter
+
   def test_emojify
-    filter =
+    filter = EmojiFilter.new("<p>:shipit:</p>", {:asset_root => 'https://foo.com'})
     doc = filter.call
     assert_match "https://foo.com/emoji/shipit.png", doc.search('img').attr('src').value
   end
-
-  def
-
-
-
-
+
+  def test_required_context_validation
+    exception = assert_raise(ArgumentError) {
+      EmojiFilter.call("", {})
+    }
+    assert_match /:asset_root/, exception.message
   end
 end
data/test/html/pipeline/sanitization_filter_test.rb
CHANGED
@@ -32,7 +32,7 @@ class HTML::Pipeline::SanitizationFilterTest < Test::Unit::TestCase
   def test_sanitizes_li_elements_not_contained_in_ul_or_ol
     stuff = "a\n<li>b</li>\nc"
     html = SanitizationFilter.call(stuff).to_s
-    assert_equal "a\
+    assert_equal "a\nb\nc", html
   end
 
   def test_does_not_sanitize_li_elements_contained_in_ul_or_ol
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: html-pipeline
 version: !ruby/object:Gem::Version
-  version: 0.0.
+  version: 0.0.5
 prerelease:
 platform: ruby
 authors:
@@ -10,7 +10,7 @@ authors:
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2012-
+date: 2012-12-10 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: gemoji
@@ -19,7 +19,7 @@ dependencies:
     requirements:
     - - ~>
       - !ruby/object:Gem::Version
-        version: 1.
+        version: '1.0'
   type: :runtime
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
@@ -27,7 +27,7 @@ dependencies:
     requirements:
     - - ~>
       - !ruby/object:Gem::Version
-        version: 1.
+        version: '1.0'
 - !ruby/object:Gem::Dependency
   name: nokogiri
   requirement: !ruby/object:Gem::Requirement
@@ -150,6 +150,7 @@ extra_rdoc_files: []
 files:
 - .gitignore
 - .travis.yml
+- CHANGELOG.md
 - Gemfile
 - LICENSE
 - README.md
@@ -184,7 +185,8 @@ files:
 - test/html/pipeline/toc_filter_test.rb
 - test/test_helper.rb
 homepage: https://github.com/jch/html-pipeline
-licenses:
+licenses:
+- MIT
 post_install_message:
 rdoc_options: []
 require_paths: