html-pipeline 0.0.4 → 0.0.5

@@ -0,0 +1,12 @@
+ # CHANGELOG
+
+ ## 0.0.5 (unreleased)
+
+ * fix li xss vulnerability in sanitization filter: vmg #31
+ * gemspec cleanup: nbibler #23, jbarnette #24
+ * doc updates: jch #16, pborreli #17, wickedshimmy #18, benubois #19, blackerby #21
+ * loosen gemoji dependency: josh #15
+
+ ## 0.0.4
+
+ * initial public release
data/README.md CHANGED
@@ -53,7 +53,9 @@ pipeline = HTML::Pipeline.new [
  result = pipeline.call <<CODE
  This is *great*:

- some_code(:first)
+ ``` ruby
+ some_code(:first)
+ ```

  CODE
  result[:output].to_s
@@ -83,19 +85,108 @@ filter.call
  ## Filters

  * `MentionFilter` - replace `@user` mentions with links
- * `AutoLinkFilter` - auto_linking urls in HTML
- * `CamoFilter` - replace http image urls with [camo-fied](https://github.com/github/camo) https versions
+ * `AutolinkFilter` - auto_linking urls in HTML
+ * `CamoFilter` - replace http image urls with [camo-fied](https://github.com/atmos/camo) https versions
  * `EmailReplyFilter` - util filter for working with emails
  * `EmojiFilter` - everyone loves [emoji](http://www.emoji-cheat-sheet.com/)!
+ * `HttpsFilter` - HTML Filter for replacing http github urls with https versions.
  * `ImageMaxWidthFilter` - link to full size image for large images
  * `MarkdownFilter` - convert markdown to html
  * `PlainTextInputFilter` - html escape text and wrap the result in a div
- * `SanitizationFilter` - whitelist santize user markup
+ * `SanitizationFilter` - whitelist sanitize user markup
  * `SyntaxHighlightFilter` - code syntax highlighter with [linguist](https://github.com/github/linguist)
  * `TextileFilter` - convert textile to html
  * `TableOfContentsFilter` - anchor headings with name attributes

- ## Development Setup
+ ## Examples
+
+ We define different pipelines for different parts of our app. Here are a few
+ paraphrased snippets to get you started:
+
+ ```ruby
+ # The context hash is how you pass options between different filters.
+ # See individual filter source for explanation of options.
+ context = {
+ :asset_root => "http://your-domain.com/where/your/images/live/icons",
+ :base_url => "http://your-domain.com"
+ }
+
+ # Pipeline providing sanitization and image hijacking but no mention
+ # related features.
+ SimplePipeline = Pipeline.new [
+ SanitizationFilter,
+ TableOfContentsFilter, # add 'name' anchors to all headers
+ CamoFilter,
+ ImageMaxWidthFilter,
+ SyntaxHighlightFilter,
+ EmojiFilter,
+ AutolinkFilter
+ ], context, {}
+
+ # Pipeline used for user provided content on the web
+ MarkdownPipeline = Pipeline.new [
+ MarkdownFilter,
+ SanitizationFilter,
+ CamoFilter,
+ ImageMaxWidthFilter,
+ HttpsFilter,
+ MentionFilter,
+ EmojiFilter,
+ SyntaxHighlightFilter
+ ], context.merge(:gfm => true), {} # enable github formatted markdown
+
+
+ # Define a pipeline based on another pipeline's filters
+ NonGFMMarkdownPipeline = Pipeline.new(MarkdownPipeline.filters,
+ context.merge(:gfm => false), {})
+
+ # Pipelines aren't limited to the web. You can use them for email
+ # processing also.
+ HtmlEmailPipeline = Pipeline.new [
+ ImageMaxWidthFilter
+ ], {}, {}
+
+ # Just emoji.
+ EmojiPipeline = Pipeline.new [
+ HTMLInputFilter,
+ EmojiFilter
+ ], context, {}
+ ```
+
+ ## Extending
+ To write a custom filter, you need a class with a `call` method that inherits
+ from `HTML::Pipeline::Filter`.
+
+ For example this filter adds a base url to images that are root relative:
+
+ ```ruby
+ require 'uri'
+
+ class RootRelativeFilter < HTML::Pipeline::Filter
+
+ def call
+ doc.search("img").each do |img|
+ next if img['src'].nil?
+ src = img['src'].strip
+ if src.start_with? '/'
+ img["src"] = URI.join(context[:base_url], src).to_s
+ end
+ end
+ doc
+ end
+
+ end
+ ```
+
+ Now this filter can be used in a pipeline:
+
+ ```ruby
+ Pipeline.new [ RootRelativeFilter ], { :base_url => 'http://somehost.com' }
+ ```
+
+ ## Development
+
+ To see what has changed in recent versions, see the [CHANGELOG](https://github.com/jch/html-pipeline/blob/master/CHANGELOG.md).

  ```sh
  bundle
@@ -104,11 +195,11 @@ rake test

  ## Contributing

- 1. Fork it
+ 1. [Fork it](https://help.github.com/articles/fork-a-repo)
  2. Create your feature branch (`git checkout -b my-new-feature`)
  3. Commit your changes (`git commit -am 'Added some feature'`)
  4. Push to the branch (`git push origin my-new-feature`)
- 5. Create new Pull Request
+ 5. Create new [Pull Request](https://help.github.com/articles/using-pull-requests)


  ## TODO
@@ -126,3 +217,5 @@ rake test
  * [Simon Rozet](mailto:simon@rozet.name)
  * [Vicent Martí](mailto:tanoku@gmail.com)
  * [Risk :danger: Olson](mailto:technoweenie@gmail.com)
+
+ Project is a member of the [OSS Manifesto](http://ossmanifesto.org/).
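
Note on the `RootRelativeFilter` example added to the README in this diff: it relies on `URI.join` to resolve a root-relative `src` against `context[:base_url]`. The sketch below shows only that resolution step with Ruby's standard library. The host name and image paths are made-up values for illustration, not taken from the gem.

```ruby
require "uri"

# Hypothetical values; any absolute base URL behaves the same way.
base_url = "http://somehost.com/docs/"

# A root-relative path replaces the whole path of the base URL.
root_relative = URI.join(base_url, "/images/logo.png").to_s

# A plain relative path is resolved against the base URL's directory instead.
# The README filter skips these because they do not start with "/".
plain_relative = URI.join(base_url, "images/logo.png").to_s

puts root_relative
puts plain_relative
```

The filter only rewrites `src` values that start with `/`, so other relative paths and absolute URLs pass through unchanged.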
@@ -1,9 +1,10 @@
  # -*- encoding: utf-8 -*-
- require File.expand_path('../lib/html/pipeline/version', __FILE__)
+ require File.expand_path("../lib/html/pipeline/version", __FILE__)

  Gem::Specification.new do |gem|
  gem.name = "html-pipeline"
  gem.version = HTML::Pipeline::VERSION
+ gem.license = "MIT"
  gem.authors = ["Ryan Tomayko", "Jerry Cheung"]
  gem.email = ["ryan@github.com", "jerry@github.com"]
  gem.description = %q{GitHub HTML processing filters and utilities}
@@ -14,12 +15,12 @@ Gem::Specification.new do |gem|
  gem.test_files = gem.files.grep(%r{^test})
  gem.require_paths = ["lib"]

- gem.add_dependency 'gemoji', '~> 1.1.1'
- gem.add_dependency 'nokogiri', '~> 1.4'
- gem.add_dependency 'github-markdown', '~> 0.5'
- gem.add_dependency 'sanitize', '~> 2.0'
- gem.add_dependency 'github-linguist', '~> 2.1'
- gem.add_dependency 'rinku', '~> 1.7'
- gem.add_dependency 'escape_utils', '~> 0.2'
- gem.add_dependency 'activesupport', '>= 2'
+ gem.add_dependency "gemoji", "~> 1.0"
+ gem.add_dependency "nokogiri", "~> 1.4"
+ gem.add_dependency "github-markdown", "~> 0.5"
+ gem.add_dependency "sanitize", "~> 2.0"
+ gem.add_dependency "github-linguist", "~> 2.1"
+ gem.add_dependency "rinku", "~> 1.7"
+ gem.add_dependency "escape_utils", "~> 0.2"
+ gem.add_dependency "activesupport", ">= 2"
  end
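
Note on the gemoji change above: apart from quoting style, the only dependency that changes is gemoji, from `~> 1.1.1` to `~> 1.0`. The sketch below uses RubyGems' own `Gem::Requirement` to show what that loosening permits. The sample version numbers are arbitrary and are not claimed to be actual gemoji releases.

```ruby
require "rubygems"

old_req = Gem::Requirement.new("~> 1.1.1") # >= 1.1.1 and < 1.2
new_req = Gem::Requirement.new("~> 1.0")   # >= 1.0 and < 2.0

# Arbitrary sample versions, chosen to sit on either side of each bound.
%w[1.0.0 1.1.1 1.1.9 1.2.0 2.0.0].each do |v|
  version = Gem::Version.new(v)
  puts format("%-6s old: %-5s new: %s",
              v, old_req.satisfied_by?(version), new_req.satisfied_by?(version))
end
```

The pessimistic operator lets only the last listed segment increase, so dropping the third segment widens the accepted range from the 1.1.x series to every 1.x release.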
@@ -9,8 +9,8 @@ module HTML
  #
  # See HTML::Pipeline::Filter for information on building filters.
  #
- # Contruct a Pipeline for running multiple HTML filters. A pipeline is created once
- # with one to many filters, and is then can be `call`ed many times over the course
+ # Construct a Pipeline for running multiple HTML filters. A pipeline is created once
+ # with one to many filters, and it then can be `call`ed many times over the course
  # of its lifetime with input.
  #
  # filters - Array of Filter objects. Each must respond to call(doc,
@@ -22,7 +22,7 @@ module HTML
  # nil. Default: empty Hash.
  # result_class - The default Class of the result object for individual
  # calls. Default: Hash. Protip: Pass in a Struct to get
- # some semblence of type safety.
+ # some semblance of type safety.
  class Pipeline
  autoload :VERSION, 'html/pipeline/version'
  autoload :Pipeline, 'html/pipeline/pipeline'
@@ -11,8 +11,8 @@ module HTML
  # in browser clients.
  #
  # Context options:
- # :asset_proxy - Base URL for constructed asset proxy URLs.
- # :asset_proxy_secret_key - The shared secret used to encode URLs.
+ # :asset_proxy (required) - Base URL for constructed asset proxy URLs.
+ # :asset_proxy_secret_key (required) - The shared secret used to encode URLs.
  #
  # This filter does not write additional information to the context.
  class CamoFilter < Filter
@@ -33,6 +33,12 @@ module HTML
  end
  doc
  end
+
+ # Implementation of validate hook.
+ # Errors should raise exceptions or use an existing validator.
+ def validate
+ needs :asset_proxy, :asset_proxy_secret_key
+ end

  # The camouflaged URL for a given image URL.
  def asset_proxy_url(url)
@@ -47,11 +53,11 @@ module HTML

  # Private: the hostname to use for generated asset proxied URLs.
  def asset_proxy_host
- context[:asset_proxy] or raise "Missing context :asset_proxy"
+ context[:asset_proxy]
  end

  def asset_proxy_secret_key
- context[:asset_proxy_secret_key] or raise "Missing context :asset_proxy_secret_key"
+ context[:asset_proxy_secret_key]
  end

  # Private: helper to hexencode a string. Each byte ends up encoded into
@@ -5,7 +5,7 @@ module HTML
  # HTML filter that replaces :emoji: with images.
  #
  # Context:
- # :asset_root - base url to link to emoji sprite
+ # :asset_root (required) - base url to link to emoji sprite
  class EmojiFilter < Filter
  # Build a regexp that matches all valid :emoji: names.
  EmojiPattern = /:(#{Emoji.names.map { |name| Regexp.escape(name) }.join('|')}):/
@@ -21,6 +21,12 @@ module HTML
  end
  doc
  end
+
+ # Implementation of validate hook.
+ # Errors should raise exceptions or use an existing validator.
+ def validate
+ needs :asset_root
+ end

  # Replace :emoji: with corresponding images.
  #
@@ -41,7 +47,7 @@ module HTML
  # Raises ArgumentError if context option has not been provided.
  # Returns the context's asset_root.
  def asset_root
- context[:asset_root] or raise ArgumentError, "Missing context :asset_root"
+ context[:asset_root]
  end
  end
  end
@@ -39,8 +39,9 @@ module HTML
  end
  @context = context || {}
  @result = result || {}
+ validate
  end
-
+
  # Public: Returns a simple Hash used to pass extra information into filters
  # and also to allow filters to make extracted information available to the
  # caller.
@@ -73,6 +74,10 @@ module HTML
  def call
  raise NotImplementedError
  end
+
+ # Make sure the context has everything we need. Noop: Subclasses can override.
+ def validate
+ end

  # The Repository object provided in the context hash, or nil when no
  # :repository was specified.
@@ -153,6 +158,21 @@ module HTML
  output.to_s
  end
  end
+
+ # Validator for required context. This will check that anything passed in
+ # contexts exists in @contexts
+ #
+ # If any errors are found an ArgumentError will be raised with a
+ # message listing all the missing contexts and the filters that
+ # require them.
+ def needs(*keys)
+ missing = keys.reject { |key| context.include? key }
+
+ if missing.any?
+ raise ArgumentError,
+ "Missing context keys for #{self.class.name}: #{missing.map(&:inspect).join ', '}"
+ end
+ end
  end
  end
  end
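
Note on the validation flow above: the constructor now calls `validate`, the base class leaves it as a no-op, and subclasses declare required context keys through `needs`. The sketch below reproduces that flow without the gem. `SketchFilter` and `SketchCamoFilter` are hypothetical stand-ins, not the real `HTML::Pipeline::Filter` classes, and the constructor signature is simplified. Only the `needs` body follows the diff.

```ruby
# Hypothetical stand-in for the base filter; it models only the
# context validation added in this release.
class SketchFilter
  attr_reader :context

  def initialize(context = nil)
    @context = context || {}
    validate # runs at construction time, before any call
  end

  # No-op by default; subclasses override to declare requirements.
  def validate
  end

  # Raise one ArgumentError that lists every missing key.
  def needs(*keys)
    missing = keys.reject { |key| context.include?(key) }
    return if missing.empty?

    raise ArgumentError,
      "Missing context keys for #{self.class.name}: #{missing.map(&:inspect).join(', ')}"
  end
end

# Hypothetical subclass mirroring the keys CamoFilter declares in the diff.
class SketchCamoFilter < SketchFilter
  def validate
    needs :asset_proxy, :asset_proxy_secret_key
  end
end

begin
  SketchCamoFilter.new(:asset_proxy => "https://assets.example.com")
rescue ArgumentError => e
  puts e.message
end
```

One visible consequence of the change: a missing key now fails when the filter is built rather than when a private accessor is first reached during `call`.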
@@ -9,11 +9,11 @@ module HTML
  class ImageMaxWidthFilter < Filter
  def call
  doc.search('img').each do |element|
- # Skip if theres already a style attribute. Not sure how this
+ # Skip if there's already a style attribute. Not sure how this
  # would happen but we can reconsider it in the future.
  next if element['style']

- # Bail out if src doesn't look like a valid http url. tryna avoid weird
+ # Bail out if src doesn't look like a valid http url. trying to avoid weird
  # js injection via javascript: urls.
  next if element['src'].to_s.strip =~ /\Ajavascript/i

@@ -33,7 +33,7 @@ module HTML
  :elements => %w(
  h1 h2 h3 h4 h5 h6 h7 h8 br b i strong em a pre code img tt
  div ins del sup sub p ol ul table blockquote dl dt dd
- kbd q samp var hr ruby rt rp
+ kbd q samp var hr ruby rt rp li tr td th
  ),
  :attributes => {
  'a' => ['href'],
@@ -62,22 +62,20 @@ module HTML
  'img' => {'src' => ['http', 'https', :relative]}
  },
  :transformers => [
- # whitelist only <li> elements that are descended from a <ul> or <ol>.
- # top-level <li> elements are removed because they can break out of
+ # Top-level <li> elements are removed because they can break out of
  # containing markup.
  lambda { |env|
  name, node = env[:node_name], env[:node]
- if name == LIST_ITEM && node.ancestors.any?{ |n| LISTS.include?(n.name) }
- {:node_whitelist => [node]}
+ if name == LIST_ITEM && !node.ancestors.any?{ |n| LISTS.include?(n.name) }
+ node.replace(node.children)
  end
  },

- # Whitelist only table child elements that are descended from a <table>.
  # Table child elements that are not contained by a <table> are removed.
  lambda { |env|
  name, node = env[:node_name], env[:node]
- if TABLE_ITEMS.include?(name) && node.ancestors.any? { |n| n.name == TABLE }
- { :node_whitelist => [node] }
+ if TABLE_ITEMS.include?(name) && !node.ancestors.any? { |n| n.name == TABLE }
+ node.replace(node.children)
  end
  }
  ]
@@ -1,5 +1,5 @@
  module HTML
  class Pipeline
- VERSION = "0.0.4"
+ VERSION = "0.0.5"
  end
  end
@@ -36,4 +36,12 @@ class HTML::Pipeline::CamoFilterTest < Test::Unit::TestCase
  CamoFilter.call(orig, @options).to_s
  end
  end
+
+ def test_required_context_validation
+ exception = assert_raise(ArgumentError) {
+ CamoFilter.call("", {})
+ }
+ assert_match /:asset_proxy[^_]/, exception.message
+ assert_match /:asset_proxy_secret_key/, exception.message
+ end
  end
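
Note on the `[^_]` in the first assertion above: `:asset_proxy` is a prefix of `:asset_proxy_secret_key`, so a bare `/:asset_proxy/` would also match a message that lists only the secret key. The sketch below checks both patterns against two sample messages. The strings are written by hand in the format produced by `needs` and are not captured test output.

```ruby
# Hand-written sample messages in the "Missing context keys for ..." format.
both_missing   = "Missing context keys for HTML::Pipeline::CamoFilter: :asset_proxy, :asset_proxy_secret_key"
secret_missing = "Missing context keys for HTML::Pipeline::CamoFilter: :asset_proxy_secret_key"

loose  = /:asset_proxy/      # also matches the longer key name
strict = /:asset_proxy[^_]/  # requires a non-underscore character after the key

puts loose.match?(secret_missing)   # the loose pattern is fooled by the prefix
puts strict.match?(secret_missing)  # the strict pattern is not
puts strict.match?(both_missing)    # the key followed by "," still matches
```

One caveat of `[^_]` is that it needs a following character, so it would not match if `:asset_proxy` were the last thing in the message. That does not arise here because the key is listed first.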
@@ -1,16 +1,18 @@
  require 'test_helper'

  class HTML::Pipeline::EmojiFilterTest < Test::Unit::TestCase
+ EmojiFilter = HTML::Pipeline::EmojiFilter
+
  def test_emojify
- filter = HTML::Pipeline::EmojiFilter.new("<p>:shipit:</p>", {:asset_root => 'https://foo.com'})
+ filter = EmojiFilter.new("<p>:shipit:</p>", {:asset_root => 'https://foo.com'})
  doc = filter.call
  assert_match "https://foo.com/emoji/shipit.png", doc.search('img').attr('src').value
  end
-
- def test_missing_context
- filter = HTML::Pipeline::EmojiFilter.new("<p>:shipit:</p>", {})
- assert_raises ArgumentError do
- filter.call
- end
+
+ def test_required_context_validation
+ exception = assert_raise(ArgumentError) {
+ EmojiFilter.call("", {})
+ }
+ assert_match /:asset_root/, exception.message
  end
  end
@@ -32,7 +32,7 @@ class HTML::Pipeline::SanitizationFilterTest < Test::Unit::TestCase
  def test_sanitizes_li_elements_not_contained_in_ul_or_ol
  stuff = "a\n<li>b</li>\nc"
  html = SanitizationFilter.call(stuff).to_s
- assert_equal "a\n b \nc", html
+ assert_equal "a\nb\nc", html
  end

  def test_does_not_sanitize_li_elements_contained_in_ul_or_ol
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: html-pipeline
  version: !ruby/object:Gem::Version
- version: 0.0.4
+ version: 0.0.5
  prerelease:
  platform: ruby
  authors:
@@ -10,7 +10,7 @@ authors:
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2012-11-07 00:00:00.000000000 Z
+ date: 2012-12-10 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: gemoji
@@ -19,7 +19,7 @@ dependencies:
  requirements:
  - - ~>
  - !ruby/object:Gem::Version
- version: 1.1.1
+ version: '1.0'
  type: :runtime
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
@@ -27,7 +27,7 @@ dependencies:
  requirements:
  - - ~>
  - !ruby/object:Gem::Version
- version: 1.1.1
+ version: '1.0'
  - !ruby/object:Gem::Dependency
  name: nokogiri
  requirement: !ruby/object:Gem::Requirement
@@ -150,6 +150,7 @@ extra_rdoc_files: []
  files:
  - .gitignore
  - .travis.yml
+ - CHANGELOG.md
  - Gemfile
  - LICENSE
  - README.md
@@ -184,7 +185,8 @@ files:
  - test/html/pipeline/toc_filter_test.rb
  - test/test_helper.rb
  homepage: https://github.com/jch/html-pipeline
- licenses: []
+ licenses:
+ - MIT
  post_install_message:
  rdoc_options: []
  require_paths: