trenni-sanitize 0.1.0

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA1:
3
+ metadata.gz: 6fcc82f737fa654994e5e196347156bb38bfa74c
4
+ data.tar.gz: 7b8bafec2a1731ecc20c9f850e9620e9941a6f9d
5
+ SHA512:
6
+ metadata.gz: 106aab6ca77fed6efc2f7f77ab5f83d5606f20fafd0b80dd05d1aa2421b5626129d02a2513b8636eb8745d3b49bf2b6c4657a60156be90193bc20faebeaa5997
7
+ data.tar.gz: 48735a28e214041eba2e7f1087b2c553efe7d3816ece68ae9b00dfbc970abbd06fcd34e5ee4478d8a9133f0c851b40f7f66aaa3734935daaf91237e3aaf2029e
data/.gitignore ADDED
@@ -0,0 +1,19 @@
1
+ *.gem
2
+ *.rbc
3
+ .bundle
4
+ .config
5
+ .yardoc
6
+ Gemfile.lock
7
+ InstalledFiles
8
+ _yardoc
9
+ coverage
10
+ doc/
11
+ lib/bundler/man
12
+ pkg
13
+ rdoc
14
+ spec/reports
15
+ test/tmp
16
+ test/version_tmp
17
+ tmp
18
+
19
+ lib/trenni/trenni.bundle
data/.rspec ADDED
@@ -0,0 +1,5 @@
1
+ --color
2
+ --format documentation
3
+ --backtrace
4
+ --warnings
5
+ --require spec_helper
data/.simplecov ADDED
@@ -0,0 +1,9 @@
1
+
2
+ SimpleCov.start do
3
+ add_filter "/spec/"
4
+ end
5
+
6
+ if ENV['TRAVIS']
7
+ require 'coveralls'
8
+ Coveralls.wear!
9
+ end
data/.travis.yml ADDED
@@ -0,0 +1,17 @@
1
+ language: ruby
2
+ sudo: false
3
+ rvm:
4
+ - 2.1
5
+ - 2.2
6
+ - 2.3
7
+ - 2.4
8
+ - ruby-head
9
+ - jruby-head
10
+ - rbx-2
11
+ env:
12
+ - COVERAGE=true
13
+ matrix:
14
+ allow_failures:
15
+ - rvm: "rbx-2"
16
+ - rvm: "ruby-head"
17
+ - rvm: "jruby-head"
data/Gemfile ADDED
@@ -0,0 +1,16 @@
1
+ source 'https://rubygems.org'
2
+
3
+ # Specify your gem's dependencies in trenni-sanitize.gemspec
4
+ gemspec
5
+
6
+ group :development do
7
+ gem 'pry'
8
+ end
9
+
10
+ group :test do
11
+ gem 'ruby-prof', platforms: [:mri]
12
+ gem "benchmark-ips"
13
+
14
+ # For comparisons:
15
+ gem "sanitize"
16
+ end
data/README.md ADDED
@@ -0,0 +1,139 @@
1
+ # Trenni::Sanitize
2
+
3
+ Sanitize markup by adding, changing or removing tags.
4
+
5
+ [![Build Status](https://secure.travis-ci.org/ioquatix/trenni-sanitize.svg)](http://travis-ci.org/ioquatix/trenni-sanitize)
6
+ [![Code Climate](https://codeclimate.com/github/ioquatix/trenni-sanitize.svg)](https://codeclimate.com/github/ioquatix/trenni-sanitize)
7
+ [![Coverage Status](https://coveralls.io/repos/ioquatix/trenni-sanitize/badge.svg)](https://coveralls.io/r/ioquatix/trenni-sanitize)
8
+
9
+ ## Motivation
10
+
11
+ I use the [sanitize] gem and generally it's great. However, its performance can be an issue, and it doesn't preserve tag namespaces when parsing fragments due to how Nokogiri works internally.
12
+
13
+ [sanitize]: https://github.com/rgrove/sanitize/
14
+
15
+ ## Is it fast?
16
+
17
+ In my informal testing, this gem is about 50x faster than the [sanitize] gem when generating plain text.
18
+
19
+ Warming up --------------------------------------
20
+ Sanitize 96.000 i/100ms
21
+ Trenni::Sanitize 4.447k i/100ms
22
+ Calculating -------------------------------------
23
+ Sanitize 958.020 (± 4.5%) i/s - 4.800k in 5.020564s
24
+ Trenni::Sanitize 44.718k (± 4.2%) i/s - 226.797k in 5.080756s
25
+
26
+ Comparison:
27
+ Trenni::Sanitize: 44718.1 i/s
28
+ Sanitize: 958.0 i/s - 46.68x slower
29
+
30
+ ## Installation
31
+
32
+ Add this line to your application's Gemfile:
33
+
34
+ gem 'trenni-sanitize'
35
+
36
+ And then execute:
37
+
38
+ $ bundle
39
+
40
+ Or install it yourself as:
41
+
42
+ $ gem install trenni-sanitize
43
+
44
+ ## Usage
45
+
46
+ `Trenni::Sanitize::Filter` is a stream-based processor. That means it parses the incoming markup and decides what to keep and what to discard while parsing, without building a document tree first.
47
+
48
+ ### Extracting Text
49
+
50
+ You can extract text using something similar to the following parser delegate:
51
+
52
+ ```ruby
53
+ class Text < Trenni::Sanitize::Filter
54
+ def filter(node)
55
+ # Skip the tag itself, but keep its text content
56
+ node.skip!(TAG)
57
+ end
58
+
59
+ def doctype(string)
60
+ end
61
+
62
+ def instruction(string)
63
+ end
64
+
65
+ def text(string)
66
+ # Output all text
67
+ @output << string
68
+ end
69
+ end
70
+
71
+ text = Text.parse("<p>Hello World</p>").output
72
+ # => "Hello World"
73
+ ```
74
+
75
+ ### Extracting Safe Markup
76
+
77
+ Here is a simple filter that only allows a limited set of tags:
78
+
79
+ ```ruby
80
+ class Fragment < Trenni::Sanitize::Filter
81
+ STANDARD_ATTRIBUTES = ['class'].freeze
82
+
83
+ ALLOWED_TAGS = {
84
+ 'em' => [],
85
+ 'strong' => [],
86
+ 'p' => [],
87
+ 'img' => [] + ['src', 'alt', 'width', 'height'],
88
+ 'a' => ['href', 'target']
89
+ }.freeze
90
+
91
+ def filter(node)
92
+ if attributes = ALLOWED_TAGS[node.name]
93
+ node.tag.attributes.slice!(*attributes)
94
+ else
95
+ node.skip!
96
+ end
97
+ end
98
+
99
+ def doctype(string)
100
+ end
101
+
102
+ def instruction(string)
103
+ end
104
+ end
105
+ ```
106
+
107
+ As you can see, while [sanitize] is driven by configuration, `Trenni::Sanitize::Filter` is driven by code.
108
+
109
+ ## Contributing
110
+
111
+ 1. Fork it
112
+ 2. Create your feature branch (`git checkout -b my-new-feature`)
113
+ 3. Commit your changes (`git commit -am 'Add some feature'`)
114
+ 4. Push to the branch (`git push origin my-new-feature`)
115
+ 5. Create new Pull Request
116
+
117
+ ## License
118
+
119
+ Released under the MIT license.
120
+
121
+ Copyright, 2018, by [Samuel G. D. Williams](http://www.codeotaku.com/samuel-williams).
122
+
123
+ Permission is hereby granted, free of charge, to any person obtaining a copy
124
+ of this software and associated documentation files (the "Software"), to deal
125
+ in the Software without restriction, including without limitation the rights
126
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
127
+ copies of the Software, and to permit persons to whom the Software is
128
+ furnished to do so, subject to the following conditions:
129
+
130
+ The above copyright notice and this permission notice shall be included in
131
+ all copies or substantial portions of the Software.
132
+
133
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
134
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
135
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
136
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
137
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
138
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
139
+ THE SOFTWARE.
data/Rakefile ADDED
@@ -0,0 +1,19 @@
1
+ require "bundler/gem_tasks"
2
+ require "rspec/core/rake_task"
3
+
4
+ # Load all rake tasks:
5
+ import(*Dir.glob('tasks/**/*.rake'))
6
+
7
+ RSpec::Core::RakeTask.new(:test)
8
+
9
+ task :environment do
10
+ $LOAD_PATH.unshift File.expand_path('lib', __dir__)
11
+ end
12
+
13
+ task :console => :environment do
14
+ require 'pry'
15
+
16
+ Pry.start
17
+ end
18
+
19
+ task :default => :test
data/lib/trenni/sanitize.rb ADDED
@@ -0,0 +1,25 @@
1
+ # Copyright, 2018, by Samuel G. D. Williams. <http://www.codeotaku.com>
2
+ #
3
+ # Permission is hereby granted, free of charge, to any person obtaining a copy
4
+ # of this software and associated documentation files (the "Software"), to deal
5
+ # in the Software without restriction, including without limitation the rights
6
+ # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
7
+ # copies of the Software, and to permit persons to whom the Software is
8
+ # furnished to do so, subject to the following conditions:
9
+ #
10
+ # The above copyright notice and this permission notice shall be included in
11
+ # all copies or substantial portions of the Software.
12
+ #
13
+ # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
14
+ # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
15
+ # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
16
+ # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
17
+ # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
18
+ # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
19
+ # THE SOFTWARE.
20
+
21
+ require_relative 'sanitize/extensions'
22
+
23
+ require_relative 'sanitize/text'
24
+ require_relative 'sanitize/fragment'
25
+
data/lib/trenni/sanitize/extensions.rb ADDED
@@ -0,0 +1,14 @@
1
+
2
+ class Hash
3
+ unless method_defined?(:slice)
4
+ def slice(*keys)
5
+ self.select{|key, value| keys.include? key}
6
+ end
7
+ end
8
+
9
+ unless method_defined?(:slice!)
10
+ def slice!(*keys)
11
+ self.select!{|key, value| keys.include? key}
12
+ end
13
+ end
14
+ end
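The guards in this file are meant to define `slice` and `slice!` only when `Hash` lacks them. When writing such a guard, it may help to recall that `defined?(name)` inside a class body is evaluated against the class object rather than its instances. The sketch below is an illustration only and not part of the gem; it uses `fetch` as a stand-in method, an assumption chosen because `Hash#fetch` exists on every supported Ruby.

```ruby
# Inside `class Hash ... end` (or class_eval), self is the Hash class itself,
# so defined?(fetch) asks whether the class object can call `fetch`.
class_level_check = Hash.class_eval { defined?(fetch) }

# method_defined? asks whether instances of Hash have the method.
instance_level_check = Hash.method_defined?(:fetch)

puts class_level_check.inspect    # nil
puts instance_level_check.inspect # true
```

A guard based on `method_defined?` therefore skips the definition when the running Ruby already provides the instance method.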
data/lib/trenni/sanitize/filter.rb ADDED
@@ -0,0 +1,159 @@
1
+ # Copyright, 2018, by Samuel G. D. Williams. <http://www.codeotaku.com>
2
+ #
3
+ # Permission is hereby granted, free of charge, to any person obtaining a copy
4
+ # of this software and associated documentation files (the "Software"), to deal
5
+ # in the Software without restriction, including without limitation the rights
6
+ # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
7
+ # copies of the Software, and to permit persons to whom the Software is
8
+ # furnished to do so, subject to the following conditions:
9
+ #
10
+ # The above copyright notice and this permission notice shall be included in
11
+ # all copies or substantial portions of the Software.
12
+ #
13
+ # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
14
+ # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
15
+ # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
16
+ # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
17
+ # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
18
+ # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
19
+ # THE SOFTWARE.
20
+
21
+ require 'trenni/parsers'
22
+ require 'trenni/builder'
23
+ require 'trenni/entities'
24
+
25
+ module Trenni
26
+ module Sanitize
27
+ # Provides a high level interface for parsing markup.
28
+ class Filter
29
+ TAG = 1
30
+
31
+ DOCTYPE = 2
32
+ COMMENT = 4
33
+ INSTRUCTION = 8
34
+ CDATA = 16
35
+ TEXT = 32
36
+
37
+ CONTENT = DOCTYPE | COMMENT | INSTRUCTION | CDATA | TEXT
38
+ ALL = TAG | CONTENT
39
+
40
+ def self.parse(input, output = nil, entities = Trenni::Entities::HTML5)
41
+ # This allows us to handle passing in a string:
42
+ input = Trenni::Buffer(input)
43
+
44
+ output ||= MarkupString.new.force_encoding(input.encoding)
45
+
46
+ delegate = self.new(output, entities)
47
+
48
+ delegate.parse!(input)
49
+
50
+ return delegate
51
+ end
52
+
53
+ Node = Struct.new(:name, :tag, :skip) do
54
+ def skip!(mode = ALL)
55
+ self.skip |= mode
56
+ end
57
+
58
+ def skip?(mode = ALL)
59
+ (self.skip & mode) == mode
60
+ end
61
+
62
+ def [] key
63
+ self.tag.attributes[key]
64
+ end
65
+ end
66
+
67
+ def initialize(output, entities)
68
+ @output = output
69
+
70
+ @entities = entities
71
+
72
+ @current = nil
73
+ @stack = []
74
+
75
+ @current = @top = Node.new(nil, nil, 0)
76
+
77
+ @skip = nil
78
+ end
79
+
80
+ attr :output
81
+
82
+ # The current node being parsed.
83
+ attr :current
84
+
85
+ def top
86
+ @stack.last || @top
87
+ end
88
+
89
+ def parse!(input)
90
+ Trenni::Parsers.parse_markup(input, self, @entities)
91
+
92
+ until @stack.empty?
93
+ close_tag(@stack.last.name)
94
+ end
95
+
96
+ return self
97
+ end
98
+
99
+ def open_tag_begin(name, offset)
100
+ tag = Tag.new(name, false, {})
101
+
102
+ @current = Node.new(name, tag, current.skip)
103
+ end
104
+
105
+ def attribute(key, value)
106
+ @current.tag.attributes[key] = value
107
+ end
108
+
109
+ def open_tag_end(self_closing)
110
+ if self_closing
111
+ @current.tag.closed = true
112
+ else
113
+ @stack << @current
114
+ end
115
+
116
+ filter(@current)
117
+
118
+ @current.tag.write_opening_tag(@output) unless @current.skip? TAG
119
+
120
+ # If the tag was self-closing, it's no longer current at this point, we are back in the context of the parent tag.
121
+ @current = self.top if self_closing
122
+ end
123
+
124
+ def close_tag(name, offset = nil)
125
+ while node = @stack.pop
126
+ node.tag.write_closing_tag(@output) unless node.skip? TAG
127
+
128
+ break if node.name == name
129
+ end
130
+
131
+ @current = self.top
132
+ end
133
+
134
+ def filter(tag)
135
+ return tag
136
+ end
137
+
138
+ def doctype(string)
139
+ @output << string unless current.skip? DOCTYPE
140
+ end
141
+
142
+ def comment(string)
143
+ @output << string unless current.skip? COMMENT
144
+ end
145
+
146
+ def instruction(string)
147
+ @output << string unless current.skip? INSTRUCTION
148
+ end
149
+
150
+ def cdata(string)
151
+ @output << string unless current.skip? CDATA
152
+ end
153
+
154
+ def text(string)
155
+ Markup.append(@output, string) unless current.skip? TEXT
156
+ end
157
+ end
158
+ end
159
+ end
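The skip flags in `Filter` are a bitmask: each kind of output has its own bit, and a node inherits the mask of its parent when it is opened. The following standalone sketch copies the constants and the `skip!`/`skip?` logic from the file above into a hypothetical `SkipFlags` module (the module name and the reduced two-field `Node` are assumptions made for illustration) to show how the checks behave.

```ruby
module SkipFlags
  TAG = 1
  DOCTYPE = 2
  COMMENT = 4
  INSTRUCTION = 8
  CDATA = 16
  TEXT = 32

  CONTENT = DOCTYPE | COMMENT | INSTRUCTION | CDATA | TEXT
  ALL = TAG | CONTENT

  # Reduced version of Filter::Node: only the name and the skip mask.
  Node = Struct.new(:name, :skip) do
    def skip!(mode = ALL)
      self.skip |= mode
    end

    # True only when every bit in `mode` is set on this node.
    def skip?(mode = ALL)
      (skip & mode) == mode
    end
  end
end

# Skipping only the tag leaves text output enabled.
paragraph = SkipFlags::Node.new('p', 0)
paragraph.skip!(SkipFlags::TAG)
tag_skipped = paragraph.skip?(SkipFlags::TAG)
text_skipped = paragraph.skip?(SkipFlags::TEXT)

# A child created with its parent's mask inherits a full skip.
script = SkipFlags::Node.new('script', 0)
script.skip!
child = SkipFlags::Node.new('b', script.skip)
child_text_skipped = child.skip?(SkipFlags::TEXT)
```

Because `open_tag_begin` passes `current.skip` to each new node, everything nested inside a fully skipped element stays skipped without any extra bookkeeping.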
data/lib/trenni/sanitize/fragment.rb ADDED
@@ -0,0 +1,65 @@
1
+ # Copyright, 2018, by Samuel G. D. Williams. <http://www.codeotaku.com>
2
+ #
3
+ # Permission is hereby granted, free of charge, to any person obtaining a copy
4
+ # of this software and associated documentation files (the "Software"), to deal
5
+ # in the Software without restriction, including without limitation the rights
6
+ # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
7
+ # copies of the Software, and to permit persons to whom the Software is
8
+ # furnished to do so, subject to the following conditions:
9
+ #
10
+ # The above copyright notice and this permission notice shall be included in
11
+ # all copies or substantial portions of the Software.
12
+ #
13
+ # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
14
+ # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
15
+ # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
16
+ # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
17
+ # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
18
+ # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
19
+ # THE SOFTWARE.
20
+
21
+ require_relative 'filter'
22
+
23
+ module Trenni
24
+ module Sanitize
25
+ class Fragment < Filter
26
+ STANDARD_ATTRIBUTES = ['class', 'style'].freeze
27
+
28
+ ALLOWED_TAGS = {
29
+ 'div' => STANDARD_ATTRIBUTES,
30
+ 'span' => STANDARD_ATTRIBUTES,
31
+ 'br' => STANDARD_ATTRIBUTES,
32
+ 'b' => STANDARD_ATTRIBUTES,
33
+ 'i' => STANDARD_ATTRIBUTES,
34
+ 'em' => STANDARD_ATTRIBUTES,
35
+ 'strong' => STANDARD_ATTRIBUTES,
36
+ 'ul' => STANDARD_ATTRIBUTES,
37
+ 'strike' => STANDARD_ATTRIBUTES,
38
+ 'h1' => STANDARD_ATTRIBUTES,
39
+ 'h2' => STANDARD_ATTRIBUTES,
40
+ 'h3' => STANDARD_ATTRIBUTES,
41
+ 'h4' => STANDARD_ATTRIBUTES,
42
+ 'h5' => STANDARD_ATTRIBUTES,
43
+ 'h6' => STANDARD_ATTRIBUTES,
44
+ 'p' => STANDARD_ATTRIBUTES,
45
+ 'img' => STANDARD_ATTRIBUTES + ['src', 'alt', 'width', 'height'],
46
+ 'image' => STANDARD_ATTRIBUTES,
47
+ 'a' => STANDARD_ATTRIBUTES + ['href', 'target']
48
+ }.freeze
49
+
50
+ def filter(node)
51
+ if attributes = ALLOWED_TAGS[node.name]
52
+ node.tag.attributes.slice!(*attributes)
53
+ else
54
+ node.skip!
55
+ end
56
+ end
57
+
58
+ def doctype(string)
59
+ end
60
+
61
+ def instruction(string)
62
+ end
63
+ end
64
+ end
65
+ end
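`Hash#slice!` as defined in `extensions.rb` takes its keys as a splat, so an allow-list stored as an array has to be splatted at the call site. The sketch below reproduces the same `select!` logic in a plain helper (the name `keep_keys!` is hypothetical, used here to avoid monkey patching) and compares the two call styles.

```ruby
# Same shape as the extension's Hash#slice!, written as a standalone helper.
def keep_keys!(hash, *keys)
  hash.select! { |key, _value| keys.include?(key) }
  hash # select! returns nil when nothing was removed, so return the hash
end

allowed = ['class', 'style'].freeze
attributes = { 'class' => 'note', 'onclick' => 'malicious()' }

# Splatted: keys is ['class', 'style'], so 'class' survives.
splatted = keep_keys!(attributes.dup, *allowed)

# Not splatted: keys is [['class', 'style']], no string key matches,
# and every attribute is removed.
unsplatted = keep_keys!(attributes.dup, allowed)
```

Without the splat the unsafe `onclick` attribute is still dropped, but so are the attributes the allow-list was meant to keep.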
data/lib/trenni/sanitize/text.rb ADDED
@@ -0,0 +1,47 @@
1
+ # Copyright, 2018, by Samuel G. D. Williams. <http://www.codeotaku.com>
2
+ #
3
+ # Permission is hereby granted, free of charge, to any person obtaining a copy
4
+ # of this software and associated documentation files (the "Software"), to deal
5
+ # in the Software without restriction, including without limitation the rights
6
+ # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
7
+ # copies of the Software, and to permit persons to whom the Software is
8
+ # furnished to do so, subject to the following conditions:
9
+ #
10
+ # The above copyright notice and this permission notice shall be included in
11
+ # all copies or substantial portions of the Software.
12
+ #
13
+ # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
14
+ # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
15
+ # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
16
+ # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
17
+ # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
18
+ # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
19
+ # THE SOFTWARE.
20
+
21
+ require_relative 'filter'
22
+
23
+ module Trenni
24
+ module Sanitize
25
+ class Text < Filter
26
+ def filter(node)
27
+ if node.name == 'script'
28
+ node.skip!(ALL) # Skip everything including content.
29
+ else
30
+ node.skip!(TAG) # Only skip the tag output, but not the content.
31
+ end
32
+ end
33
+
34
+ def doctype(string)
35
+ end
36
+
37
+ def comment(string)
38
+ end
39
+
40
+ def instruction(string)
41
+ end
42
+
43
+ def cdata(string)
44
+ end
45
+ end
46
+ end
47
+ end
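The rule applied by `Text#filter` (drop every tag, and additionally drop the content of `script`) can be sketched in plain Ruby. This is a simplified illustration, not the gem's implementation: the event list is a hypothetical stand-in for the callbacks the Trenni parser would invoke, and `extract_text` is an invented name.

```ruby
# events: array of [type, value] pairs, where type is :open, :close or :text.
def extract_text(events)
  output = String.new
  stack = [] # entries are [tag_name, skip_content]

  events.each do |type, value|
    case type
    when :open
      # Children inherit the parent's decision; script content is always skipped.
      inherited = stack.last ? stack.last[1] : false
      stack << [value, inherited || value == 'script']
    when :close
      # Pop until the matching open tag is found (or the stack runs out).
      while (entry = stack.pop)
        break if entry[0] == value
      end
    when :text
      skipping = stack.last ? stack.last[1] : false
      output << value unless skipping
    end
  end

  output
end

events = [
  [:open, 'p'], [:text, 'Hello World'], [:close, 'p'],
  [:open, 'script'], [:text, 'doot()'], [:close, 'script']
]

result = extract_text(events)
puts result # Hello World
```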
data/lib/trenni/sanitize/version.rb ADDED
@@ -0,0 +1,25 @@
1
+ # Copyright, 2018, by Samuel G. D. Williams. <http://www.codeotaku.com>
2
+ #
3
+ # Permission is hereby granted, free of charge, to any person obtaining a copy
4
+ # of this software and associated documentation files (the "Software"), to deal
5
+ # in the Software without restriction, including without limitation the rights
6
+ # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
7
+ # copies of the Software, and to permit persons to whom the Software is
8
+ # furnished to do so, subject to the following conditions:
9
+ #
10
+ # The above copyright notice and this permission notice shall be included in
11
+ # all copies or substantial portions of the Software.
12
+ #
13
+ # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
14
+ # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
15
+ # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
16
+ # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
17
+ # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
18
+ # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
19
+ # THE SOFTWARE.
20
+
21
+ module Trenni
22
+ module Sanitize
23
+ VERSION = "0.1.0"
24
+ end
25
+ end
data/spec/spec_helper.rb ADDED
@@ -0,0 +1,53 @@
1
+
2
+ if ENV['COVERAGE']
3
+ begin
4
+ require 'simplecov'
5
+
6
+ SimpleCov.start do
7
+ add_filter "/spec/"
8
+ end
9
+
10
+ if ENV['TRAVIS']
11
+ require 'coveralls'
12
+ Coveralls.wear!
13
+ end
14
+ rescue LoadError
15
+ warn "Could not load simplecov: #{$!}"
16
+ end
17
+ end
18
+
19
+ require "bundler/setup"
20
+ require "trenni/sanitize"
21
+
22
+ begin
23
+ require 'ruby-prof'
24
+
25
+ RSpec.shared_context "profile" do
26
+ before(:all) do
27
+ RubyProf.start
28
+ end
29
+
30
+ after(:all) do
31
+ result = RubyProf.stop
32
+
33
+ # Print a flat profile to text
34
+ printer = RubyProf::FlatPrinter.new(result)
35
+ printer.print(STDOUT)
36
+ end
37
+ end
38
+ rescue LoadError
39
+ RSpec.shared_context "profile" do
40
+ before(:all) do
41
+ puts "Profiling not supported on this platform."
42
+ end
43
+ end
44
+ end
45
+
46
+ RSpec.configure do |config|
47
+ # Enable flags like --only-failures and --next-failure
48
+ config.example_status_persistence_file_path = ".rspec_status"
49
+
50
+ config.expect_with :rspec do |c|
51
+ c.syntax = :expect
52
+ end
53
+ end
data/spec/trenni/sanitize/benchmark_spec.rb ADDED
@@ -0,0 +1,36 @@
1
+
2
+ require 'sanitize'
3
+ require 'benchmark/ips'
4
+
5
+ require 'trenni/sanitize/text'
6
+
7
+ RSpec.describe Trenni::Sanitize do
8
+ let(:buffer) {Trenni::Buffer.load_file(File.join(__dir__, "sample.html"))}
9
+
10
+ it "should be faster than alternatives" do
11
+ config = Sanitize::Config.freeze_config(
12
+ :elements => %w[b i em strong ul li strike h1 h2 h3 h4 h5 h6 p img image a],
13
+ :attributes => {
14
+ 'img' => %w[src alt width],
15
+ 'a' => %w[href]
16
+ },
17
+ )
18
+
19
+ text = buffer.read
20
+
21
+ puts Sanitize.fragment(text).inspect
22
+ puts Trenni::Sanitize::Text.parse(buffer).output.inspect
23
+
24
+ Benchmark.ips do |x|
25
+ x.report("Sanitize") do
26
+ Sanitize.fragment text
27
+ end
28
+
29
+ x.report("Trenni::Sanitize") do
30
+ Trenni::Sanitize::Text.parse(buffer)
31
+ end
32
+
33
+ x.compare!
34
+ end
35
+ end
36
+ end
data/spec/trenni/sanitize/fragment_spec.rb ADDED
@@ -0,0 +1,60 @@
1
+ # Copyright, 2018, by Samuel G. D. Williams. <http://www.codeotaku.com>
2
+ #
3
+ # Permission is hereby granted, free of charge, to any person obtaining a copy
4
+ # of this software and associated documentation files (the "Software"), to deal
5
+ # in the Software without restriction, including without limitation the rights
6
+ # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
7
+ # copies of the Software, and to permit persons to whom the Software is
8
+ # furnished to do so, subject to the following conditions:
9
+ #
10
+ # The above copyright notice and this permission notice shall be included in
11
+ # all copies or substantial portions of the Software.
12
+ #
13
+ # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
14
+ # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
15
+ # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
16
+ # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
17
+ # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
18
+ # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
19
+ # THE SOFTWARE.
20
+
21
+ require 'trenni/sanitize/fragment'
22
+
23
+ RSpec.describe Trenni::Sanitize::Fragment do
24
+ it "should filter out script tags" do
25
+ fragment = described_class.parse("<p onclick='malicious()'>Hello World</p><script>doot()</script>")
26
+
27
+ expect(fragment.output).to be == "<p>Hello World</p>"
28
+ end
29
+
30
+ it "should filter out nested script tags" do
31
+ fragment = described_class.parse("<div><p>Hello World</p><script>doot()</script></div>")
32
+
33
+ expect(fragment.output).to be == "<div><p>Hello World</p></div>"
34
+ end
35
+
36
+ it "should filter out tags" do
37
+ fragment = described_class.parse("<p onclick='malicious()'>Hello World</p><script>script</script>")
38
+
39
+ expect(fragment.output).to be == "<p>Hello World</p>"
40
+ end
41
+
42
+ it "should ignore unbalanced closing tags" do
43
+ fragment = described_class.parse("<p>Hello World</a></p>")
44
+
45
+ expect(fragment.output).to be == "<p>Hello World</p>"
46
+ end
47
+
48
+ it "should include trailing text" do
49
+ fragment = described_class.parse("Hello<script/>World")
50
+
51
+ expect(fragment.output).to be == "HelloWorld"
52
+ end
53
+
54
+ it "should escape text" do
55
+ fragment = described_class.parse("x&amp;y")
56
+
57
+ expect(fragment.output).to be == "x&amp;y"
58
+ end
59
+ end
60
+
data/spec/trenni/sanitize/sample.html ADDED
@@ -0,0 +1,12 @@
1
+ <hr>
2
+ <a href="http://somegreatsite.com">Link Name</a>
3
+ is a link to another nifty site
4
+ <h1>This is a Header</h1>
5
+ <h1>This is a Medium Header</h2>
6
+ Send me mail at <a href="mailto:support@yourcompany.com">
7
+ support@yourcompany.com</a>.
8
+ <hr>
9
+ <p>This is a new paragraph!</p>
10
+ <p><b>This is a new paragraph!</b></p>
11
+ <br/><b><i>This is a new sentence without a paragraph break, in bold italics.</i></b>
12
+ <hr>
data/spec/trenni/sanitize/text_spec.rb ADDED
File without changes
data/trenni-sanitize.gemspec ADDED
@@ -0,0 +1,25 @@
1
+
2
+ require_relative 'lib/trenni/sanitize/version'
3
+
4
+ Gem::Specification.new do |spec|
5
+ spec.name = "trenni-sanitize"
6
+ spec.platform = Gem::Platform::RUBY
7
+ spec.version = Trenni::Sanitize::VERSION
8
+ spec.authors = ["Samuel Williams"]
9
+ spec.email = ["samuel.williams@oriontransfer.co.nz"]
10
+ spec.summary = %q{Sanitize markup according to a set of rules.}
11
+ spec.homepage = "https://github.com/ioquatix/trenni-sanitize"
12
+
13
+ spec.files = `git ls-files`.split($/)
14
+ spec.executables = spec.files.grep(%r{^bin/}).map{ |f| File.basename(f) }
15
+ spec.test_files = spec.files.grep(%r{^(test|spec|features)/})
16
+ spec.require_paths = ["lib"]
17
+
18
+ spec.required_ruby_version = '~> 2.1'
19
+
20
+ spec.add_dependency "trenni", '~> 3.5.0'
21
+
22
+ spec.add_development_dependency "bundler", "~> 1.3"
23
+ spec.add_development_dependency "rspec", "~> 3.4"
24
+ spec.add_development_dependency "rake"
25
+ end
metadata ADDED
@@ -0,0 +1,123 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: trenni-sanitize
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.1.0
5
+ platform: ruby
6
+ authors:
7
+ - Samuel Williams
8
+ autorequire:
9
+ bindir: bin
10
+ cert_chain: []
11
+ date: 2018-02-14 00:00:00.000000000 Z
12
+ dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ name: trenni
15
+ requirement: !ruby/object:Gem::Requirement
16
+ requirements:
17
+ - - "~>"
18
+ - !ruby/object:Gem::Version
19
+ version: 3.5.0
20
+ type: :runtime
21
+ prerelease: false
22
+ version_requirements: !ruby/object:Gem::Requirement
23
+ requirements:
24
+ - - "~>"
25
+ - !ruby/object:Gem::Version
26
+ version: 3.5.0
27
+ - !ruby/object:Gem::Dependency
28
+ name: bundler
29
+ requirement: !ruby/object:Gem::Requirement
30
+ requirements:
31
+ - - "~>"
32
+ - !ruby/object:Gem::Version
33
+ version: '1.3'
34
+ type: :development
35
+ prerelease: false
36
+ version_requirements: !ruby/object:Gem::Requirement
37
+ requirements:
38
+ - - "~>"
39
+ - !ruby/object:Gem::Version
40
+ version: '1.3'
41
+ - !ruby/object:Gem::Dependency
42
+ name: rspec
43
+ requirement: !ruby/object:Gem::Requirement
44
+ requirements:
45
+ - - "~>"
46
+ - !ruby/object:Gem::Version
47
+ version: '3.4'
48
+ type: :development
49
+ prerelease: false
50
+ version_requirements: !ruby/object:Gem::Requirement
51
+ requirements:
52
+ - - "~>"
53
+ - !ruby/object:Gem::Version
54
+ version: '3.4'
55
+ - !ruby/object:Gem::Dependency
56
+ name: rake
57
+ requirement: !ruby/object:Gem::Requirement
58
+ requirements:
59
+ - - ">="
60
+ - !ruby/object:Gem::Version
61
+ version: '0'
62
+ type: :development
63
+ prerelease: false
64
+ version_requirements: !ruby/object:Gem::Requirement
65
+ requirements:
66
+ - - ">="
67
+ - !ruby/object:Gem::Version
68
+ version: '0'
69
+ description:
70
+ email:
71
+ - samuel.williams@oriontransfer.co.nz
72
+ executables: []
73
+ extensions: []
74
+ extra_rdoc_files: []
75
+ files:
76
+ - ".gitignore"
77
+ - ".rspec"
78
+ - ".simplecov"
79
+ - ".travis.yml"
80
+ - Gemfile
81
+ - README.md
82
+ - Rakefile
83
+ - lib/trenni/sanitize.rb
84
+ - lib/trenni/sanitize/extensions.rb
85
+ - lib/trenni/sanitize/filter.rb
86
+ - lib/trenni/sanitize/fragment.rb
87
+ - lib/trenni/sanitize/text.rb
88
+ - lib/trenni/sanitize/version.rb
89
+ - spec/spec_helper.rb
90
+ - spec/trenni/sanitize/benchmark_spec.rb
91
+ - spec/trenni/sanitize/fragment_spec.rb
92
+ - spec/trenni/sanitize/sample.html
93
+ - spec/trenni/sanitize/text_spec.rb
94
+ - trenni-sanitize.gemspec
95
+ homepage: https://github.com/ioquatix/trenni-sanitize
96
+ licenses: []
97
+ metadata: {}
98
+ post_install_message:
99
+ rdoc_options: []
100
+ require_paths:
101
+ - lib
102
+ required_ruby_version: !ruby/object:Gem::Requirement
103
+ requirements:
104
+ - - "~>"
105
+ - !ruby/object:Gem::Version
106
+ version: '2.1'
107
+ required_rubygems_version: !ruby/object:Gem::Requirement
108
+ requirements:
109
+ - - ">="
110
+ - !ruby/object:Gem::Version
111
+ version: '0'
112
+ requirements: []
113
+ rubyforge_project:
114
+ rubygems_version: 2.6.12
115
+ signing_key:
116
+ specification_version: 4
117
+ summary: Sanitize markup according to a set of rules.
118
+ test_files:
119
+ - spec/spec_helper.rb
120
+ - spec/trenni/sanitize/benchmark_spec.rb
121
+ - spec/trenni/sanitize/fragment_spec.rb
122
+ - spec/trenni/sanitize/sample.html
123
+ - spec/trenni/sanitize/text_spec.rb