html_aware_truncation 1.0.0

checksums.yaml.gz ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA1:
3
+ metadata.gz: 81d7b744f16d55b31ecb503a6d98fe9b1575751a
4
+ data.tar.gz: 783c9c22e310d260e8e357fc1b08517b6cf983ba
5
+ SHA512:
6
+ metadata.gz: a78a2d9a01ccf0fd98a6cd449292231d16eebef9e4aab4c3bf956a40dd93a033a942ce372914f960031d9bcf27ce27998a56d8ab0fb9039d41a03048999e791f
7
+ data.tar.gz: 9abc87ca6410bb25d55c84d4d67b8b1dc201df3091807d5890f8eab70aa77026bd3f718e0f4a01a3adbe8e21e2694cf84f33cbcd55b0c4d36a90d7451d8fcaf9
data/.gitignore ADDED
@@ -0,0 +1,12 @@
1
+ /.bundle/
2
+ /.yardoc
3
+ /Gemfile.lock
4
+ /_yardoc/
5
+ /coverage/
6
+ /doc/
7
+ /pkg/
8
+ /spec/reports/
9
+ /tmp/
10
+ .byebug_history
11
+ # rspec failure tracking
12
+ .rspec_status
data/.rspec ADDED
@@ -0,0 +1,2 @@
1
+ --format documentation
2
+ --color
data/.travis.yml ADDED
@@ -0,0 +1,5 @@
1
+ sudo: false
2
+ language: ruby
3
+ rvm:
4
+ - 2.3.3
5
+ before_install: gem install bundler -v 1.14.5
data/Gemfile ADDED
@@ -0,0 +1,6 @@
1
+ source 'https://rubygems.org'
2
+
3
+ # Specify your gem's dependencies in html_aware_truncation.gemspec
4
+ gemspec
5
+
6
+ gem "byebug", group: [:development, :test]
data/LICENSE.txt ADDED
@@ -0,0 +1,21 @@
1
+ The MIT License (MIT)
2
+
3
+ Copyright (c) 2017 Jonathan Rochkind
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in
13
+ all copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
21
+ THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,89 @@
1
+ # HtmlAwareTruncation
2
+ [![Gem Version](https://badge.fury.io/rb/html_aware_truncation.svg)](https://badge.fury.io/rb/html_aware_truncation)
3
+ [![Build Status](https://travis-ci.org/jrochkind/html_aware_truncation.svg?branch=master)](https://travis-ci.org/jrochkind/html_aware_truncation)
4
+
5
+
6
+ Yet another ruby html-aware truncation routine. Truncate HTML to max text characters,
7
+ resulting in still legal HTML without any unclosed tags etc.
8
+
9
+ I was unable to find an existing solution that met my needs:
10
+ * Uses [nokogiri](https://github.com/sparklemotion/nokogiri) (cause it's really good at handling somewhat invalid HTML input, and you probably already have it as a dependency)
11
+ * Does not monkey-patch nokogiri or String or anything else.
12
+ * Follows Rails [truncate helper](http://api.rubyonrails.org/classes/ActionView/Helpers/TextHelper.html#method-i-truncate)
13
+ semantics, including a custom :separator that can be a string or regex, usually for word boundaries.
14
+
15
+
16
+ ## Usage
17
+
18
+ ```ruby
19
+ require 'html_aware_truncation'
20
+ string = "<p>Lots of html <b>with bolded stuff</b></p>"
21
+ HtmlAwareTruncation.truncate_html(string, length: 10)
22
+ # => "<p>Lots of h…</p>"
23
+ HtmlAwareTruncation.truncate_html(string, length: 10, separator: /\b/)
24
+ # => "<p>Lots of …</p>"
25
+ HtmlAwareTruncation.truncate_html(string, length: 10, separator: /\b/, omission: '--')
26
+ # => "<p>Lots of --</p>"
27
+ ```
28
+
29
+ If you already have a Nokogiri node, or want to do the Nokogiri
30
+ parsing and serialization yourself, you can pass a single Nokogiri node
31
+ to `truncate_nokogiri_node`. Often a `Nokogiri::HTML::DocumentFragment` makes sense:
32
+
33
+ ```ruby
34
+ node = Nokogiri::HTML::DocumentFragment.parse(some_html_str)
35
+ HtmlAwareTruncation.truncate_nokogiri_node(node, length: 10)
36
+ # => Returns a Nokogiri node, may mutate original passed in, not entirely sure.
37
+ ```
38
+
39
+ For convenience, you can `include` the `HtmlAwareTruncation` module, to
40
+ get its methods as mixins.
41
+
42
+ ```ruby
43
+ require 'html_aware_truncation'
44
+ class Something
45
+ include HtmlAwareTruncation
46
+
47
+ def something
48
+ truncate_html(whatever)
49
+ end
50
+ end
51
+ ```
52
+
53
+ ## Known problems
54
+
55
+ This isn't perfect, but it's good enough for me to use in several production
56
+ apps. In edge cases, it may sometimes:
57
+
58
+ * return text an extra character (or a few) over the specified `length` limit (off-by-one error, maybe?)
59
+ * put the omission mark in a node of its own, which is kind of silly: `"<p>Stuff <b>…</b></p>"`
60
+ * leave one or more empty nodes at the end: `"<p>Stuff and...<b></b></p>"`
61
+ * put the omission mark in a tag/node that really ought not to have text content: `"<ul><li>stuff</li>…</ul>"`
62
+ (This one bothers me the most, it's the only case I know this gem produces slightly illegal HTML, but generally happens rarely)
63
+
64
+ Some specs marked `pending` demonstrate some "bad behavior", but there may be others un-tested.
65
+
66
+ In general though, this has not caused me real problems in production, it works out.
67
+ I still find this preferable to other alternative gems I know about, so I packaged it up in
68
+ case you do too. Patches welcome.
69
+
70
+ ## Contributing
71
+
72
+ Bug reports and pull requests are welcome on GitHub at https://github.com/jrochkind/html_aware_truncation.
73
+
74
+
75
+ ## License
76
+
77
+ The gem is available as open source under the terms of the [MIT License](http://opensource.org/licenses/MIT).
78
+
79
+ ## Alternatives
80
+
81
+ I adapted some code or tests from some of these. I mostly adapted from
82
+ an example in [a blog post now only in the wayback machine](https://web-beta.archive.org/web/20160116165808/http://blog.madebydna.com/all/code/2010/06/04/ruby-helper-to-cleanly-truncate-html.html).
83
+ Alternative examples can also be useful to look at to see how/if they solve the known problems with this gem, for ideas.
84
+
85
+ * https://github.com/nono/HTML-Truncator
86
+ * https://github.com/hgmnz/truncate_html
87
+ * https://github.com/ianwhite/truncate_html
88
+
89
+
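The first known problem listed in the README above (output slightly over `length`) can be illustrated with a simplified model of the child loop in `lib/html_aware_truncation.rb`. This is only a sketch under stated assumptions, not the gem's code: `truncate_segments` is a hypothetical helper that treats each child node as a plain string and ignores `separator`.

```ruby
# Hypothetical model of the gem's child loop: each String stands in for the
# text of one child node. Not part of html_aware_truncation.
def truncate_segments(segments, length:, omission: '…')
  return segments.dup if segments.sum(&:length) <= length

  remaining = length
  result = []
  segments.each do |segment|
    if remaining == 0
      # The budget is exactly used up, yet the omission mark is still appended.
      result << omission
      break
    elsif remaining < 0
      break
    end

    if segment.length > remaining
      # Leave room for the omission mark inside the remaining budget.
      cut = [0, remaining - omission.length].max
      result << (segment.slice(0, cut) + omission)
    else
      result << segment
    end
    remaining -= segment.length
  end
  result
end

over = truncate_segments(["12345", "67890"], length: 5).join
puts over         # 12345…
puts over.length  # 6, one more than the requested 5
```

In this model the overshoot appears when a child fits the budget exactly and more content follows: the omission mark is added on top of an already full budget.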
data/Rakefile ADDED
@@ -0,0 +1,6 @@
1
+ require "bundler/gem_tasks"
2
+ require "rspec/core/rake_task"
3
+
4
+ RSpec::Core::RakeTask.new(:spec)
5
+
6
+ task :default => :spec
data/bin/console ADDED
@@ -0,0 +1,14 @@
1
+ #!/usr/bin/env ruby
2
+
3
+ require "bundler/setup"
4
+ require "html_aware_truncation"
5
+
6
+ # You can add fixtures and/or initialization code here to make experimenting
7
+ # with your gem easier. You can also use a different console, if you like.
8
+
9
+ # (If you use this, don't forget to add pry to your Gemfile!)
10
+ # require "pry"
11
+ # Pry.start
12
+
13
+ require "irb"
14
+ IRB.start(__FILE__)
data/bin/setup ADDED
@@ -0,0 +1,8 @@
1
+ #!/usr/bin/env bash
2
+ set -euo pipefail
3
+ IFS=$'\n\t'
4
+ set -vx
5
+
6
+ bundle install
7
+
8
+ # Do any other automated setup that you need to do here
data/html_aware_truncation.gemspec ADDED
@@ -0,0 +1,28 @@
1
+ # coding: utf-8
2
+ lib = File.expand_path('../lib', __FILE__)
3
+ $LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
4
+ require 'html_aware_truncation/version'
5
+
6
+ Gem::Specification.new do |spec|
7
+ spec.name = "html_aware_truncation"
8
+ spec.version = HtmlAwareTruncation::VERSION
9
+ spec.authors = ["Jonathan Rochkind"]
10
+ spec.email = ["jrochkind@chemheritage.org"]
11
+
12
+ spec.summary = %q{Yet another ruby html-aware truncation routine}
13
+ #spec.homepage = "TODO: Put your gem's website or public repo URL here."
14
+ spec.license = "MIT"
15
+
16
+ spec.files = `git ls-files -z`.split("\x0").reject do |f|
17
+ f.match(%r{^(test|spec|features)/})
18
+ end
19
+ # spec.bindir = "exe"
20
+ # spec.executables = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
21
+ spec.require_paths = ["lib"]
22
+
23
+ spec.add_runtime_dependency "nokogiri", "~> 1.0"
24
+
25
+ spec.add_development_dependency "bundler", "~> 1.14"
26
+ spec.add_development_dependency "rake"
27
+ spec.add_development_dependency "rspec", "~> 3.0"
28
+ end
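The `~>` (pessimistic) constraints in the gemspec above can be checked with RubyGems' own `Gem::Requirement` class. The version numbers tested below are arbitrary examples chosen for illustration, not versions the gem is known to have been tested against.

```ruby
require 'rubygems'

# "~> 1.0" accepts any 1.x release at or above 1.0, but not 2.0.
nokogiri_req = Gem::Requirement.new('~> 1.0')
puts nokogiri_req.satisfied_by?(Gem::Version.new('1.0'))    # true
puts nokogiri_req.satisfied_by?(Gem::Version.new('1.8.5'))  # true
puts nokogiri_req.satisfied_by?(Gem::Version.new('2.0.0'))  # false

# "~> 1.14" accepts 1.14 and later 1.x releases, but not 1.13.x or 2.0.
bundler_req = Gem::Requirement.new('~> 1.14')
puts bundler_req.satisfied_by?(Gem::Version.new('1.14.5'))  # true
puts bundler_req.satisfied_by?(Gem::Version.new('1.13.7'))  # false
puts bundler_req.satisfied_by?(Gem::Version.new('2.0.0'))   # false
```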
data/lib/html_aware_truncation.rb ADDED
@@ -0,0 +1,72 @@
1
+ require "html_aware_truncation/version"
2
+ require 'nokogiri'
3
+
4
+ module HtmlAwareTruncation
5
+ define_singleton_method(:default_length) { @default_length }
6
+ define_singleton_method(:default_length=) { |val| @default_length = val }
7
+ self.default_length = 200
8
+ define_singleton_method(:default_omission) { @default_omission }
9
+ define_singleton_method(:default_omission=) { |val| @default_omission = val }
10
+ self.default_omission = '…'
11
+
12
+ def truncate_html(str,
13
+ length: HtmlAwareTruncation.default_length,
14
+ omission: HtmlAwareTruncation.default_omission,
15
+ separator: nil)
16
+
17
+ HtmlAwareTruncation.truncate_nokogiri_node(
18
+ Nokogiri::HTML::DocumentFragment.parse(str),
19
+ length: length,
20
+ omission: omission,
21
+ separator: separator
22
+ ).to_html
23
+ end
24
+ module_function :truncate_html
25
+
26
+ # HTML-aware truncation of a `Nokogiri::HTML::DocumentFragment`, perhaps
27
+ # one you created with `Nokogiri::HTML::DocumentFragment.parse(str)`
28
+ # Returns a Nokogiri node. (may mutate input?)
29
+ #
30
+ # See also truncate_html, which will take and return a string, parsing
31
+ # for you for convenience.
32
+ def truncate_nokogiri_node(node,
33
+ length: HtmlAwareTruncation.default_length,
34
+ omission: HtmlAwareTruncation.default_omission,
35
+ separator: nil)
36
+ if node.kind_of?(::Nokogiri::XML::Text)
37
+ if node.content.length > length
38
+ allowable_endpoint = [0, length - omission.length].max
39
+ if separator
40
+ allowable_endpoint = (node.content.rindex(separator, allowable_endpoint) || allowable_endpoint)
41
+ end
42
+
43
+ ::Nokogiri::XML::Text.new(node.content.slice(0, allowable_endpoint) + omission, node.parent)
44
+ else
45
+ node.dup
46
+ end
47
+ else # DocumentFragment or Element
48
+ return node if node.inner_text.length <= length
49
+
50
+ truncated_node = node.dup
51
+ truncated_node.children.remove
52
+ remaining_length = length
53
+
54
+ node.children.each do |child|
55
+ if remaining_length == 0
56
+ truncated_node.add_child ::Nokogiri::XML::Text.new(omission, truncated_node)
57
+ break
58
+ elsif remaining_length < 0
59
+ break
60
+ end
61
+ truncated_node.add_child HtmlAwareTruncation.truncate_nokogiri_node(child, length: remaining_length, omission: omission, separator: separator)
62
+ # can end up less than 0 if the child was truncated to fit, that's
63
+ # fine:
64
+ remaining_length = remaining_length - child.inner_text.length
65
+
66
+ end
67
+ truncated_node
68
+ end
69
+ end
70
+ module_function :truncate_nokogiri_node
71
+
72
+ end
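The text-node branch of `truncate_nokogiri_node` above comes down to a little index arithmetic: reserve room for the omission mark, then optionally back up to the last `separator` match. A minimal sketch of that arithmetic on a plain `String`, with no Nokogiri involved, might look like the following. `truncate_text` is a hypothetical name used only for illustration and is not part of the gem.

```ruby
# Hypothetical plain-String version of the Text-node branch; the gem itself
# applies the same steps to node.content and wraps the result in a
# Nokogiri::XML::Text.
def truncate_text(content, length:, omission: '…', separator: nil)
  return content.dup if content.length <= length

  # Leave room for the omission mark, never going below index 0.
  endpoint = [0, length - omission.length].max
  # With a separator, back up to its last match at or before the endpoint.
  endpoint = content.rindex(separator, endpoint) || endpoint if separator

  content.slice(0, endpoint) + omission
end

text = "Lots of html with bolded stuff"
puts truncate_text(text, length: 10)                                    # Lots of h…
puts truncate_text(text, length: 10, separator: /\b/)                   # Lots of …
puts truncate_text(text, length: 10, separator: /\b/, omission: '--')   # Lots of --
```

The three results match the text inside the `<p>` element in the README's usage examples, since in those examples the first text child alone already exceeds the limit.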
data/lib/html_aware_truncation/version.rb ADDED
@@ -0,0 +1,3 @@
1
+ module HtmlAwareTruncation
2
+ VERSION = "1.0.0"
3
+ end
metadata ADDED
@@ -0,0 +1,112 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: html_aware_truncation
3
+ version: !ruby/object:Gem::Version
4
+ version: 1.0.0
5
+ platform: ruby
6
+ authors:
7
+ - Jonathan Rochkind
8
+ autorequire:
9
+ bindir: bin
10
+ cert_chain: []
11
+ date: 2017-03-28 00:00:00.000000000 Z
12
+ dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ name: nokogiri
15
+ requirement: !ruby/object:Gem::Requirement
16
+ requirements:
17
+ - - "~>"
18
+ - !ruby/object:Gem::Version
19
+ version: '1.0'
20
+ type: :runtime
21
+ prerelease: false
22
+ version_requirements: !ruby/object:Gem::Requirement
23
+ requirements:
24
+ - - "~>"
25
+ - !ruby/object:Gem::Version
26
+ version: '1.0'
27
+ - !ruby/object:Gem::Dependency
28
+ name: bundler
29
+ requirement: !ruby/object:Gem::Requirement
30
+ requirements:
31
+ - - "~>"
32
+ - !ruby/object:Gem::Version
33
+ version: '1.14'
34
+ type: :development
35
+ prerelease: false
36
+ version_requirements: !ruby/object:Gem::Requirement
37
+ requirements:
38
+ - - "~>"
39
+ - !ruby/object:Gem::Version
40
+ version: '1.14'
41
+ - !ruby/object:Gem::Dependency
42
+ name: rake
43
+ requirement: !ruby/object:Gem::Requirement
44
+ requirements:
45
+ - - ">="
46
+ - !ruby/object:Gem::Version
47
+ version: '0'
48
+ type: :development
49
+ prerelease: false
50
+ version_requirements: !ruby/object:Gem::Requirement
51
+ requirements:
52
+ - - ">="
53
+ - !ruby/object:Gem::Version
54
+ version: '0'
55
+ - !ruby/object:Gem::Dependency
56
+ name: rspec
57
+ requirement: !ruby/object:Gem::Requirement
58
+ requirements:
59
+ - - "~>"
60
+ - !ruby/object:Gem::Version
61
+ version: '3.0'
62
+ type: :development
63
+ prerelease: false
64
+ version_requirements: !ruby/object:Gem::Requirement
65
+ requirements:
66
+ - - "~>"
67
+ - !ruby/object:Gem::Version
68
+ version: '3.0'
69
+ description:
70
+ email:
71
+ - jrochkind@chemheritage.org
72
+ executables: []
73
+ extensions: []
74
+ extra_rdoc_files: []
75
+ files:
76
+ - ".gitignore"
77
+ - ".rspec"
78
+ - ".travis.yml"
79
+ - Gemfile
80
+ - LICENSE.txt
81
+ - README.md
82
+ - Rakefile
83
+ - bin/console
84
+ - bin/setup
85
+ - html_aware_truncation.gemspec
86
+ - lib/html_aware_truncation.rb
87
+ - lib/html_aware_truncation/version.rb
88
+ homepage:
89
+ licenses:
90
+ - MIT
91
+ metadata: {}
92
+ post_install_message:
93
+ rdoc_options: []
94
+ require_paths:
95
+ - lib
96
+ required_ruby_version: !ruby/object:Gem::Requirement
97
+ requirements:
98
+ - - ">="
99
+ - !ruby/object:Gem::Version
100
+ version: '0'
101
+ required_rubygems_version: !ruby/object:Gem::Requirement
102
+ requirements:
103
+ - - ">="
104
+ - !ruby/object:Gem::Version
105
+ version: '0'
106
+ requirements: []
107
+ rubyforge_project:
108
+ rubygems_version: 2.5.2
109
+ signing_key:
110
+ specification_version: 4
111
+ summary: Yet another ruby html-aware truncation routine
112
+ test_files: []