html_aware_truncation 1.0.0
- checksums.yaml +7 -0
- data/.gitignore +12 -0
- data/.rspec +2 -0
- data/.travis.yml +5 -0
- data/Gemfile +6 -0
- data/LICENSE.txt +21 -0
- data/README.md +89 -0
- data/Rakefile +6 -0
- data/bin/console +14 -0
- data/bin/setup +8 -0
- data/html_aware_truncation.gemspec +28 -0
- data/lib/html_aware_truncation.rb +72 -0
- data/lib/html_aware_truncation/version.rb +3 -0
- metadata +112 -0
checksums.yaml
ADDED
@@ -0,0 +1,7 @@
+---
+SHA1:
+  metadata.gz: 81d7b744f16d55b31ecb503a6d98fe9b1575751a
+  data.tar.gz: 783c9c22e310d260e8e357fc1b08517b6cf983ba
+SHA512:
+  metadata.gz: a78a2d9a01ccf0fd98a6cd449292231d16eebef9e4aab4c3bf956a40dd93a033a942ce372914f960031d9bcf27ce27998a56d8ab0fb9039d41a03048999e791f
+  data.tar.gz: 9abc87ca6410bb25d55c84d4d67b8b1dc201df3091807d5890f8eab70aa77026bd3f718e0f4a01a3adbe8e21e2694cf84f33cbcd55b0c4d36a90d7451d8fcaf9
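The checksums above are hex digests of the two archives inside the `.gem` file. A minimal sketch of how such a digest could be checked locally, using only Ruby's standard `digest` library, follows. The helper name `checksum_matches?` and the idea of checking an already-extracted `data.tar.gz` are assumptions for illustration; this is not part of the gem or of RubyGems itself.

```ruby
require "digest"

# Hypothetical helper: returns true when the file at `path` hashes to
# `expected_hex` under the given digest class (SHA512 by default).
def checksum_matches?(path, expected_hex, algorithm: Digest::SHA512)
  algorithm.file(path).hexdigest == expected_hex
end
```

For example, `checksum_matches?("data.tar.gz", "9abc87ca...")` would be called with the full SHA512 value listed in `checksums.yaml`, and `algorithm: Digest::SHA1` would check the shorter digest instead.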
data/.gitignore
ADDED
data/.rspec
ADDED
data/.travis.yml
ADDED
data/Gemfile
ADDED
data/LICENSE.txt
ADDED
@@ -0,0 +1,21 @@
+The MIT License (MIT)
+
+Copyright (c) 2017 Jonathan Rochkind
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in
+all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+THE SOFTWARE.
data/README.md
ADDED
@@ -0,0 +1,89 @@
+# HtmlAwareTruncation
+[](https://badge.fury.io/rb/html_aware_truncation)
+[](https://travis-ci.org/jrochkind/html_aware_truncation)
+
+
+Yet another ruby html-aware truncation routine. Truncate HTML to max text characters,
+resulting in still legal HTML without any unclosed tags etc.
+
+I was unable to find an existing solution that met my needs:
+* Uses [nokogiri](https://github.com/sparklemotion/nokogiri) (cause it's really good at handling somewhat invalid HTML input, and you probably already have it as a dependency)
+* Does not monkey-patch nokogiri or String or anything else.
+* Follows Rails [truncate helper](http://api.rubyonrails.org/classes/ActionView/Helpers/TextHelper.html#method-i-truncate)
+semantics, including a custom :separator that can be a string or regex, usually for word boundaries.
+
+
+## Usage
+
+```ruby
+require 'html_aware_truncation'
+string = "<p>Lots of html <b>with bolded stuff</b></p>"
+HtmlAwareTruncation.truncate_html(string, length: 10)
+# => "<p>Lots of h…</p>"
+HtmlAwareTruncation.truncate_html(string, length: 10, separator: /\b/)
+# => "<p>Lots of …</p>"
+HtmlAwareTruncation.truncate_html(string, length: 10, separator: /\b/, omission: '--')
+# => "<p>Lots of --</p>"
+```
+
+If you already have a Nokogiri node, or want to do the Nokogiri
+parsing and serialization yourself, you can pass a single Nokogiri node
+to `truncate_nokogiri_node`. Often a `Nokogiri::HTML::DocumentFragment` makes sense:
+
+```ruby
+node = Nokogiri::HTML::DocumentFragment.parse(some_html_str)
+HtmlAwareTruncation.truncate_nokogiri_node(node, length: 10)
+# => Returns a Nokogiri node, may mutate original passed in, not entirely sure.
+```
+
+For convenience, you can `include` the `HtmlAwareTruncation` module, to
+get its methods as mixins.
+
+```ruby
+require 'html_aware_truncation'
+class Something
+  include HtmlAwareTruncation
+
+  def something
+    truncate_html(whatever)
+  end
+end
+```
+
+## Known problems
+
+This isn't perfect, but it's good enough for me to use in several production
+apps. In edge cases, it may sometimes:
+
+* be an extra character (or a few) over the specified `length` limit (an off-by-one error, maybe?)
+* put the omission mark in a node of its own, which is kind of silly: `"<p>Stuff <b>…</b></p>"`
+* leave one or more empty nodes at the end: `"<p>Stuff and...<b></b></p>"`
+* put the omission mark in a tag/node that really ought not to have text content: `"<ul><li>stuff</li>…</ul>"`
+(This one bothers me the most; it's the only case I know of where this gem produces slightly illegal HTML, but it generally happens rarely)
+
+Some specs marked `pending` demonstrate some "bad behavior", but there may be others that are untested.
+
+In general though, this has not caused me real problems in production; it works out.
+I still find this preferable to the alternative gems I know about, so I packaged it up in
+case you do too. Patches welcome.
+
+## Contributing
+
+Bug reports and pull requests are welcome on GitHub at https://github.com/jrochkind/html_aware_truncation.
+
+
+## License
+
+The gem is available as open source under the terms of the [MIT License](http://opensource.org/licenses/MIT).
+
+## Alternatives
+
+I adapted some code or tests from some of these. I mostly adapted from
+an example in [a blog post now only in the wayback machine](https://web-beta.archive.org/web/20160116165808/http://blog.madebydna.com/all/code/2010/06/04/ruby-helper-to-cleanly-truncate-html.html).
+Alternative examples can also be useful to look at to see how/if they solve the known problems with this gem, for ideas.
+
+* https://github.com/nono/HTML-Truncator
+* https://github.com/hgmnz/truncate_html
+* https://github.com/ianwhite/truncate_html
+
+
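The README's `include` example works because the library declares its methods with `module_function`, which exposes each method both as a module-level function and as a private instance method of any class that includes the module. The sketch below illustrates that Ruby mechanism with a hypothetical `Shouting` module and `Greeter` class; neither name comes from the gem.

```ruby
# Hypothetical module, used only to illustrate `module_function`.
module Shouting
  def shout(str)
    str.upcase + "!"
  end
  # Makes `Shouting.shout` callable directly, and leaves a private
  # instance-method copy for classes that include the module.
  module_function :shout
end

class Greeter
  include Shouting

  def greet(name)
    # Private mixin method, called without an explicit receiver.
    shout("hello #{name}")
  end
end

puts Shouting.shout("hi")             # prints "HI!"
puts Greeter.new.greet("world")       # prints "HELLO WORLD!"
puts Greeter.new.respond_to?(:shout)  # prints "false" (the copy is private)
```

One consequence of this design is that the mixed-in methods are usable inside the including class but do not widen its public interface.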
data/Rakefile
ADDED
data/bin/console
ADDED
@@ -0,0 +1,14 @@
+#!/usr/bin/env ruby
+
+require "bundler/setup"
+require "html_aware_truncation"
+
+# You can add fixtures and/or initialization code here to make experimenting
+# with your gem easier. You can also use a different console, if you like.
+
+# (If you use this, don't forget to add pry to your Gemfile!)
+# require "pry"
+# Pry.start
+
+require "irb"
+IRB.start(__FILE__)
data/bin/setup
ADDED
data/html_aware_truncation.gemspec
ADDED
@@ -0,0 +1,28 @@
+# coding: utf-8
+lib = File.expand_path('../lib', __FILE__)
+$LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
+require 'html_aware_truncation/version'
+
+Gem::Specification.new do |spec|
+  spec.name = "html_aware_truncation"
+  spec.version = HtmlAwareTruncation::VERSION
+  spec.authors = ["Jonathan Rochkind"]
+  spec.email = ["jrochkind@chemheritage.org"]
+
+  spec.summary = %q{Yet another ruby html-aware truncation routine}
+  #spec.homepage = "TODO: Put your gem's website or public repo URL here."
+  spec.license = "MIT"
+
+  spec.files = `git ls-files -z`.split("\x0").reject do |f|
+    f.match(%r{^(test|spec|features)/})
+  end
+  # spec.bindir = "exe"
+  # spec.executables = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
+  spec.require_paths = ["lib"]
+
+  spec.add_runtime_dependency "nokogiri", "~> 1.0"
+
+  spec.add_development_dependency "bundler", "~> 1.14"
+  spec.add_development_dependency "rake"
+  spec.add_development_dependency "rspec", "~> 3.0"
+end
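The gemspec declares its dependencies with the pessimistic operator `~>`, which accepts newer releases only up to the next change in the second-to-last version segment that was given. As a rough illustration, the constraints can be evaluated with the `Gem::Requirement` and `Gem::Version` classes that ship with RubyGems. The helper name `allows?` and the sample version numbers are assumptions chosen for the example.

```ruby
require "rubygems"

# Hypothetical helper: does `version` satisfy the constraint string?
def allows?(constraint, version)
  Gem::Requirement.new(constraint).satisfied_by?(Gem::Version.new(version))
end

puts allows?("~> 1.0", "1.7.1")    # prints "true"  (>= 1.0 and < 2.0)
puts allows?("~> 1.0", "2.0.0")    # prints "false"
puts allows?("~> 1.14", "1.13.7")  # prints "false" (below 1.14)
puts allows?("~> 1.14", "1.16.0")  # prints "true"  (>= 1.14 and < 2.0)
```

The `rake` dependency has no constraint at all, which is why it appears as `>= 0` in the generated metadata.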
data/lib/html_aware_truncation.rb
ADDED
@@ -0,0 +1,72 @@
+require "html_aware_truncation/version"
+require 'nokogiri'
+
+module HtmlAwareTruncation
+  define_singleton_method(:default_length) { @default_length }
+  define_singleton_method(:default_length=) { |val| @default_length = val }
+  self.default_length = 200
+  define_singleton_method(:default_omission) { @default_omission }
+  define_singleton_method(:default_omission=) { |val| @default_omission = val }
+  self.default_omission = '…'
+
+  def truncate_html(str,
+    length: HtmlAwareTruncation.default_length,
+    omission: HtmlAwareTruncation.default_omission,
+    separator: nil)
+
+    HtmlAwareTruncation.truncate_nokogiri_node(
+      Nokogiri::HTML::DocumentFragment.parse(str),
+      length: length,
+      omission: omission,
+      separator: separator
+    ).to_html
+  end
+  module_function :truncate_html
+
+  # HTML-aware truncation of a `Nokogiri::HTML::DocumentFragment`, perhaps
+  # one you created with `Nokogiri::HTML::DocumentFragment.parse(str)`
+  # Returns a TODO. (may mutate input?)
+  #
+  # See also truncate_string, which will take and return a string, parsing
+  # for you for convenience.
+  def truncate_nokogiri_node(node,
+    length: HtmlAwareTruncation.default_length,
+    omission: HtmlAwareTruncation.default_omission,
+    separator: nil)
+    if node.kind_of?(::Nokogiri::XML::Text)
+      if node.content.length > length
+        allowable_endpoint = [0, length - omission.length].max
+        if separator
+          allowable_endpoint = (node.content.rindex(separator, allowable_endpoint) || allowable_endpoint)
+        end
+
+        ::Nokogiri::XML::Text.new(node.content.slice(0, allowable_endpoint) + omission, node.parent)
+      else
+        node.dup
+      end
+    else # DocumentFragment or Element
+      return node if node.inner_text.length <= length
+
+      truncated_node = node.dup
+      truncated_node.children.remove
+      remaining_length = length
+
+      node.children.each do |child|
+        if remaining_length == 0
+          truncated_node.add_child ::Nokogiri::XML::Text.new(omission, truncated_node)
+          break
+        elsif remaining_length < 0
+          break
+        end
+        truncated_node.add_child HtmlAwareTruncation.truncate_nokogiri_node(child, length: remaining_length, omission: omission, separator: separator)
+        # can end up less than 0 if the child was truncated to fit, that's
+        # fine:
+        remaining_length = remaining_length - child.inner_text.length
+
+      end
+      truncated_node
+    end
+  end
+  module_function :truncate_nokogiri_node
+
+end
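To make the recursion above easier to follow without Nokogiri installed, here is a simplified stand-in that applies the same length-budget walk to a toy tree. Everything in it is an assumption for illustration: a `String` plays the role of a text node, a two-element array `[tag, children]` plays the role of an element, and `truncate_tree`, `inner_text`, and `to_html` are hypothetical helpers that do no HTML escaping. It is a sketch of the control flow, not the gem's implementation.

```ruby
# Concatenated text of a toy node (String, or [tag, children]).
def inner_text(node)
  node.is_a?(String) ? node : node[1].map { |child| inner_text(child) }.join
end

def truncate_tree(node, length:, omission: "…", separator: nil)
  if node.is_a?(String)
    return node if node.length <= length
    # Leave room for the omission mark, then optionally back up to the
    # last separator match at or before that point.
    endpoint = [0, length - omission.length].max
    if separator
      endpoint = node.rindex(separator, endpoint) || endpoint
    end
    return node.slice(0, endpoint) + omission
  end

  return node if inner_text(node).length <= length

  tag, children = node
  kept = []
  remaining = length
  children.each do |child|
    if remaining == 0
      kept << omission # budget used up exactly: append the mark here
      break
    elsif remaining < 0
      break            # a previous child was already truncated
    end
    kept << truncate_tree(child, length: remaining, omission: omission, separator: separator)
    remaining -= inner_text(child).length
  end
  [tag, kept]
end

# Serialize the toy tree (no escaping; illustration only).
def to_html(node)
  return node if node.is_a?(String)
  tag, children = node
  "<#{tag}>#{children.map { |child| to_html(child) }.join}</#{tag}>"
end

doc = ["p", ["Lots of html ", ["b", ["with bolded stuff"]]]]
puts to_html(truncate_tree(doc, length: 10))                   # <p>Lots of h…</p>
puts to_html(truncate_tree(doc, length: 10, separator: /\b/))  # <p>Lots of …</p>

# Two of the README's known problems fall out of the same walk:
puts to_html(truncate_tree(["p", ["Stuff ", ["b", ["and more"]]]], length: 7))
# <p>Stuff <b>…</b></p>
puts to_html(truncate_tree(["ul", [["li", ["stuff"]], ["li", ["more"]]]], length: 5))
# <ul><li>stuff</li>…</ul>
```

The last case also shows the length overshoot: the omission mark is appended after the budget has already reached zero, so the visible text ends up one character over `length`.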
metadata
ADDED
@@ -0,0 +1,112 @@
+--- !ruby/object:Gem::Specification
+name: html_aware_truncation
+version: !ruby/object:Gem::Version
+  version: 1.0.0
+platform: ruby
+authors:
+- Jonathan Rochkind
+autorequire:
+bindir: bin
+cert_chain: []
+date: 2017-03-28 00:00:00.000000000 Z
+dependencies:
+- !ruby/object:Gem::Dependency
+  name: nokogiri
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '1.0'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '1.0'
+- !ruby/object:Gem::Dependency
+  name: bundler
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '1.14'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '1.14'
+- !ruby/object:Gem::Dependency
+  name: rake
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+- !ruby/object:Gem::Dependency
+  name: rspec
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '3.0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '3.0'
+description:
+email:
+- jrochkind@chemheritage.org
+executables: []
+extensions: []
+extra_rdoc_files: []
+files:
+- ".gitignore"
+- ".rspec"
+- ".travis.yml"
+- Gemfile
+- LICENSE.txt
+- README.md
+- Rakefile
+- bin/console
+- bin/setup
+- html_aware_truncation.gemspec
+- lib/html_aware_truncation.rb
+- lib/html_aware_truncation/version.rb
+homepage:
+licenses:
+- MIT
+metadata: {}
+post_install_message:
+rdoc_options: []
+require_paths:
+- lib
+required_ruby_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: '0'
+required_rubygems_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: '0'
+requirements: []
+rubyforge_project:
+rubygems_version: 2.5.2
+signing_key:
+specification_version: 4
+summary: Yet another ruby html-aware truncation routine
+test_files: []