algolia_html_extractor 2.1.0 → 2.2.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: 5a8828fb4ece535b803731889c5f8be758f06a7e
- data.tar.gz: 7b8b80ce73ddefeaa901e5483b2b4822ad7f35a3
+ metadata.gz: bbf8df27c69c4d6f2f16de4bd7cf18fcd703fb43
+ data.tar.gz: a01708af7fe1a3c42d364a099e443ac05f6f8a75
  SHA512:
- metadata.gz: 4e12dc7fc939f8d7551cc0f1637c394bd5794bc13521fc9c04dde418a238980956e74ebcede3e5317c39e0b652ea1e85c775dd8ac6be0dbd1f026005b915485b
- data.tar.gz: a894a352c8efc2a4214c1c0ae4088d143ae1b831e86edb57ea6da3012cfe21a3fb22f230ec5337b0d240feb75a309bd155301140103500c1d6a5426c20b9bfcc
+ metadata.gz: 9d9d8af70a4310d871a96fd34a789de3ce0df0ba4621cf237727fcc514dbbfb9fd3d26a35ae3df6fd9b6574752e290d4254bdea7f1622cadba99a07a6a870adf
+ data.tar.gz: e74cc7ca6db7fddc84c903715a44c70df47fb27f303ee1635579b89f47269fab168e9933582fef73269ad0e24fdeae97caa5c1924c57a0553242c33407f7492c
data/README.md CHANGED
@@ -1,5 +1,11 @@
  # algolia_html_extractor

+ [![Gem Version][1]](http://badge.fury.io/rb/algolia_html_extractor)
+ [![Build Status][2]](https://travis-ci.org/algolia/html-extractor)
+ [![Coverage Status][3]](https://coveralls.io/github/algolia/html-extractor?branch=master)
+ [![Code Climate][4]](https://codeclimate.com/github/algolia/html-extractor)
+ ![Ruby >= 2.3.0][5]
+
  This gem can convert HTML content into JSON records ready to be pushed to
  Algolia.

@@ -93,13 +99,13 @@ Each record has a `objectID` that uniquely identify it (computed by a hash of al
  the other values).

  It also contains the HTML tag name in `tag_name` (by default `<p>`
- paragraphs are extracted, but see the [settings][3] on how to change it).
+ paragraphs are extracted, but see the [settings][6] on how to change it).

  `html` contains the whole `outerContent` of the element, including the wrapping
  tags and inner children. The `text` attribute contains the textual content,
  stripping out all HTML.

- `node` contains the [Nokogiri node][4] instance. The lib uses it internally to
+ `node` contains the [Nokogiri node][7] instance. The lib uses it internally to
  extract all the relevant information but is also exposed if you want to process
  the node further.

@@ -109,7 +115,7 @@ Anchors are searched in `name` and `id` attributes of headings.

  `hierarchy` then contains a snapshot of the current heading hierarchy of the
  paragraph. The `lvlX` syntax is used to be compatible with the records
- [DocSearch][5] is using.
+ [DocSearch][8] is using.

  The `weight` attribute is used to provide an easy way to rank two records
  relative to each other.
@@ -142,7 +148,7 @@ and generic bug reports.
  ## Bug Reports and feature requests

  For any bug or ideas of new features, please start by checking in the
- [issues](https://github.com/pixelastic/html-hierarchy-extractor/issues) tab if
+ [issues][9] tab if
  it hasn't been discussed already. If not, feel free to open a new issue.

  ## Pull Requests
@@ -165,7 +171,7 @@ cp ./scripts/git_hooks/* ./.git/hooks
  This will add a `pre-commit` and `pre-push` scripts that will respectively check
  that all files are lint-free before committing, and pass all tests before
  pushing. If any of those two hooks give your errors, you should fix the code
- before commiting or pushing.
+ before committing or pushing.

  Having those steps helps keeping the codebase clean as much as possible, and
  avoid polluting discussion in PR about style.
@@ -182,7 +188,7 @@ Rubocop, and the configuration can be found in `.rubocop.yml`.

  ## Test

- `rake test` will run all the tests.
+ `rake test` will run all the tests.

  `rake coverage` will do the same, but also adding the code coverage files to
  `./coverage`. This should be useful in a CI environment.
@@ -210,8 +216,12 @@ This gem was previously named `html-hierarchy-extractor` but has been renamed to
  convention. That's also why this gem directly starts at v2.0.


- [1]: https://www.algolia.com/
- [2]: https://community.algolia.com/docsearch/
- [3]: #Settings
- [4]: http://www.rubydoc.info/github/sparklemotion/nokogiri/Nokogiri/XML/Node
- [5]: https://community.algolia.com/docsearch/
+ [1]: https://badge.fury.io/rb/algolia_html_extractor.svg
+ [2]: https://travis-ci.org/algolia/html-extractor.svg?branch=master
+ [3]: https://coveralls.io/repos/algolia/html-extractor/badge.svg?branch=master&service=github
+ [4]: https://codeclimate.com/github/algolia/html-extractor/badges/gpa.svg
+ [5]: https://img.shields.io/badge/ruby-%3E%3D%202.3.0-green.svg
+ [6]: #Settings
+ [7]: http://www.rubydoc.info/github/sparklemotion/nokogiri/Nokogiri/XML/Node
+ [8]: https://community.algolia.com/docsearch/
+ [9]: https://github.com/pixelastic/html-hierarchy-extractor/issues
@@ -118,12 +118,12 @@ class AlgoliaHTMLExtractor
  next unless node.matches?(@options[:css_selector])

  # Stop if node is empty
- text = extract_text(node)
- next if text.empty?
+ content = extract_text(node)
+ next if content.empty?

  item = {
  html: extract_html(node),
- text: text,
+ content: content,
  tag_name: extract_tag_name(node),
  hierarchy: current_hierarchy.clone,
  anchor: current_anchor,
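The hunk above renames the record key `text` to `content`, while the README context lines in this diff still describe a `text` attribute. Code that reads records produced by both 2.1.0 and 2.2.0 may need to handle either key. The following is a minimal sketch under that assumption: `record_text` is a hypothetical helper, not part of the gem, and the sample records are hand-written hashes that only mimic the keys visible in the hunk.

```ruby
# Hypothetical helper (not part of algolia_html_extractor): return the
# textual content of a record regardless of which gem version built it.
def record_text(record)
  # 2.2.0 stores the text under :content; 2.1.0 stored it under :text.
  record.fetch(:content) { record[:text] }
end

# Hand-written sample records that only mimic the keys shown in the diff.
old_record = { html: '<p>Hello</p>', text: 'Hello', tag_name: 'p' }
new_record = { html: '<p>Hello</p>', content: 'Hello', tag_name: 'p' }

puts record_text(old_record) # prints "Hello"
puts record_text(new_record) # prints "Hello"
```

`Hash#fetch` with a block only evaluates the fallback when `:content` is absent, so records from 2.2.0 never consult the old key.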
@@ -1,5 +1,5 @@
  # Expose gem version
  # rubocop:disable Style/SingleLineMethods
  class AlgoliaHTMLExtractorVersion
- def self.to_s; '2.1.0' end
+ def self.to_s; '2.2.0' end
  end
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: algolia_html_extractor
  version: !ruby/object:Gem::Version
- version: 2.1.0
+ version: 2.2.0
  platform: ruby
  authors:
  - Tim Carry
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2017-11-10 00:00:00.000000000 Z
+ date: 2017-12-19 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: awesome_print