algolia_html_extractor 2.1.0 → 2.2.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: 5a8828fb4ece535b803731889c5f8be758f06a7e
- data.tar.gz: 7b8b80ce73ddefeaa901e5483b2b4822ad7f35a3
+ metadata.gz: bbf8df27c69c4d6f2f16de4bd7cf18fcd703fb43
+ data.tar.gz: a01708af7fe1a3c42d364a099e443ac05f6f8a75
  SHA512:
- metadata.gz: 4e12dc7fc939f8d7551cc0f1637c394bd5794bc13521fc9c04dde418a238980956e74ebcede3e5317c39e0b652ea1e85c775dd8ac6be0dbd1f026005b915485b
- data.tar.gz: a894a352c8efc2a4214c1c0ae4088d143ae1b831e86edb57ea6da3012cfe21a3fb22f230ec5337b0d240feb75a309bd155301140103500c1d6a5426c20b9bfcc
+ metadata.gz: 9d9d8af70a4310d871a96fd34a789de3ce0df0ba4621cf237727fcc514dbbfb9fd3d26a35ae3df6fd9b6574752e290d4254bdea7f1622cadba99a07a6a870adf
+ data.tar.gz: e74cc7ca6db7fddc84c903715a44c70df47fb27f303ee1635579b89f47269fab168e9933582fef73269ad0e24fdeae97caa5c1924c57a0553242c33407f7492c
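`checksums.yaml` records a SHA1 and a SHA512 digest for each of the two archives packed inside the `.gem` file (`metadata.gz` and `data.tar.gz`). The snippet below is a minimal sketch of checking one of those digests with Ruby's standard `digest` library. It assumes the `.gem` has already been unpacked into the working directory (a `.gem` is a plain tar archive, e.g. `tar -xf algolia_html_extractor-2.2.0.gem`), and `digest_matches?` is a hypothetical helper name, not part of RubyGems.

```ruby
require 'digest'

# Hypothetical helper: true if the file at `path` hashes to `expected_hex`
# under the given algorithm (:sha1 or :sha512, the two used in checksums.yaml).
def digest_matches?(path, algorithm, expected_hex)
  digest_class = { sha1: Digest::SHA1, sha512: Digest::SHA512 }.fetch(algorithm)
  digest_class.file(path).hexdigest == expected_hex
end

# SHA512 of data.tar.gz for 2.2.0, as listed in its checksums.yaml.
EXPECTED_DATA_SHA512 =
  'e74cc7ca6db7fddc84c903715a44c70df47fb27f303ee1635579b89f47269fab' \
  '168e9933582fef73269ad0e24fdeae97caa5c1924c57a0553242c33407f7492c'

# Only runs the check if the archive has been unpacked next to this script.
if File.exist?('data.tar.gz')
  ok = digest_matches?('data.tar.gz', :sha512, EXPECTED_DATA_SHA512)
  puts ok ? 'data.tar.gz: OK' : 'data.tar.gz: checksum mismatch'
end
```

The same call with `:sha1` and the SHA1 value checks the other digest; `metadata.gz` is verified the same way.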
data/README.md CHANGED
@@ -1,5 +1,11 @@
  # algolia_html_extractor
 
+ [![Gem Version][1]](http://badge.fury.io/rb/algolia_html_extractor)
+ [![Build Status][2]](https://travis-ci.org/algolia/html-extractor)
+ [![Coverage Status][3]](https://coveralls.io/github/algolia/html-extractor?branch=master)
+ [![Code Climate][4]](https://codeclimate.com/github/algolia/html-extractor)
+ ![Ruby >= 2.3.0][5]
+
  This gem can convert HTML content into JSON records ready to be pushed to
  Algolia.
 
@@ -93,13 +99,13 @@ Each record has a `objectID` that uniquely identify it (computed by a hash of al
  the other values).
 
  It also contains the HTML tag name in `tag_name` (by default `<p>`
- paragraphs are extracted, but see the [settings][3] on how to change it).
+ paragraphs are extracted, but see the [settings][6] on how to change it).
 
  `html` contains the whole `outerContent` of the element, including the wrapping
  tags and inner children. The `text` attribute contains the textual content,
  stripping out all HTML.
 
- `node` contains the [Nokogiri node][4] instance. The lib uses it internally to
+ `node` contains the [Nokogiri node][7] instance. The lib uses it internally to
  extract all the relevant information but is also exposed if you want to process
  the node further.
 
@@ -109,7 +115,7 @@ Anchors are searched in `name` and `id` attributes of headings.
 
  `hierarchy` then contains a snapshot of the current heading hierarchy of the
  paragraph. The `lvlX` syntax is used to be compatible with the records
- [DocSearch][5] is using.
+ [DocSearch][8] is using.
 
  The `weight` attribute is used to provide an easy way to rank two records
  relative to each other.
@@ -142,7 +148,7 @@ and generic bug reports.
  ## Bug Reports and feature requests
 
  For any bug or ideas of new features, please start by checking in the
- [issues](https://github.com/pixelastic/html-hierarchy-extractor/issues) tab if
+ [issues][9] tab if
  it hasn't been discussed already. If not, feel free to open a new issue.
 
  ## Pull Requests
@@ -165,7 +171,7 @@ cp ./scripts/git_hooks/* ./.git/hooks
  This will add a `pre-commit` and `pre-push` scripts that will respectively check
  that all files are lint-free before committing, and pass all tests before
  pushing. If any of those two hooks give your errors, you should fix the code
- before commiting or pushing.
+ before committing or pushing.
 
  Having those steps helps keeping the codebase clean as much as possible, and
  avoid polluting discussion in PR about style.
@@ -182,7 +188,7 @@ Rubocop, and the configuration can be found in `.rubocop.yml`.
 
  ## Test
 
- `rake test` will run all the tests.
+ `rake test` will run all the tests.
 
  `rake coverage` will do the same, but also adding the code coverage files to
  `./coverage`. This should be useful in a CI environment.
@@ -210,8 +216,12 @@ This gem was previously named `html-hierarchy-extractor` but has been renamed to
  convention. That's also why this gem directly starts at v2.0.
 
 
- [1]: https://www.algolia.com/
- [2]: https://community.algolia.com/docsearch/
- [3]: #Settings
- [4]: http://www.rubydoc.info/github/sparklemotion/nokogiri/Nokogiri/XML/Node
- [5]: https://community.algolia.com/docsearch/
+ [1]: https://badge.fury.io/rb/algolia_html_extractor.svg
+ [2]: https://travis-ci.org/algolia/html-extractor.svg?branch=master
+ [3]: https://coveralls.io/repos/algolia/html-extractor/badge.svg?branch=master&service=github
+ [4]: https://codeclimate.com/github/algolia/html-extractor/badges/gpa.svg
+ [5]: https://img.shields.io/badge/ruby-%3E%3D%202.3.0-green.svg
+ [6]: #Settings
+ [7]: http://www.rubydoc.info/github/sparklemotion/nokogiri/Nokogiri/XML/Node
+ [8]: https://community.algolia.com/docsearch/
+ [9]: https://github.com/pixelastic/html-hierarchy-extractor/issues
@@ -118,12 +118,12 @@ class AlgoliaHTMLExtractor
  next unless node.matches?(@options[:css_selector])
 
  # Stop if node is empty
- text = extract_text(node)
- next if text.empty?
+ content = extract_text(node)
+ next if content.empty?
 
  item = {
  html: extract_html(node),
- text: text,
+ content: content,
  tag_name: extract_tag_name(node),
  hierarchy: current_hierarchy.clone,
  anchor: current_anchor,
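In 2.2.0 the record built in this hunk stores the extracted text under `content:` instead of `text:`. Code written against 2.1.0 that reads `record[:text]` would therefore presumably get `nil` (note that the README context lines in this diff still describe a `text` attribute). A minimal sketch of reading either key, assuming records are Ruby Hashes with symbol keys as in the `item = {` literal; `record_content` is a hypothetical helper, not part of the gem.

```ruby
# Hypothetical helper (not part of the gem): return the extracted text of a
# record whether it was produced by 2.1.0 (:text) or 2.2.0 (:content).
def record_content(record)
  record.fetch(:content) { record[:text] }
end

# Sample records shaped like the `item` hash in the diff (fields trimmed).
old_record = { html: '<p>Hello</p>', text: 'Hello' }
new_record = { html: '<p>Hello</p>', content: 'Hello' }

puts record_content(old_record) # prints "Hello"
puts record_content(new_record) # prints "Hello"
```

`Hash#fetch` with a block prefers the new key and only falls back to `:text` when `:content` is absent, so the helper returns `nil` only when neither key is present.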
@@ -1,5 +1,5 @@
  # Expose gem version
  # rubocop:disable Style/SingleLineMethods
  class AlgoliaHTMLExtractorVersion
- def self.to_s; '2.1.0' end
+ def self.to_s; '2.2.0' end
  end
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: algolia_html_extractor
  version: !ruby/object:Gem::Version
- version: 2.1.0
+ version: 2.2.0
  platform: ruby
  authors:
  - Tim Carry
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2017-11-10 00:00:00.000000000 Z
+ date: 2017-12-19 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: awesome_print