algolia_html_extractor 2.2.2 → 2.3.0
- checksums.yaml +4 -4
- data/README.md +21 -19
- data/lib/algolia_html_extractor.rb +1 -1
- data/lib/version.rb +1 -1
- metadata +2 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 2935d48da689f8b8064f82fe45d08294973b537b
+  data.tar.gz: e4eaf98057448f77c4ffa4472e479d8f331ccf48
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 2b97bd8f04d84d81bf5adbc4063163c43e3b5c53f2cfa8297b3fac07d0b3542ea3ca135068dbf9e63b3444b5604a2cd34645f79328491b14c7445eeafc897ab8
+  data.tar.gz: 1c7e7f1edfb75945ef179dffb5138f3fdb408273653e2c2612fcd7ff73c5d86eebf68aa1911bc65d4974ab867667b36a9d9e26a672ff50d94981e5eae6f88fbc
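A `.gem` file is a plain tar archive whose members include `metadata.gz`, `data.tar.gz` and `checksums.yaml.gz`; the hunk above records the SHA1 and SHA512 digests of the first two. As a minimal sketch of how such digests could be checked locally, assuming the archive is read with RubyGems' `Gem::Package::TarReader` and that the checksums YAML is already available as a string, the two helpers below compare every listed digest against the member bytes. `read_gem_members` and `checksums_match?` are hypothetical names, not part of this gem or of RubyGems.

```ruby
require "digest/sha1"
require "digest/sha2"
require "rubygems/package"
require "yaml"

# Digest classes for the algorithm names that may appear in checksums.yaml.
DIGESTS = {
  "SHA1" => Digest::SHA1,
  "SHA256" => Digest::SHA256,
  "SHA512" => Digest::SHA512
}.freeze

# Hypothetical helper: read every member of a .gem (a plain tar archive)
# into a Hash of member name => raw bytes.
def read_gem_members(io)
  members = {}
  Gem::Package::TarReader.new(io) do |tar|
    tar.each do |entry|
      # Older RubyGems return nil when reading a zero-length member.
      members[entry.full_name] = entry.read || ""
    end
  end
  members
end

# Hypothetical helper: true only if every digest listed in the checksums
# YAML matches the bytes of the member it names.
def checksums_match?(checksums_yaml, members)
  expected = YAML.safe_load(checksums_yaml)
  expected.all? do |algorithm, files|
    digest = DIGESTS[algorithm]
    # An algorithm we cannot compute counts as a failure, not a pass.
    next false unless digest

    files.all? do |file, hex|
      members.key?(file) && digest.hexdigest(members[file]) == hex
    end
  end
end
```

For a real `.gem`, the `checksums.yaml.gz` member would first need decompressing (for example with `Zlib.gunzip`) before being passed to `checksums_match?`. Treating an unknown algorithm as a mismatch keeps the check conservative.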
data/README.md
CHANGED
@@ -1,10 +1,11 @@
 # algolia_html_extractor
 
-[![
-
-[![
-[![
-![
+[![gem version][1]](https://rubygems.org/gems/algolia_html_extractor)
+![ruby][2]
+[![build master][3]](https://travis-ci.org/algolia/html-extractor)
+[![coverage master][4]](https://coveralls.io/github/algolia/html-extractor?branch=master)
+[![build develop][5]](https://travis-ci.org/algolia/html-extractor)
+[![coverage develop][6]](https://coveralls.io/github/algolia/html-extractor?branch=develop)
 
 This gem can convert HTML content into JSON records ready to be pushed to
 Algolia.
@@ -88,7 +89,7 @@ Here is one of the records extracted:
 :lvl5 => nil,
 :lvl6 => nil
 },
-:
+:custom_ranking => {
 :heading => 70,
 :position => 3
 }
@@ -99,13 +100,13 @@ Each record has a `objectID` that uniquely identify it (computed by a hash of al
 the other values).
 
 It also contains the HTML tag name in `tag_name` (by default `<p>`
-paragraphs are extracted, but see the [settings][
+paragraphs are extracted, but see the [settings][7] on how to change it).
 
 `html` contains the whole `outerContent` of the element, including the wrapping
 tags and inner children. The `text` attribute contains the textual content,
 stripping out all HTML.
 
-`node` contains the [Nokogiri node][
+`node` contains the [Nokogiri node][8] instance. The lib uses it internally to
 extract all the relevant information but is also exposed if you want to process
 the node further.
 
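The hunk header above quotes the README's statement that each record's `objectID` is computed by hashing all of its other values. The diff does not show which hash function or which fields the gem actually uses, so the following is only a sketch under stated assumptions: MD5 over a key-sorted JSON encoding, with `objectID` itself and the Nokogiri `node` excluded. `canonicalize` and `object_id_for` are hypothetical helpers.

```ruby
require "digest/md5"
require "json"

# Hypothetical: recursively sort Hash keys so the digest does not depend
# on the order in which attributes were inserted.
def canonicalize(value)
  case value
  when Hash then value.sort_by { |k, _| k.to_s }.map { |k, v| [k.to_s, canonicalize(v)] }
  when Array then value.map { |v| canonicalize(v) }
  else value
  end
end

# Hypothetical: derive an objectID from every other attribute. `:node`
# (a Nokogiri instance) is skipped because it has no stable serialization.
def object_id_for(record)
  payload = record.reject { |key, _| %i[objectID node].include?(key) }
  Digest::MD5.hexdigest(JSON.generate(canonicalize(payload)))
end
```

Sorting keys first means two records with identical content always get the same identifier, while any change to `text`, `html` or `hierarchy` produces a new one.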
@@ -115,9 +116,9 @@ Anchors are searched in `name` and `id` attributes of headings.
 
 `hierarchy` then contains a snapshot of the current heading hierarchy of the
 paragraph. The `lvlX` syntax is used to be compatible with the records
-[DocSearch][
+[DocSearch][9] is using.
 
-The `
+The `custom_ranking` attribute is used to provide an easy way to rank two records
 relative to each other.
 
 - `heading` gives the depth level in the hierarchy where the record is. Records
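The `custom_ranking` paragraph above says the attribute exists to rank two records relative to each other, but the explanation of each field is cut off in this diff. Assuming that a larger `heading` value means a record sits higher in the hierarchy and should rank first, and that a smaller `position` means it appears earlier in the page, a local ordering could look like the sketch below. It mirrors what an Algolia `customRanking` of `desc(custom_ranking.heading)` then `asc(custom_ranking.position)` would express; `rank_records` is a hypothetical helper.

```ruby
# Hypothetical local ordering: higher heading first, then earlier position.
def rank_records(records)
  records.sort_by do |record|
    ranking = record.fetch(:custom_ranking)
    # Negate heading so larger values sort first; position breaks ties.
    [-ranking.fetch(:heading), ranking.fetch(:position)]
  end
end
```

Using a composite sort key keeps the tie-break explicit: two records under the same heading level fall back to their order in the page.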
@@ -148,7 +149,7 @@ and generic bug reports.
 ## Bug Reports and feature requests
 
 For any bug or ideas of new features, please start by checking in the
-[issues][
+[issues][10] tab if
 it hasn't been discussed already. If not, feel free to open a new issue.
 
 ## Pull Requests
@@ -217,11 +218,12 @@ convention. That's also why this gem directly starts at v2.0.
 
 
 [1]: https://badge.fury.io/rb/algolia_html_extractor.svg
-[2]: https://
-[3]: https://
-[4]: https://
-[5]: https://img.shields.io/badge/
-[6]:
-[7]:
-[8]:
-[9]: https://
+[2]: https://img.shields.io/badge/ruby-%3E%3D%202.3.0-green.svg
+[3]: https://img.shields.io/badge/dynamic/json.svg?label=build%3Amaster&query=value&uri=https%3A%2F%2Fimg.shields.io%2Ftravis%2Falgolia%2Fhtml-extractor.json%3Fbranch%3Dmaster
+[4]: https://img.shields.io/badge/dynamic/json.svg?label=coverage%3Amaster&colorB=&prefix=&suffix=%25&query=$.covered_percent&uri=https%3A%2F%2Fcoveralls.io%2Fgithub%2Falgolia%2Fhtml-extractor.json%3Fbranch%3Dmaster
+[5]: https://img.shields.io/badge/dynamic/json.svg?label=build%3Adevelop&query=value&uri=https%3A%2F%2Fimg.shields.io%2Ftravis%2Falgolia%2Fhtml-extractor.json%3Fbranch%3Ddevelop
+[6]: https://img.shields.io/badge/dynamic/json.svg?label=coverage%3Adevelop&colorB=&prefix=&suffix=%25&query=$.covered_percent&uri=https%3A%2F%2Fcoveralls.io%2Fgithub%2Falgolia%2Fhtml-extractor.json%3Fbranch%3Ddevelop
+[7]: #Settings
+[8]: http://www.rubydoc.info/github/sparklemotion/nokogiri/Nokogiri/XML/Node
+[9]: https://community.algolia.com/docsearch/
+[10]: https://github.com/pixelastic/html-hierarchy-extractor/issues
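The new build badges `[3]` and `[5]` are shields.io dynamic JSON badges whose `uri` parameter is itself a percent-encoded URL, which makes them hard to read. As a sketch of how such a link can be assembled, Ruby's `URI.encode_www_form` produces exactly the query string used in `[3]` and `[5]`. `build_badge_url` is a hypothetical helper; the coverage badges `[4]` and `[6]` leave `$` unescaped in `query=$.covered_percent`, so this helper would not reproduce those byte for byte.

```ruby
require "uri"

# Hypothetical helper: rebuild the Travis build badge URL for one branch.
def build_badge_url(branch)
  # JSON endpoint that shields.io reads the badge value from.
  source = "https://img.shields.io/travis/algolia/html-extractor.json?branch=#{branch}"
  # encode_www_form percent-encodes ':', '/', '?' and '=' inside each value.
  query = URI.encode_www_form([
    ["label", "build:#{branch}"],
    ["query", "value"],
    ["uri", source]
  ])
  "https://img.shields.io/badge/dynamic/json.svg?#{query}"
end
```

Going the other way, `URI.decode_www_form` on the part after `?` turns an existing badge link back into readable key/value pairs.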
data/lib/version.rb
CHANGED
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: algolia_html_extractor
 version: !ruby/object:Gem::Version
-  version: 2.
+  version: 2.3.0
 platform: ruby
 authors:
 - Tim Carry
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2018-
+date: 2018-03-12 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: awesome_print
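The metadata hunk moves the gem from 2.2.2 to 2.3.0. Assuming the gem follows the MAJOR.MINOR.PATCH reading of semantic versioning (the README's versioning section is only partly visible in this diff), that is a minor release. The sketch below classifies a bump with RubyGems' `Gem::Version`; `bump_kind` is a hypothetical helper.

```ruby
require "rubygems"

# Hypothetical helper: classify the bump between two releases as
# :major, :minor or :patch using MAJOR.MINOR.PATCH segments.
def bump_kind(from, to)
  old_version = Gem::Version.new(from)
  new_version = Gem::Version.new(to)
  raise ArgumentError, "#{to} is not newer than #{from}" unless new_version > old_version

  old_major, old_minor, = old_version.segments
  new_major, new_minor, = new_version.segments
  return :major if new_major != old_major
  return :minor if new_minor != old_minor

  :patch
end
```

`Gem::Version` compares segments numerically, so `2.10.0` correctly counts as newer than `2.9.0`, which plain string comparison would get wrong.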