RubyGems - html-hierarchy-extractor - Versions diffs - 1.0.2 → 1.0.9 - Mend

html-hierarchy-extractor 1.0.2 → 1.0.9

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (31)

checksums.yaml +4 -4
metadata +45 -48
data/.coveralls.yml +0 -1
data/.document +0 -5
data/.rspec +0 -2
data/.rubocop.yml +0 -26
data/.travis.yml +0 -12
data/CONTRIBUTING.md +0 -53
data/Gemfile +0 -16
data/Guardfile +0 -7
data/LICENSE.txt +0 -20
data/README.md +0 -141
data/Rakefile +0 -58
data/VERSION +0 -1
data/html-hierarchy-extractor.gemspec +0 -99
data/lib/html-hierarchy-extractor.rb +0 -144
data/lib/version.rb +0 -6
data/scripts/bump_version +0 -47
data/scripts/check_flay +0 -30
data/scripts/check_flog +0 -31
data/scripts/coverage +0 -3
data/scripts/git_hooks/pre-commit +0 -16
data/scripts/git_hooks/pre-push +0 -9
data/scripts/lint +0 -2
data/scripts/release +0 -13
data/scripts/test +0 -4
data/scripts/test_ci +0 -7
data/scripts/watch +0 -4
data/spec/html_hierarchy_extractor_spec.rb +0 -441
data/spec/spec_helper.rb +0 -14
data/spec/spec_helper_simplecov.rb +0 -9

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz: 90e26530c9a5d82ec576d31614a5107f0defa763
-  data.tar.gz: ada6d22330888e48f4a5d274568e93552f964867
+  metadata.gz: 3f602c4f7bbce46be9791ed69ecbd2797dad12af
+  data.tar.gz: 925fab69f43102eb41a5ce2014be43c0145da96a
 SHA512:
-  metadata.gz: 965435216e5844e62c248bff8b0e4aee5907094ba370ea642e4eedac0901e4b3f1ca3cf35cd2753d4c276a7032a2575ec2ae78786b7b645fd12c2a4ec26d6ddf
-  data.tar.gz: e8a74ad80c0dac98abcb5a1e8ac07315389d3644bf0cf992af90e256bbc5feb2a10ac680681f07f9a7fc030c5e268248df425d40f20d2fc9d8a543030e0ae047
+  metadata.gz: c988218206a7a3461eddbf929c48dce69db4d1034c193106a93790b502c597f12eb60f620c523c67bbd720545c4f97dd4a74a6a118198569b7071b0e6d253a56
+  data.tar.gz: 958f6df05ed28f5b7bd5307eb930fa7ef362e2770ca4258594ad3d695edf9f4f19f097f792a1ab1d98bbef1350de16fd92e0ff2897b43016621e0b50f9659e8d
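
Note on the checksums above: a `.gem` file is a tar archive whose members include `metadata.gz` and `data.tar.gz`, and `checksums.yaml` records hex digests of those members. The following is a minimal sketch of recomputing them, assuming the members have already been extracted to disk. The `digests_for` helper and the file path in the usage comment are illustrative names, not part of the gem or of RubyGems.

```ruby
require 'digest/sha1'
require 'digest/sha2'

# Hypothetical helper: given the raw bytes of one archive member,
# return the two digests that checksums.yaml lists for it.
def digests_for(bytes)
  {
    'SHA1'   => Digest::SHA1.hexdigest(bytes),
    'SHA512' => Digest::SHA512.hexdigest(bytes)
  }
end

# Usage (assumed path; extract the .gem with `tar -xf` first):
#   actual = digests_for(File.binread('data.tar.gz'))
#   puts actual['SHA512']  # compare against the SHA512 entry for data.tar.gz
```

Comparing both digests for both members is enough to confirm that a downloaded copy matches the version shown in this diff.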

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: html-hierarchy-extractor
 version: !ruby/object:Gem::Version
-  version: 1.0.2
+  version: 1.0.9
 platform: ruby
 authors:
 - Tim Carry
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2016-07-20 00:00:00.000000000 Z
+date: 2017-11-03 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: awesome_print
@@ -30,42 +30,42 @@ dependencies:
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: '1.8'
+        version: '2.0'
   type: :runtime
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: '1.8'
+        version: '2.0'
 - !ruby/object:Gem::Dependency
   name: nokogiri
   requirement: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: '1.6'
+        version: '1.8'
   type: :runtime
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: '1.6'
+        version: '1.8'
 - !ruby/object:Gem::Dependency
   name: coveralls
   requirement: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: '0.8'
+        version: 0.8.21
   type: :development
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: '0.8'
+        version: 0.8.21
 - !ruby/object:Gem::Dependency
   name: flay
   requirement: !ruby/object:Gem::Requirement
@@ -94,6 +94,34 @@ dependencies:
     - - "~>"
       - !ruby/object:Gem::Version
         version: '4.3'
+- !ruby/object:Gem::Dependency
+  name: guard
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '2.14'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '2.14'
+- !ruby/object:Gem::Dependency
+  name: guard-rake
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '1.0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '1.0'
 - !ruby/object:Gem::Dependency
   name: guard-rspec
   requirement: !ruby/object:Gem::Requirement
@@ -142,68 +170,37 @@ dependencies:
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: '0.31'
+        version: '0.51'
   type: :development
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: '0.31'
+        version: '0.51'
 - !ruby/object:Gem::Dependency
   name: simplecov
   requirement: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: '0.10'
+        version: 0.14.1
   type: :development
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: '0.10'
+        version: 0.14.1
 description: Take any arbitrary HTML as input and extract its hierarchy as a list
-  of items, including parents and contents.It is primarily intended to be used along
+  of items, including parents and contents. It is primarily intended to be used along
   with Algolia, to improve the relevance of searching into huge chunks of text
 email: tim@pixelastic.com
 executables: []
 extensions: []
-extra_rdoc_files:
-- LICENSE.txt
-- README.md
-files:
-- ".coveralls.yml"
-- ".document"
-- ".rspec"
-- ".rubocop.yml"
-- ".travis.yml"
-- CONTRIBUTING.md
-- Gemfile
-- Guardfile
-- LICENSE.txt
-- README.md
-- Rakefile
-- VERSION
-- html-hierarchy-extractor.gemspec
-- lib/html-hierarchy-extractor.rb
-- lib/version.rb
-- scripts/bump_version
-- scripts/check_flay
-- scripts/check_flog
-- scripts/coverage
-- scripts/git_hooks/pre-commit
-- scripts/git_hooks/pre-push
-- scripts/lint
-- scripts/release
-- scripts/test
-- scripts/test_ci
-- scripts/watch
-- spec/html_hierarchy_extractor_spec.rb
-- spec/spec_helper.rb
-- spec/spec_helper_simplecov.rb
-homepage: http://github.com/pixelastic/html-hierarchy-extractor
+extra_rdoc_files: []
+files: []
+homepage: https://github.com/pixelastic/html-hierarchy-extractor
 licenses:
 - MIT
 metadata: {}
@@ -223,7 +220,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
       version: '0'
 requirements: []
 rubyforge_project:
-rubygems_version: 2.4.8
+rubygems_version: 2.5.1
 signing_key:
 specification_version: 4
 summary: Extract HTML hierarchy (headings and content) into a list of items
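
Note on the dependency changes above: several of them change how much the `~>` (pessimistic) operator permits, not only the minimum version. `~>` lets the last listed segment grow, so `~> 0.8` accepts any 0.x release from 0.8 upward, while `~> 0.8.21` accepts only 0.8.x releases from 0.8.21 upward. A small sketch using `Gem::Requirement` from RubyGems illustrates this; the `allows?` helper is an illustrative name.

```ruby
require 'rubygems'

# Hypothetical helper: does `version` satisfy the requirement string `constraint`?
def allows?(constraint, version)
  Gem::Requirement.new(constraint).satisfied_by?(Gem::Version.new(version))
end

# coveralls: '~> 0.8' (1.0.2) versus '~> 0.8.21' (1.0.9)
puts allows?('~> 0.8', '0.9.0')     # true:  >= 0.8 and < 1.0
puts allows?('~> 0.8.21', '0.9.0')  # false: >= 0.8.21 and < 0.9
puts allows?('~> 0.8.21', '0.8.23') # true

# nokogiri: '~> 1.6' (1.0.2) versus '~> 1.8' (1.0.9)
puts allows?('~> 1.6', '1.6.8')     # true
puts allows?('~> 1.8', '1.6.8')     # false: the minimum is now 1.8
```

The same reasoning applies to the `simplecov` change from `~> 0.10` to `~> 0.14.1`.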

data/.coveralls.yml DELETED

@@ -1 +0,0 @@
-service_name: travis-ci

data/.document DELETED

@@ -1,5 +0,0 @@
-lib/**/*.rb
-bin/*
--
-features/**/*.feature
-LICENSE.txt

data/.rspec DELETED

@@ -1,2 +0,0 @@
---color
---format progress

data/.rubocop.yml DELETED

@@ -1,26 +0,0 @@
-# Defaults:
-# https://github.com/bbatsov/rubocop/blob/master/config/default.yml
-Metrics/AbcSize:
-  Max: 100
-Metrics/ClassLength:
-  Max: 200
-Metrics/ModuleLength:
-  Max: 200
-Metrics/MethodLength:
-  Max: 50
-Metrics/CyclomaticComplexity:
-  Max: 10
-Metrics/PerceivedComplexity:
-  Max: 10
-Style/FileName:
-  Enabled: false
-Style/MultilineOperationIndentation:
-  Enabled: false

data/.travis.yml DELETED

@@ -1,12 +0,0 @@
-language: ruby
-cache: bundler
-before_script: bundle update
-script: ./scripts/test_ci
-rvm:
- - 2.2
- - 2.1
- - 2.0
-notifications:
-  email:
-    on_success: never
-    on_failure: never

data/CONTRIBUTING.md DELETED

@@ -1,53 +0,0 @@
-Hi collaborator!
-If you have a fix or a new feature, please start by checking in the
-[issues](https://github.com/pixelastic/html-hierarchy-extractor/issues) if it is
-already referenced. If not, feel free to open one.
-We use [pull requests](https://github.com/pixelastic/html-hierarchy-extractor/pulls)
-for collaboration. The workflow is as follow:
-- Create a local branch, starting from `develop`
-- Submit the PR on `develop`
-- Wait for review
-- Do the changes requested (if any)
-- We may ask you to rebase the branch to latest `develop` if it gets out of sync
-- Get praise for your awesome contribution
-# Development workflow
-Run `bundle install` to get all dependencies up to date.
-You can then launch:
-- `./scripts/test` to launch tests
-- `./scripts/watch` to start a test watcher (for TDD) using Guard
-If you plan on submitting a PR, I suggest you install the git hooks. This will
-run pre-commit and pre-push checks. Those checks will also be run by TravisCI,
-but running them locally gives faster feedback.
-If you want to a local version of the gem in your local project, I suggest
-updating your project `Gemfile` to point to the correct local directory
-```ruby
-gem "html-hierarchy-extractor", :path => "/path/to/local/gem/folder"
-```
-You should also run `rake gemspec` from the `html-hierarchy-extractor`
-repository the first time and if you added/deleted any file or dependency.
-# Tagging and releasing
-This part is for main contributors:
-```
-# Bump the version (in develop)
-./scripts/bump_version minor
-# Update master and release
-./scripts/release
-# Install the gem locally (optional)
-rake install
-```

data/Gemfile DELETED

@@ -1,16 +0,0 @@
-source 'http://rubygems.org'
-gem 'awesome_print', '~> 1.6'
-gem 'json', '~> 1.8'
-gem 'nokogiri', '~> 1.6'
-group :development do
-  gem 'coveralls', '~> 0.8'
-  gem 'flay', '~> 2.6'
-  gem 'flog', '~> 4.3'
-  gem 'guard-rspec', '~> 4.6'
-  gem 'jeweler', '~> 2.0'
-  gem 'rspec', '~> 3.0'
-  gem 'rubocop', '~> 0.31'
-  gem 'simplecov', '~> 0.10'
-end

data/Guardfile DELETED

@@ -1,7 +0,0 @@
-guard :rspec, cmd: 'bundle exec rspec --color --format documentation' do
-  watch(%r{^spec/.+_spec\.rb$})
-  watch(%r{^lib/(.+)\.rb$})     { |m| "spec/#{m[1]}_spec.rb" }
-  watch('spec/spec_helper.rb')  { 'spec' }
-end
-notification :off

data/LICENSE.txt DELETED

@@ -1,20 +0,0 @@
-Copyright (c) 2016 Pixelastic
-Permission is hereby granted, free of charge, to any person obtaining
-a copy of this software and associated documentation files (the
-"Software"), to deal in the Software without restriction, including
-without limitation the rights to use, copy, modify, merge, publish,
-distribute, sublicense, and/or sell copies of the Software, and to
-permit persons to whom the Software is furnished to do so, subject to
-the following conditions:
-The above copyright notice and this permission notice shall be
-included in all copies or substantial portions of the Software.
-THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
-EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
-MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
-NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
-LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
-OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
-WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

data/README.md DELETED

@@ -1,141 +0,0 @@
-# html-hierarchy-extractor
-This gems lets you extract the hierarchy of headings and content from any HTML
-page into an array of elements.
-Intended to be used with [Algolia][1] to improve relevance of search
-results inside large HTML pages. The records created are compatible with the
-[DocSearch][2] format.
-## Installation
-```ruby
-# Gemfile
-source 'http://rubygems.org'
-gem 'html-hierarchy-extractor', '~> 1.0'
-```
-## How to use
-```ruby
-require 'html-hierarchy-extractor'
-content = File.read('./index.html')
-page = HTMLHierarchyExtractor.new(content)
-records = page.extract
-puts records
-```
-## Records
-`extract` will return an array of records. Each record will represent a `<p>`
-paragraph of the initial text, along with it textual version (HTML removed),
-heading hierarchy, and other interesting bits.
-## Example
-Let's take the following HTML as input and see what records we got as output:
-```html
-<!doctype html>
-<html>
-<body>
-  <h1 name="journey">The Hero's Journey</h1>
-  <p>Most stories always follow the same pattern.</p>
-  <h2 name="departure">Part One: Departure</h2>
-  <p>A story starts in a mundane world, and helps identify the hero. It helps puts all the achievements of the story into perspective.</p>
-  <h3 name="calladventure">The call to Adventure</h3>
-  <p>Some out-of-the-ordinary event pushes the hero to start his journey.</p>
-  <h3 name="threshold">Crossing the Threshold</h3>
-  <p>The hero quits his job, hit the road, or whatever cuts him from his previous life.</p>
-  <h2 name="initiations">Part Two: Initiation</h2>
-  <h3 name="trials">The Road of Trials</h3>
-  <p>The road is filled with dangers. The hero as to find his inner strength to overcome them.</p>
-  <h3 name="ultimate">The Ultimate Boon</h3>
-  <p>The hero has found something, either physical or metaphorical that changes him.</p>
-  <h2 name="return">Part Three: Return</h2>
-  <h3 name="refusal">Refusal to Return</h3>
-  <p>The hero does not want to go back to his previous life at first. But then, an event will make him change his mind.</p>
-  <h3 name="master">Master of Two Worlds</h3>
-  <p>Armed with his new power/weapon, the hero can go back to its initial world and fix all the issues he had there.</p>
-</body>
-</html>
-```
-Here is one of the records extracted:
-```ruby
-{
-  :uuid => "1f5923d5a60e998704f201bbe9964811",
-  :tag_name => "p",
-  :html => "<p>The hero quit his jobs, hit the road, or whatever cuts him from his previous life.</p>",
-  :text => "The hero quit his jobs, hit the road, or whatever cuts him from his previous life.",
-  :node => #<Nokogiri::XML::Element:0x11a5850 name="p">,
-  :anchor => nil,
-  :hierarchy => {
-    :lvl0 => "The Hero's Journey",
-    :lvl1 => "Part One: Departure",
-    :lvl2 => "Crossing the Threshold",
-    :lvl3 => nil,
-    :lvl4 => nil,
-    :lvl5 => nil,
-    :lvl6 => nil
-  },
-  :weight => {
-    :heading => 70,
-    :position => 3
-  }
-}
-```
-Each record has a `uuid` that uniquely identify it (computed by a hash of all
-the other values).
-It also contains the HTML tag name in `tag_name` (by default `<p>`
-paragraphs are extracted, but see the [settings][3] on how to change it).
-`html` contains the whole `outerContent` of the element, including the wrapping
-tags and inner children. The `text` attribute contains the textual content,
-stripping out all HTML.
-`node` contains the [Nokogiri node][4] instance. The lib uses it internally to
-extract all the relevant information ut is also exposed if you want to process
-the node further.
-The `anchor` attributes contains the HTML anchor closest to the element. Here it
-is `threshold` because this is the closest anchor in the hierarchy above.
-Anchors are searched in `name` and `id` attributes of headings.
-`hierarchy` then contains a snapshot of the current heading hierarchy of the
-paragraph. The `lvlX` syntax is used to be compatible with the records
-[DocSearch][5] is using.
-The `weight` attribute is used to provide an easy way to rank two records
-relative to each other.
-- `heading` gives the depth level in the hierarchy where the record is. Records
-  on top level will have a value of 100, those under a `h1` will have 90, and so
-  on. Because our record is under a `h3`, it has 70.
-- `position` is the position of the paragraph in the page. Here our paragraph is
-  the fourth paragraph of the page, so it will have a `position` of 3. It can
-  help you give more weight to the first items in the page.
-## Settings
-When instanciating `HTMLHierarchyExtractor`, you can pass a secondary `options`
-argument. This attribute accepts one value, `css_selector`.
-```ruby
-page = HTMLHierarchyExtractor.new(content, { css_selector: 'p,li' })
-```
-This lets you change the default selector. Here instead of `<p>` paragraph,
-the library will extract `<li>` list elements as well.
-[1]: https://www.algolia.com/
-[2]: https://community.algolia.com/docsearch/
-[3]: #Settings
-[4]: http://www.rubydoc.info/github/sparklemotion/nokogiri/Nokogiri/XML/Node
-[5]: https://community.algolia.com/docsearch/
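
Note on the removed README: its `weight.heading` rule (100 at top level, 90 under an `h1`, and so on down to 70 under an `h3`) can be sketched as below. This is an illustration inferred from the README text only, not the gem's actual implementation. The `heading_weight` name is hypothetical, and the mapping of `lvl0` to `h1` is an assumption based on the README's example record.

```ruby
# Sketch only: derive the heading weight from a record's :hierarchy hash.
# Assumes lvl0 corresponds to h1, lvl1 to h2, and so on (inferred from the README example).
def heading_weight(hierarchy)
  # Index of the deepest heading level that is set, or nil at top level.
  deepest = (0..6).select { |i| hierarchy[:"lvl#{i}"] }.max
  depth = deepest ? deepest + 1 : 0
  100 - (10 * depth)
end

example = {
  lvl0: "The Hero's Journey",
  lvl1: 'Part One: Departure',
  lvl2: 'Crossing the Threshold',
  lvl3: nil, lvl4: nil, lvl5: nil, lvl6: nil
}
puts heading_weight(example) # 70, matching the README's example record
puts heading_weight({})      # 100, a record above any heading
```

Using the deepest level that is set, rather than counting the levels that are set, keeps the result stable for pages that skip a heading level.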