nokogumbo 0.6 → 0.7

Files changed (3)
  1. data/README.md +7 -4
  2. data/lib/nokogumbo.rb +9 -1
  3. metadata +2 -2
data/README.md CHANGED
@@ -3,7 +3,8 @@ Nokogumbo - a Nokogiri interface to the Gumbo HTML5 parser.
 
 Nokogumbo provides the ability for a Ruby program to invoke the
 [Gumbo HTML5 parser](https://github.com/google/gumbo-parser#readme)
-and to access the result as a Nokogiri parsed document.
+and to access the result as a
+[Nokogiri::HTML::Document](http://nokogiri.org/Nokogiri/HTML/Document.html).
 
 Usage:
 -----
@@ -25,10 +26,12 @@ Notes:
 
 * The `Nokogiri::HTML5.parse` function takes a string and passes it to the
 <code>gumbo_parse_with_options</code> method, using the default options.
-The resulting Gumbo parse tree is the walked, producing a libxml2 parse tree.
+The resulting Gumbo parse tree is the walked, producing a
+[libxml2](http://xmlsoft.org/html/)
+[xmlDoc](http://xmlsoft.org/html/libxml-tree.html#xmlDoc).
 The original Gumbo parse tree is then destroyed, and single Nokogiri Ruby
-object is constructed to wrap the libxml2 parse tree. Nokogiri only produces
-Ruby objects as necessary, so all scanning is done using the underlying
+object is constructed to wrap the xmlDoc structure. Nokogiri only produces
+Ruby objects as necessary, so all searching is done using the underlying
 libxml2 libraries.
 
 * The `Nokogiri::HTML5.get` function takes care of following redirects,
data/lib/nokogumbo.rb CHANGED
@@ -2,11 +2,15 @@ require 'nokogiri'
 require 'nokogumboc'
 
 module Nokogiri
+  # Parse an HTML document. +string+ contains the document. +string+
+  # may also be an IO-like object. Returns a +Nokogiri::HTML::Document+.
   def self.HTML5(string)
     Nokogiri::HTML5.parse(string)
   end
 
   module HTML5
+    # Parse an HTML document. +string+ contains the document. +string+
+    # may also be an IO-like object. Returns a +Nokogiri::HTML::Document+.
     def self.parse(string)
       if string.respond_to? :read
         string = string.read
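
The check at the top of `parse` is plain duck typing: anything that responds to `read` is drained into a string before it reaches the parser. A minimal standalone sketch of that one step follows. `source_text` is a hypothetical helper name, not part of Nokogumbo, and the actual parser call is left out.

```ruby
require 'stringio'

# Hypothetical helper (not part of Nokogumbo): return the document text
# whether the caller passed a String or an IO-like object.
def source_text(source)
  # IO, File, StringIO and similar objects all respond to :read.
  source = source.read if source.respond_to?(:read)
  source
end

html = '<!DOCTYPE html><title>hi</title>'
from_string = source_text(html)                 # String is returned unchanged
from_io     = source_text(StringIO.new(html))   # IO-like object is read first
```

Both calls yield the same text, which is why the documented signature can accept either kind of argument.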
@@ -20,6 +24,10 @@ module Nokogiri
       Nokogumbo.parse(string)
     end
 
+    # Fetch and parse a HTML document from the web, following redirects,
+    # handling https, and determining the character encoding using HTML5
+    # rules. +uri+ may be a +String+ or a +URI+. +limit+ controls the
+    # number of redirects that will be followed.
     def self.get(uri, limit=10)
       require 'net/http'
       uri = URI(uri) unless URI === uri
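
The new comment describes two behaviors of `get`: it accepts a `String` or a `URI`, and `limit` caps the number of redirects followed. The body of `get` is not shown in this hunk, so the following is only a sketch of how such a limit could work, not the gem's code. `follow_redirects`, the `fetch` block, and `ROUTES` are all assumptions made for illustration, and canned responses stand in for real HTTP requests.

```ruby
require 'uri'

# Sketch only: follow redirects for at most +limit+ hops.
# +fetch+ is an assumed callable that takes a URI and returns
# [status_code, location_or_body]; it stands in for a real HTTP request.
def follow_redirects(uri, limit = 10, &fetch)
  raise ArgumentError, 'too many redirects' if limit < 0
  uri = URI(uri) unless URI === uri   # accept a String or a URI

  status, payload = fetch.call(uri)
  if (300..399).cover?(status)
    # Resolve a possibly relative Location against the current URI.
    follow_redirects(uri.merge(payload), limit - 1, &fetch)
  else
    payload
  end
end

# Canned responses in place of the network (hypothetical data).
ROUTES = {
  'http://example.com/'    => [301, '/new'],
  'http://example.com/new' => [200, '<p>moved</p>']
}.freeze

body = follow_redirects('http://example.com/') { |u| ROUTES.fetch(u.to_s) }
```

With `limit` set to 0 the same call raises `ArgumentError` as soon as the first redirect is seen, which matches the reading of `limit` as the number of redirects that will be followed.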
@@ -82,7 +90,7 @@ module Nokogiri
         if not encoding
           data = body[0..1023].gsub(/<!--.*?(-->|\Z)/m, '')
           data.scan(/<meta.*?>/m).each do |meta|
-            encoding ||= meta[/charset="?(.*?)($|"|\s|>)/im, 1]
+            encoding ||= meta[/charset=["']?([^>]*?)($|["'\s>])/im, 1]
           end
         end
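
The effect of the regex change is easiest to see on a `<meta>` tag that uses single quotes. The sketch below copies both patterns and the three sniffing lines from the hunk into a standalone method. `sniff_charset`, the constant names, and the sample inputs are hypothetical and exist only for this comparison.

```ruby
OLD_CHARSET = /charset="?(.*?)($|"|\s|>)/im
NEW_CHARSET = /charset=["']?([^>]*?)($|["'\s>])/im

# Sketch of the sniffing step shown in the hunk: look only at the start
# of the body, drop comments, then take the first charset found.
def sniff_charset(body, pattern)
  encoding = nil
  data = body[0..1023].gsub(/<!--.*?(-->|\Z)/m, '')
  data.scan(/<meta.*?>/m).each do |meta|
    encoding ||= meta[pattern, 1]
  end
  encoding
end

single = "<!-- <meta charset=x> --><meta charset='utf-8'>"
double = '<meta charset="utf-8">'

old_single = sniff_charset(single, OLD_CHARSET)  # => "'utf-8'" (quotes kept)
new_single = sniff_charset(single, NEW_CHARSET)  # => "utf-8"
new_double = sniff_charset(double, NEW_CHARSET)  # => "utf-8"
```

The old pattern only skips an opening double quote, so a single-quoted value is captured with its quotes and is not a usable encoding name. The new pattern skips either quote style and stops at either one. The commented-out `<meta>` in the first sample is ignored because comments are stripped before scanning.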
 
metadata CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: nokogumbo
 version: !ruby/object:Gem::Version
-  version: '0.6'
+  version: '0.7'
 prerelease:
 platform: ruby
 authors:
@@ -9,7 +9,7 @@ authors:
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2013-08-22 00:00:00.000000000 Z
+date: 2013-08-25 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: nokogiri