diffbot 0.1.0

@@ -0,0 +1 @@
1
+ /pkg
data/LICENSE ADDED
@@ -0,0 +1,19 @@
1
+ Copyright (c) 2012 Nicolás Sanguinetti for Tinder Inc.
2
+
3
+ Permission is hereby granted, free of charge, to any person obtaining a copy
4
+ of this software and associated documentation files (the "Software"), to deal
5
+ in the Software without restriction, including without limitation the rights
6
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
7
+ copies of the Software, and to permit persons to whom the Software is
8
+ furnished to do so, subject to the following conditions:
9
+
10
+ The above copyright notice and this permission notice shall be included in
11
+ all copies or substantial portions of the Software.
12
+
13
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
14
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
15
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
16
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
17
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
18
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
19
+ THE SOFTWARE.
@@ -0,0 +1,125 @@
1
+ # Diffbot
2
+
3
+ This is a Ruby client for the [Diffbot](http://diffbot.com) API.
4
+
5
+ ## Global Options
6
+
7
+ You can pass some settings to Diffbot like this:
8
+
9
+ ``` ruby
10
+ Diffbot.configure do |config|
11
+ config.token = ENV["DIFFBOT_TOKEN"]
12
+ config.instrumentor = ActiveSupport::Notifications
13
+ end
14
+ ```
15
+
16
+ The list of supported settings is:
17
+
18
+ * `token`: Your Diffbot API token. This will be used for all requests in which
19
+ you don't specify it manually (see below).
20
+ * `instrumentor`: An object that matches the [ActiveSupport::Notifications][1]
21
+ API, which will be used to trace network events. None is used by default.
22
+ * `article_defaults`: Pass a block to this method to configure the global
23
+ request settings used for Diffbot::Article requests. See the supported
24
+ options below.
25
+
26
+ [1]: http://api.rubyonrails.org/classes/ActiveSupport/Notifications.html
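The `instrumentor` setting described above takes any object shaped like ActiveSupport::Notifications. As a rough sketch of what such an object could look like using only the standard library: `TimingInstrumentor` and the event name are hypothetical, and the sketch assumes the HTTP layer calls `instrument(name, payload) { ... }`, which is the call shape of `ActiveSupport::Notifications.instrument`.

```ruby
# Hypothetical stand-in for ActiveSupport::Notifications (not part of the gem).
# It only mimics the `instrument(name, payload) { ... }` call shape.
class TimingInstrumentor
  Event = Struct.new(:name, :payload, :duration)

  # Recorded events, oldest first.
  def self.events
    @events ||= []
  end

  # Runs the block, records how long it took, and returns the block's result.
  def self.instrument(name, payload = {})
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    result = block_given? ? yield(payload) : nil
    result
  ensure
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    events << Event.new(name, payload, elapsed)
  end
end

# Assumed event name, for illustration only.
value = TimingInstrumentor.instrument("diffbot.request", host: "www.diffbot.com") { :done }
```

With the gem loaded, an object like this would be assigned with `config.instrumentor = TimingInstrumentor` inside `Diffbot.configure`.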
27
+
28
+ ## Articles
29
+
30
+ In order to fetch an article, do this:
31
+
32
+ ``` ruby
33
+ require "diffbot/article"
34
+
35
+ article = Diffbot::Article.fetch(article_url, diffbot_token)
36
+
37
+ # Now you can inspect the result:
38
+ article.title
39
+ article.author
40
+ article.date
41
+ article.text
42
+ # etc. See below for the full list of available response attributes.
43
+ ```
44
+
45
+ This is a list of all the fields returned by the `Diffbot::Article.fetch` call:
46
+
47
+ * `url`: The URL of the article.
48
+ * `title`: The title of the article.
49
+ * `author`: The author of the article.
50
+ * `date`: The date on which this article was published.
51
+ * `media`: A list of media items attached to this article.
52
+ * `text`: The body of the article. This will be plain text unless you specify
53
+ the HTML option in the request.
54
+ * `tags`: A list of tags/keywords extracted from the article.
55
+ * `xpath`: The XPath at which this article was found in the page.
56
+
57
+ ### Options
58
+
59
+ You can customize your request like this:
60
+
61
+ ``` ruby
62
+ article = Diffbot::Article.fetch(article_url, diffbot_token) do |request|
63
+ request.html = true # Return HTML instead of plain text.
64
+ request.dont_strip_ads = true # Leave any inline ads within the article.
65
+ request.tags = true # Generate tags for the article.
66
+ request.comments = true # Extract the comments from the article as well.
67
+ request.summary = true # Return a summary text instead of the full text.
68
+ request.stats = true # Return performance, probabilistic scoring stats.
69
+ end
70
+ ```
71
+
72
+ ## Frontpages
73
+
74
+ In order to fetch and analyze a front page, do this:
75
+
76
+ ``` ruby
77
+ require "diffbot/frontpage"
78
+
79
+ frontpage = Diffbot::Frontpage.fetch(url, diffbot_token)
80
+
81
+ # Results are available in the returned object:
82
+ frontpage.title
83
+ frontpage.icon
84
+ frontpage.items #=> An array of Diffbot::Item instances
85
+ ```
86
+
87
+ The fields you can extract from a Frontpage are:
88
+
89
+ * `title`: The title of the page.
90
+ * `icon`: The favicon of the page.
91
+ * `source_type`: What kind of page this is.
92
+ * `source_url`: The URL of the page.
93
+ * `items`: The list of `Diffbot::Item` representing each item on the page.
94
+
95
+ The instances of `Diffbot::Item` have the following fields:
96
+
97
+ * `id`: Unique identifier for this item.
98
+ * `title`: Title of the item.
99
+ * `link`: Extracted permalink of the item (if applicable).
100
+ * `description`: innerHTML content of the item.
101
+ * `summary`: A plain-text summary of the item.
102
+ * `pub_date`: Date when item was detected on page.
103
+ * `type`: The type of item, according to Diffbot. One of: `IMAGE`, `LINK`,
104
+ `STORY`, `CHUNK`.
105
+ * `img`: The main image extracted from this item.
106
+ * `xroot`: XPath of where the item was found on the page.
107
+ * `cluster`: XPath of the cluster of items where this item was found.
108
+ * `stats`: An object with the following attributes:
109
+ * `spam_score`: A Float between 0.0 and 1.0 indicating the probability this
110
+ item is spam/an advertisement.
111
+ * `static_rank`: A Float between 1.0 and 5.0 indicating the quality score of
112
+ the item.
113
+ * `fresh`: The percentage of the item that has changed compared to the
114
+ previous crawl.
115
+
116
+ ## TODO
117
+
118
+ * Implement the Follow API.
119
+ * Add tests for Article and Frontpage requests.
120
+ * Add a Frontpage.crawl method that, given the URL of a frontpage, fetches
121
+ the article for each item on the page.
122
+
123
+ ## License
124
+
125
+ This is published under the MIT License; see LICENSE for further details.
@@ -0,0 +1,15 @@
1
+ require "rake/testtask"
2
+ require "rubygems/package_task"
3
+
4
+ gem_spec = eval(File.read("./diffbot.gemspec")) rescue nil
5
+ Gem::PackageTask.new(gem_spec) do |pkg|
6
+ pkg.need_zip = false
7
+ pkg.need_tar = false
8
+ end
9
+
10
+ Rake::TestTask.new do |t|
11
+ t.pattern = "test/*_test.rb"
12
+ t.verbose = true
13
+ end
14
+
15
+ task default: :test
@@ -0,0 +1,19 @@
1
+ Gem::Specification.new do |s|
2
+ s.name = "diffbot"
3
+ s.version = "0.1.0"
4
+ s.description = "Diffbot provides a concise API for analyzing and extracting semantic information from web pages using Diffbot (http://www.diffbot.com)."
5
+ s.summary = "Ruby interface to the Diffbot API"
6
+ s.authors = ["Nicolas Sanguinetti"]
7
+ s.email = "hi@nicolassanguinetti.info"
8
+ s.homepage = "http://github.com/tinder/diffbot"
9
+ s.has_rdoc = false
10
+ s.files = `git ls-files`.split "\n"
11
+ s.platform = Gem::Platform::RUBY
12
+
13
+ s.add_dependency("excon")
14
+ s.add_dependency("yajl-ruby")
15
+ s.add_dependency("nokogiri")
16
+ s.add_dependency("hashie")
17
+
18
+ s.add_development_dependency("minitest")
19
+ end
@@ -0,0 +1,45 @@
1
+ require "hashie/trash"
2
+ require "diffbot/coercible_hash"
3
+ require "diffbot/request"
4
+ require "diffbot/article"
5
+ require "diffbot/frontpage"
6
+
7
+ module Diffbot
8
+ # Public: Set global options. This is a nice API to group calls to the Diffbot
9
+ # module.
10
+ #
11
+ # Yields the Diffbot module so you can set options on it.
12
+ #
13
+ # Returns self.
14
+ def self.configure
15
+ yield self
16
+ self
17
+ end
18
+
19
+ # Public: Configure the default request parameters for Article requests. See
20
+ # Article::RequestParams documentation for the specific configuration values
21
+ # you can set.
22
+ #
23
+ # Yields the default Article::RequestParams object.
24
+ #
25
+ # Returns the default Article::RequestParams object.
26
+ def self.article_defaults
27
+ if block_given?
28
+ @article_defaults = Article::RequestParams.new
29
+ yield @article_defaults
30
+ else
31
+ @article_defaults ||= Article::RequestParams.new
32
+ end
33
+
34
+ @article_defaults
35
+ end
36
+
37
+ class << self
38
+ # Public: Your Diffbot API token.
39
+ attr_accessor :token
40
+
41
+ # Public: The object used for network instrumentation. Must match
42
+ # ActiveSupport::Notifications API.
43
+ attr_accessor :instrumentor
44
+ end
45
+ end
@@ -0,0 +1,194 @@
1
+ require "yajl"
2
+ require "diffbot"
3
+ require "diffbot/coercible_hash"
4
+
5
+ module Diffbot
6
+ # Representation of an article (ie a blog post or similar). This class offers
7
+ # a single entry point: the `.fetch` method, that, given a URL, will return
8
+ # the article as analyzed by Diffbot.
9
+ class Article < Hashie::Trash
10
+ extend CoercibleHash
11
+
12
+ # Public: Fetch an article from a URL.
13
+ #
14
+ # url - The article URL.
15
+ # token - The API token for Diffbot.
16
+ # parser - The callable object that will parse the raw output from the
17
+ # API. Defaults to Yajl::Parser.method(:parse).
18
+ # defaults - The default request options. See Diffbot.article_defaults.
19
+ #
20
+ # Yields the request configuration.
21
+ #
22
+ # Examples
23
+ #
24
+ # # Request an article with the default options.
25
+ # article = Diffbot::Article.fetch(url, api_token)
26
+ #
27
+ # # Pass options to the request. See Diffbot::Article::RequestParams to
28
+ # # see the available configuration options.
29
+ # article = Diffbot::Article.fetch(url, api_token) do |req|
30
+ # req.html = true
31
+ # end
32
+ #
33
+ # Returns a Diffbot::Article.
34
+ def self.fetch(url, token=Diffbot.token, parser=Yajl::Parser.method(:parse), defaults=Diffbot.article_defaults)
35
+ params = defaults.dup
36
+ yield params if block_given?
37
+
38
+ request = Diffbot::Request.new(token)
39
+ response = request.perform(:get, endpoint, params) do |req|
40
+ req[:query][:url] = url
41
+ end
42
+
43
+ new(parser.call(response.body))
44
+ end
45
+
46
+ # The API endpoint where requests should be made.
47
+ #
48
+ # Returns a URL.
49
+ def self.endpoint
50
+ "http://www.diffbot.com/api/article"
51
+ end
52
+
53
+ # Public: URL of the article.
54
+ property :url
55
+
56
+ # Public: Title of the article.
57
+ property :title
58
+
59
+ # Public: Author (or authors) of the article.
60
+ property :author
61
+
62
+ # Public: Date of the article (as a string).
63
+ property :date
64
+
65
+ class MediaItem < Hashie::Trash
66
+ property :type
67
+ property :link
68
+ property :primary, default: false
69
+ end
70
+
71
+ # Public: List of media items related to the article. Each item is an
72
+ # object with the following attributes:
73
+ #
74
+ # type - Either `"image"` or `"video"`.
75
+ # link - The URL of the given media resource.
76
+ # primary - Only present in one of the items. This is assumed to be the most
77
+ # representative media for this article.
78
+ property :media
79
+ coerce_property :media, collection: MediaItem
80
+
81
+ # Public: The raw text of the article, without formatting.
82
+ property :text
83
+
84
+ # Public: The contents of the article in HTML, stripped of any ads or other
85
+ # chunks of HTML which are considered unrelated by Diffbot, unless you set
86
+ # the `dont_strip_ads` option in the request.
87
+ #
88
+ # Only present if you set `html` to true in the request.
89
+ property :html
90
+
91
+ # Public: A summary line for this article.
92
+ #
93
+ # Only present if you set `summary` to true in the request.
94
+ property :summary
95
+
96
+ # Public: A list of tags related to this article.
97
+ #
98
+ # Only present if you set `tags` to true in the request.
99
+ property :tags
100
+
101
+ # Public: The favicon of the page from which this article was extracted.
102
+ property :icon
103
+
104
+ class Stats < Hashie::Trash
105
+ property :fetch_time, from: :fetchTime
106
+ property :confidence
107
+ end
108
+
109
+ # Public: Returns an object with the following attributes:
110
+ #
111
+ # fetch_time - The time of the request, in ms.
112
+ # confidence - The confidence of Diffbot that the returned text is really
113
+ # the text of the article. Between 0.0 and 1.0.
114
+ #
115
+ # Only present if you set `stats` to true in the request.
116
+ property :stats
117
+ coerce_property :stats, class: Stats
118
+
119
+ # Public: The XPath selector at which the body of the article was found in
120
+ # the page.
121
+ property :xpath
122
+
123
+ # Public: If there was an error in the request, this will contain the error
124
+ # message.
125
+ property :error
126
+
127
+ # Public: If there was an error in the request, this will contain the error
128
+ # code.
129
+ property :error_code, from: :errorCode
130
+
131
+ # This represents the parameters you can pass to Diffbot to configure a
132
+ # given request. These are either set globally with Diffbot.article_defaults
133
+ # or on a request basis by passing a block to Diffbot::Article.fetch.
134
+ #
135
+ # Example:
136
+ #
137
+ # # All article requests will include the HTML and tags.
138
+ # Diffbot.configure do |config|
139
+ # config.article_defaults do |defaults|
140
+ # defaults.html = true
141
+ # defaults.tags = true
142
+ # end
143
+ # end
144
+ #
145
+ # # This article request will *also* include the summary.
146
+ # Diffbot::Article.fetch(url, token) do |req|
147
+ # req.summary = true
148
+ # end
149
+ class RequestParams < Hashie::Trash
150
+ # Public: Set to true to return HTML instead of plain-text.
151
+ #
152
+ # Defaults to nil.
153
+ #
154
+ # If enabled, sets the `html` key in the `Diffbot::Article`.
155
+ property :html
156
+
157
+ # Public: Set to true to keep any inline ads in the generated story.
158
+ #
159
+ # Defaults to nil.
160
+ #
161
+ # If enabled, it will change the `html` key in the `Diffbot::Article`.
162
+ property :dontStripAds, from: :dont_strip_ads
163
+
164
+ # Public: Set to true to generate tags for the extracted story.
165
+ #
166
+ # Defaults to nil.
167
+ #
168
+ # If enabled, sets the `tags` key in the `Diffbot::Article`.
169
+ property :tags
170
+
171
+ # Public: Set to true to find the comments and identify count, link, etc.
172
+ #
173
+ # Defaults to nil.
174
+ #
175
+ # If enabled, sets the `comments` key in the `Diffbot::Article`.
176
+ property :comments
177
+
178
+ # Public: Set to true to return a summary text.
179
+ #
180
+ # Defaults to nil.
181
+ #
182
+ # If enabled, sets the `summary` key in the `Diffbot::Article`.
183
+ property :summary
184
+
185
+ # Public: Set to true to include performance and probabilistic scoring
186
+ # stats.
187
+ #
188
+ # Defaults to nil.
189
+ #
190
+ # If enabled, sets the `stats` key in the `Diffbot::Article`.
191
+ property :stats
192
+ end
193
+ end
194
+ end
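The interplay between global defaults and per-request options in `Article.fetch` (copy the defaults, then let the block adjust the copy) can be sketched without the gem. `Params`, `DEFAULTS`, and `build_params` are hypothetical names standing in for `Article::RequestParams`, `Diffbot.article_defaults`, and the first lines of `fetch`.

```ruby
# Hypothetical stand-in for Diffbot::Article::RequestParams.
Params = Struct.new(:html, :tags, :summary)

# Global defaults, set once (plays the role of Diffbot.article_defaults).
DEFAULTS = Params.new(true, true, nil)

# Copies the defaults, lets the caller adjust the copy, and returns it.
def build_params(defaults = DEFAULTS)
  params = defaults.dup
  yield params if block_given?
  params
end

# This request also asks for a summary; the global defaults stay untouched.
per_request = build_params { |req| req.summary = true }
```

Because the defaults are duplicated before the block runs, a per-request change does not leak into later requests. Note that `dup` is a shallow copy, which is enough here since the values are plain flags.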
@@ -0,0 +1,113 @@
1
+ module Diffbot
2
+ # Public: Extend a hash with this mixin to make keys coercible to certain
3
+ # classes. These keys, when assigned to the hash, will be transformed into the
4
+ # specified classes.
5
+ #
6
+ # The object you pass as coercion types should implement either a `coerce` or
7
+ # a `new` method.
8
+ #
9
+ # You can define rules to coerce properties into classes or collections of
10
+ # classes. In the latter case, CoercibleHash will just map over whatever value
11
+ # is passed and attempt to coerce each item individually to the given class.
12
+ #
13
+ # Examples
14
+ #
15
+ # class Address < Struct.new(:street, :zipcode, :state)
16
+ # def self.coerce(address)
17
+ # new(address[:street], address[:zipcode], address[:state])
18
+ # end
19
+ # end
20
+ #
21
+ # class Person < Hash
22
+ # extend Diffbot::CoercibleHash
23
+ #
24
+ # coerce_property :address, Address
25
+ # coerce_property :children, collection: Person
26
+ #
27
+ # def name
28
+ # self["name"]
29
+ # end
30
+ # end
31
+ #
32
+ # person = Person.new(address: {
33
+ # street: "123 Example St.", zipcode: "12345", state: "XX"
34
+ # })
35
+ #
36
+ # person.address.street #=> "123 Example St."
37
+ # # etc.
38
+ #
39
+ # father = Person.new(name: "John", children: [
40
+ # { name: "Tim" }, { name: "Sarah" }
41
+ # ])
42
+ #
43
+ # father.name #=> "John"
44
+ # father.children.first.name #=> "Tim"
45
+ # father.children.last.name #=> "Sarah"
46
+ module CoercibleHash
47
+ # The coercion rules defined for this hash.
48
+ attr_reader :coercions
49
+
50
+ # Adds a #[]= that checks for coercion on the property and delegates to super.
51
+ def self.extended(base)
52
+ base.instance_variable_set("@coercions", {})
53
+ base.class_eval do
54
+ def []=(property, value)
55
+ if self.class.coercions.key?(property.to_s)
56
+ super property, self.class.coercions[property.to_s].(value)
57
+ else
58
+ super
59
+ end
60
+ end
61
+ end
62
+ end
63
+
64
+ # Public: Coerce a property of this hash into a given type. We will try to
65
+ # call .coerce on the object you pass as the class, and if that fails, we will
66
+ # call .new.
67
+ #
68
+ # property - The name of the property to coerce.
69
+ # class_or_options - Either a class to coerce to, or a hash with options:
70
+ # * class: The class to coerce to.
71
+ # * collection: Coerce the key into an array of members of
72
+ # this class.
73
+ #
74
+ # Examples
75
+ #
76
+ # class Person < Hash
77
+ # extend Diffbot::CoercibleHash
78
+ #
79
+ # coerce_property :address, Address
80
+ #
81
+ # coerce_property :children, collection: Person
82
+ #
83
+ # coerce_property :dob, class: Date
84
+ # end
85
+ def coerce_property(property, options)
86
+ unless options.is_a?(Hash)
87
+ options = { class: options }
88
+ end
89
+
90
+ coercion_method = ->(obj) do
91
+ if obj.respond_to?(:coerce)
92
+ obj.method(:coerce)
93
+ elsif obj.respond_to?(:new)
94
+ obj.method(:new)
95
+ else
96
+ raise ArgumentError, "#{obj.inspect} implements neither .coerce nor .new"
97
+ end
98
+ end
99
+
100
+ if options.has_key?(:collection)
101
+ klass = options[:collection]
102
+ coercion = ->(value) { value.map { |el| coercion_method[klass][el] } }
103
+ elsif options.has_key?(:class)
104
+ klass = options[:class]
105
+ coercion = ->(value) { coercion_method[klass][value] }
106
+ else
107
+ raise ArgumentError, "You need to specify either :class or :collection"
108
+ end
109
+
110
+ coercions[property.to_s] = coercion
111
+ end
112
+ end
113
+ end
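The lookup rule used by `coerce_property` (prefer `.coerce`, fall back to `.new`, otherwise raise) can be tried in isolation. This is a minimal sketch outside of any hash class; `coercion_for`, `Point`, and `Tag` are hypothetical names used only for illustration.

```ruby
# Picks the callable used to coerce a value: `.coerce` wins over `.new`.
def coercion_for(target)
  if target.respond_to?(:coerce)
    target.method(:coerce)
  elsif target.respond_to?(:new)
    target.method(:new)
  else
    raise ArgumentError, "#{target.inspect} implements neither .coerce nor .new"
  end
end

# Hypothetical types, for illustration only.
Point = Struct.new(:x, :y) do
  def self.coerce(pair)
    new(*pair)
  end
end

Tag = Struct.new(:name)

point = coercion_for(Point).call([1, 2])      # goes through Point.coerce
tags  = %w(ruby api).map(&coercion_for(Tag))  # goes through Tag.new per element
```

The second call mirrors the `collection:` option, which maps the same coercion over every element of the assigned value.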
@@ -0,0 +1,60 @@
1
+ require "nokogiri"
2
+ require "diffbot"
3
+ require "diffbot/item"
4
+
5
+ module Diffbot
6
+ # Representation of a front page. This class offers a single entry point: the
7
+ # `.fetch` method, that, given a URL, will return the front page as analyzed
8
+ # by Diffbot.
9
+ class Frontpage < Hashie::Trash
10
+ extend CoercibleHash
11
+
12
+ # Public: Fetch a frontpage's information from a URL.
13
+ #
14
+ # url - The frontpage URL.
15
+ # token - The API token for Diffbot.
16
+ # parser - The callable object that will parse the raw output from the
17
+ # API. Defaults to Diffbot::Frontpage::DmlParser.method(:parse).
18
+ #
19
+ # Examples
20
+ #
21
+ # # Request a frontpage with the default options.
22
+ # frontpage = Diffbot::Frontpage.fetch(url, api_token)
23
+ #
24
+ # Returns a Diffbot::Frontpage.
25
+ def self.fetch(url, token=Diffbot.token, parser=Diffbot::Frontpage::DmlParser.method(:parse))
26
+ request = Diffbot::Request.new(token)
27
+ response = request.perform(:get, endpoint) do |req|
28
+ req[:query][:url] = url
29
+ end
30
+
31
+ new(parser.call(response.body))
32
+ end
33
+
34
+ # The API endpoint where requests should be made.
35
+ #
36
+ # Returns a URL.
37
+ def self.endpoint
38
+ "http://www.diffbot.com/api/frontpage"
39
+ end
40
+
41
+ # Public: The title of the page.
42
+ property :title
43
+
44
+ # Public: The favicon of the page.
45
+ property :icon
46
+
47
+ # Public: The kind of page this is.
48
+ property :source_type, from: :sourceType
49
+
50
+ # Public: The URL from which this page was extracted.
51
+ property :source_url, from: :sourceURL
52
+
53
+ # Public: The items extracted from the page. These are instances of
54
+ # Diffbot::Item.
55
+ property :items
56
+ coerce_property :items, collection: Item
57
+ end
58
+ end
59
+
60
+ require "diffbot/frontpage/dml_parser"
@@ -0,0 +1,83 @@
1
+ # Parser that takes the XML generated from Diffbot's Frontpage API call and
2
+ # returns a hash suitable for Diffbot::Frontpage.
3
+ class Diffbot::Frontpage::DmlParser
4
+ # Take the string of DML and convert it into a nice little hash we can pass to
5
+ # Diffbot::Frontpage.
6
+ #
7
+ # dml - A string of DML.
8
+ #
9
+ # Returns a Hash.
10
+ def self.parse(dml)
11
+ node = Nokogiri(dml).root
12
+ parser = new(node)
13
+ parser.parse
14
+ end
15
+
16
+ # Initialize the parser with a DML node.
17
+ #
18
+ # node - The root XML element of the DML document.
19
+ def initialize(node)
20
+ @dml = node
21
+ end
22
+
23
+ # The root element of the DML document.
24
+ attr_reader :dml
25
+
26
+ # Parses the Diffbot Markup Language and generates a Hash that we can pass to
27
+ # Frontpage.new.
28
+ #
29
+ # Returns a Hash.
30
+ def parse
31
+ attrs = {}
32
+
33
+ info = dml % "info"
34
+ attrs["title"] = (info % "title").text
35
+ attrs["icon"] = (info % "icon").text
36
+ attrs["sourceType"] = (info % "sourceType").text
37
+ attrs["sourceURL"] = (info % "sourceURL").text
38
+
39
+ items = dml / "item"
40
+ attrs["items"] = items.map do |item|
41
+ ItemParser.new(item).parse
42
+ end
43
+
44
+ attrs
45
+ end
46
+
47
+ # Parser that takes the XML of a single item from the document returned by
48
+ # the frontpage API.
49
+ class ItemParser
50
+ # The root element of each item.
51
+ attr_reader :item
52
+
53
+ # Initialize the parser with an Item node.
54
+ #
55
+ # item_node - The root node of the item.
56
+ def initialize(item_node)
57
+ @item = item_node
58
+ end
59
+
60
+ # Parses the item's DML and generates a Hash that we can add to the DML
61
+ # parser's "items" key together with the other items.
62
+ #
63
+ # Returns a Hash.
64
+ def parse
65
+ attrs = {}
66
+
67
+ %w(title link pubDate description textSummary).each do |attr|
68
+ node = item % attr
69
+ attrs[attr] = node && node.text
70
+ end
71
+
72
+ %w(type img id xroot cluster).each do |attr|
73
+ attrs[attr] = item[attr]
74
+ end
75
+
76
+ attrs["stats"] = %w(fresh sp sr).each_with_object({}) do |attr, hash|
77
+ hash[attr] = item[attr].to_f
78
+ end
79
+
80
+ attrs
81
+ end
82
+ end
83
+ end
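To see the shape of the hash this parser builds, here is a rough sketch that runs without Nokogiri by swapping in REXML. The sample document is hypothetical: the element and attribute names are the ones the parser above reads, while the root element name, the `<body>` wrapper, and all values are made up.

```ruby
require "rexml/document"

# Hypothetical DML sample; only the names read by the parser are grounded.
SAMPLE = <<~XML
  <dml>
    <info>
      <title>Example News</title>
      <sourceURL>http://example.com/</sourceURL>
    </info>
    <body>
      <item id="1" type="STORY" sp="0.1" sr="3.5" fresh="0.25">
        <title>First story</title>
        <link>http://example.com/first</link>
      </item>
      <item id="2" type="LINK">
        <title>Second story</title>
      </item>
    </body>
  </dml>
XML

root = REXML::Document.new(SAMPLE).root

# Returns the text of the first matching descendant, or nil when it is absent.
text_of = ->(node, name) { (found = REXML::XPath.first(node, ".//#{name}")) && found.text }

items = REXML::XPath.match(root, ".//item").map do |item|
  {
    "id"    => item.attributes["id"],
    "title" => text_of.(item, "title"),
    "link"  => text_of.(item, "link"),
    # Missing attributes come back as nil, and nil.to_f is 0.0.
    "stats" => %w(fresh sp sr).to_h { |key| [key, item.attributes[key].to_f] }
  }
end

page = { "title" => text_of.(REXML::XPath.first(root, "info"), "title"), "items" => items }
```

As in the parser above, a stat that is missing from an item ends up as `0.0` rather than `nil`, because of the `to_f` call.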
@@ -0,0 +1,55 @@
1
+ module Diffbot
2
+ class Item < Hashie::Trash
3
+ extend CoercibleHash
4
+
5
+ class Stats < Hashie::Trash
6
+ property :fresh
7
+ property :static_rank, from: :sr
8
+ property :spam_score, from: :sp
9
+ end
10
+
11
+ # Public: The identifier of this item.
12
+ property :id
13
+
14
+ # Public: The title of this item.
15
+ property :title
16
+
17
+ # Public: The permalink/URL for this item.
18
+ property :link
19
+
20
+ # Public: A string with the date of the item.
21
+ property :pub_date, from: :pubDate
22
+
23
+ # Public: The HTML from the item.
24
+ property :description
25
+
26
+ # Public: A summary line with text from the item.
27
+ property :summary, from: :textSummary
28
+
29
+ # Public: The type of the item. Can be either `IMAGE`, `LINK`, `STORY`, or
30
+ # `CHUNK` (a chunk of HTML).
31
+ property :type
32
+
33
+ # Public: The URL for the image of this item.
34
+ property :img
35
+
36
+ # Public: The XPath at which this item is located.
37
+ property :xroot
38
+
39
+ # Public: The XPath for the cluster of items where this item comes from. If
40
+ # a frontpage has, for example, a main list of articles and a sidebar with
41
+ # "Top Articles", for example, both will be separate clusters, each with
42
+ # their own articles.
43
+ property :cluster
44
+
45
+ # Public: Stats extracted from this item. This is an object with the
46
+ # following attributes:
47
+ #
48
+ # fresh - The percentage of the item that has changed compared to the
49
+ # previous crawl.
50
+ # static_rank - The quality score of the item on a 1 to 5 scale.
51
+ # spam_score - The probability this item is spam/an advertisement.
52
+ property :stats
53
+ coerce_property :stats, class: Stats
54
+ end
55
+ end
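The `from:` declarations in `Item::Stats` rename the API's short keys (`sr`, `sp`) to readable ones. A minimal sketch of that renaming without Hashie follows; `STAT_KEYS` and `translate_stats` are hypothetical names. One assumed difference: this sketch silently ignores unknown keys, which may not match how Hashie::Trash treats undeclared properties.

```ruby
# Maps API key names to the Ruby-style names used by Item::Stats
# (the same pairs as the `from:` declarations above).
STAT_KEYS = { "sr" => "static_rank", "sp" => "spam_score", "fresh" => "fresh" }.freeze

# Renames known keys and skips anything not listed in STAT_KEYS.
def translate_stats(raw)
  STAT_KEYS.each_with_object({}) do |(api_key, ruby_key), out|
    out[ruby_key] = raw[api_key] if raw.key?(api_key)
  end
end

stats = translate_stats({ "sr" => 3.5, "sp" => 0.1, "fresh" => 0.25, "other" => 1 })
```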
@@ -0,0 +1,54 @@
1
+ require "excon"
2
+
3
+ module Diffbot
4
+ class Request
5
+ # The API token for Diffbot.
6
+ attr_reader :token
7
+
8
+ # Public: Initialize a new request to the API.
9
+ #
10
+ # token - The API token for Diffbot.
11
+ def initialize(token)
12
+ @token = token
13
+ end
14
+
15
+ # Public: Perform an HTTP request against Diffbot's API.
16
+ #
17
+ # method - The request method, one of :get, :head, :post, :put, or
18
+ # :delete.
19
+ # endpoint - The URL to which we'll make the request, as a String.
20
+ # query - A hash of query string params we want to pass along.
21
+ #
22
+ # Yields the request hash before making the request.
23
+ #
24
+ # Returns the response.
25
+ def perform(method, endpoint, query={})
26
+ request_options = build_request(method, query)
27
+ yield request_options if block_given?
28
+
29
+ request = Excon.new(endpoint)
30
+
31
+ request.request(request_options)
32
+ end
33
+
34
+ # Build the hash of options that Excon requires for an HTTP request.
35
+ #
36
+ # method - A Symbol with the HTTP method (:get, :post, etc).
37
+ # query_params - Any query parameters to add to the request.
38
+ #
39
+ # Returns a Hash.
40
+ def build_request(method, query_params={})
41
+ query = { token: token }.merge(query_params)
42
+ request = { query: query, method: method, headers: {} }
43
+
44
+ if Diffbot.instrumentor
45
+ request.update(
46
+ instrumentor: Diffbot.instrumentor,
47
+ instrumentor_name: "diffbot"
48
+ )
49
+ end
50
+
51
+ request
52
+ end
53
+ end
54
+ end
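The query hash assembled in `build_request` can be inspected on its own. This sketch repeats the merge and then serializes the result with `URI.encode_www_form` from the standard library, purely to show what ends up in the query string; the HTTP client may encode it differently. `build_query` and the sample values are hypothetical.

```ruby
require "uri"

# Same merge as build_request: the token goes in first, and caller-supplied
# params are merged on top (so they win if a key is repeated).
def build_query(token, params = {})
  { token: token }.merge(params)
end

query = build_query("my-token", html: true, url: "http://example.com/post?id=1")

# For illustration only: one possible serialization of the query hash.
query_string = URI.encode_www_form(query)
```

Since the params are merged over the token, passing a `:token` key in the params would override the one given to the request.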
@@ -0,0 +1,66 @@
1
+ require "test_helper"
2
+ require "diffbot/coercible_hash"
3
+
4
+ describe Diffbot::CoercibleHash do
5
+ module Foo
6
+ def self.coerce(value)
7
+ "coerced #{value}"
8
+ end
9
+ end
10
+
11
+ module Bar
12
+ def self.new(value)
13
+ "initialized #{value}"
14
+ end
15
+ end
16
+
17
+ module Baz
18
+ def self.coerce(value)
19
+ "coerced #{value}"
20
+ end
21
+
22
+ def self.new(value)
23
+ "initialized #{value}"
24
+ end
25
+ end
26
+
27
+ class TestHash < Hash
28
+ extend Diffbot::CoercibleHash
29
+
30
+ coerce_property :foo, Foo
31
+ coerce_property :foos, collection: Foo
32
+
33
+ coerce_property :bar, Bar
34
+
35
+ coerce_property :baz, Baz
36
+ end
37
+
38
+ subject do
39
+ TestHash.new
40
+ end
41
+
42
+ it "coerces keys using the .coerce method" do
43
+ subject["foo"] = 1
44
+ subject["foo"].must_equal("coerced 1")
45
+ end
46
+
47
+ it "coerces collections" do
48
+ subject["foos"] = [1, 2, 3]
49
+ subject["foos"].must_equal(["coerced 1", "coerced 2", "coerced 3"])
50
+ end
51
+
52
+ it "coerces keys using the .new method" do
53
+ subject["bar"] = 2
54
+ subject["bar"].must_equal("initialized 2")
55
+ end
56
+
57
+ it "when both are present, prefers .coerce" do
58
+ subject["baz"] = 3
59
+ subject["baz"].must_equal("coerced 3")
60
+ end
61
+
62
+ it "coerces symbols as well" do
63
+ subject[:foo] = 2
64
+ subject[:foo].must_equal("coerced 2")
65
+ end
66
+ end
@@ -0,0 +1,2 @@
1
+ require "minitest/spec"
2
+ require "minitest/autorun"
metadata ADDED
@@ -0,0 +1,114 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: diffbot
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.1.0
5
+ prerelease:
6
+ platform: ruby
7
+ authors:
8
+ - Nicolas Sanguinetti
9
+ autorequire:
10
+ bindir: bin
11
+ cert_chain: []
12
+ date: 2012-02-06 00:00:00.000000000 Z
13
+ dependencies:
14
+ - !ruby/object:Gem::Dependency
15
+ name: excon
16
+ requirement: &70280593864880 !ruby/object:Gem::Requirement
17
+ none: false
18
+ requirements:
19
+ - - ! '>='
20
+ - !ruby/object:Gem::Version
21
+ version: '0'
22
+ type: :runtime
23
+ prerelease: false
24
+ version_requirements: *70280593864880
25
+ - !ruby/object:Gem::Dependency
26
+ name: yajl-ruby
27
+ requirement: &70280593864420 !ruby/object:Gem::Requirement
28
+ none: false
29
+ requirements:
30
+ - - ! '>='
31
+ - !ruby/object:Gem::Version
32
+ version: '0'
33
+ type: :runtime
34
+ prerelease: false
35
+ version_requirements: *70280593864420
36
+ - !ruby/object:Gem::Dependency
37
+ name: nokogiri
38
+ requirement: &70280593864000 !ruby/object:Gem::Requirement
39
+ none: false
40
+ requirements:
41
+ - - ! '>='
42
+ - !ruby/object:Gem::Version
43
+ version: '0'
44
+ type: :runtime
45
+ prerelease: false
46
+ version_requirements: *70280593864000
47
+ - !ruby/object:Gem::Dependency
48
+ name: hashie
49
+ requirement: &70280593863580 !ruby/object:Gem::Requirement
50
+ none: false
51
+ requirements:
52
+ - - ! '>='
53
+ - !ruby/object:Gem::Version
54
+ version: '0'
55
+ type: :runtime
56
+ prerelease: false
57
+ version_requirements: *70280593863580
58
+ - !ruby/object:Gem::Dependency
59
+ name: minitest
60
+ requirement: &70280593863160 !ruby/object:Gem::Requirement
61
+ none: false
62
+ requirements:
63
+ - - ! '>='
64
+ - !ruby/object:Gem::Version
65
+ version: '0'
66
+ type: :development
67
+ prerelease: false
68
+ version_requirements: *70280593863160
69
+ description: Diffbot provides a concise API for analyzing and extracting semantic
70
+ information from web pages using Diffbot (http://www.diffbot.com).
71
+ email: hi@nicolassanguinetti.info
72
+ executables: []
73
+ extensions: []
74
+ extra_rdoc_files: []
75
+ files:
76
+ - .gitignore
77
+ - LICENSE
78
+ - README.md
79
+ - Rakefile
80
+ - diffbot.gemspec
81
+ - lib/diffbot.rb
82
+ - lib/diffbot/article.rb
83
+ - lib/diffbot/coercible_hash.rb
84
+ - lib/diffbot/frontpage.rb
85
+ - lib/diffbot/frontpage/dml_parser.rb
86
+ - lib/diffbot/item.rb
87
+ - lib/diffbot/request.rb
88
+ - test/coercible_hash_test.rb
89
+ - test/test_helper.rb
90
+ homepage: http://github.com/tinder/diffbot
91
+ licenses: []
92
+ post_install_message:
93
+ rdoc_options: []
94
+ require_paths:
95
+ - lib
96
+ required_ruby_version: !ruby/object:Gem::Requirement
97
+ none: false
98
+ requirements:
99
+ - - ! '>='
100
+ - !ruby/object:Gem::Version
101
+ version: '0'
102
+ required_rubygems_version: !ruby/object:Gem::Requirement
103
+ none: false
104
+ requirements:
105
+ - - ! '>='
106
+ - !ruby/object:Gem::Version
107
+ version: '0'
108
+ requirements: []
109
+ rubyforge_project:
110
+ rubygems_version: 1.8.11
111
+ signing_key:
112
+ specification_version: 3
113
+ summary: Ruby interface to the Diffbot API
114
+ test_files: []