wikipedia_wrapper 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,7 @@
+ ---
+ SHA1:
+   metadata.gz: c55aa7e042f26d61832be539119c30a64667b8af
+   data.tar.gz: f1b62295f0e88b7ff14f329a3928da3b454e2142
+ SHA512:
+   metadata.gz: 24d61ddd0a2191c023e773b81711ab6b189ca75a76a458f690383b39bd78fc4c0c72210a0a820ac04b03a8f2e58c0b63841dcdb78282c482e11829f776126a79
+   data.tar.gz: 5fe8f6a7995043e90d7e20a70d0a9b35cac96cb03ddf45202d87f0b8b210a32426093dd4a4b5c2a4b6f29b83a594d82b16fcb54030c065e7584c25819d9d5001
File without changes
data/LICENSE ADDED
@@ -0,0 +1,21 @@
+ The MIT License (MIT)
+
+ Copyright (c) 2015 Sybil Ehrensberger
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
@@ -0,0 +1,101 @@
+ # WikipediaWrapper
+ WikipediaWrapper is a Ruby gem that extracts information from Wikipedia and makes it available through an easy-to-use API.
+
+ All information is extracted using the [Wikipedia/MediaWiki API](https://en.wikipedia.org/w/api.php).
+
+ This gem is inspired by the [Python wrapper for the Wikipedia API](https://github.com/goldsmith/Wikipedia) by [Jonathan Goldsmith](https://github.com/goldsmith).
+
+ ## Installation
+ Install WikipediaWrapper like any other Ruby gem:
+
+ ```
+ gem install wikipedia_wrapper
+ ```
+
+ Or, if you're using Rails/Bundler, add this to your Gemfile:
+
+ ```ruby
+ gem 'wikipedia_wrapper'
+ ```
+
+ and run at the command prompt:
+
+ ```
+ bundle install
+ ```
+
+ ## Usage
+ Before you can use the gem, you need to require it in your file:
+
+ ```ruby
+ require 'wikipedia_wrapper'
+ ```
+
+ The following are just some example usages. For detailed information, check the documentation.
+
+ ### Configuration
+ Configuration is optional: every option has a default, and the values in the example below are those defaults. It is, however, highly recommended that you choose a caching client and set it before making a lot of requests. The caching functionality is provided by the [cache gem](https://github.com/seamusabshere/cache); see its documentation for the supported caching clients.
+
+ ```ruby
+ WikipediaWrapper.configure do |config|
+   config.lang = 'en'
+   config.user_agent = 'WikipediaWrapper/0.1 (http://sykaeh.github.com/wikipedia_wrapper/; wikipedia_wrapper@sybil-ehrensberger.com) Ruby/2.2.1'
+   config.default_ttl = 604800 # one week, in seconds
+   # additional config options are: image_restrictions, img_height, img_width
+ end
+
+ # Set the caching client with the `cache=` setter (pick one):
+ WikipediaWrapper.cache = Memcached.new('127.0.0.1:11211', :binary_protocol => true) # or
+ WikipediaWrapper.cache = Dalli::Client.new # or
+ WikipediaWrapper.cache = Redis.new # or
+ WikipediaWrapper.cache = Rails.cache
+ ```
+
+ ### Autocomplete suggestions
+ Autocomplete suggestions are the suggestions you get when you start typing a word in Wikipedia's search box. For each suggestion you get the complete page title and a short summary. You can limit the number of suggestions with the keyword argument `limit` (default: 10, maximum: 100).
+
+ ```ruby
+ WikipediaWrapper.autocomplete('api', limit: 3)
+ # returns => { "Application programming interface"=>"In computer programming, an application programming interface (API) is a set of routines, protocols, and tools for building software applications.",
+ # "Apia"=>"Apia is the capital and the largest city of Samoa. From 1900 to 1919, it was the capital of the German Samoa.",
+ # "Apink"=>"Apink (Korean: 에이핑크, Japanese: エーピンク; also written A Pink) is a South Korean girl group formed by A Cube Entertainment in 2011. The group consists of Park Cho-rong, Yoon Bo-mi, Jung Eun-ji, Son Na-eun, Kim Nam-joo and Oh Ha-young."}
+ ```
+
+ ### Search
+ Search does a full-text search for the given term and returns a Hash that maps the titles of matching pages to short snippets containing the search term.
+
+ ```ruby
+ WikipediaWrapper.search('api', limit: 3)
+ # returns: => {"Application programming interface"=>""API In computer programming",
+ # "Web API"=>"For the Microsoft ASP.NET Web API) for both the web server and web",
+ # "Google Gadgets API"=>"Google Gadgets API which allows developers to create Google Gadgets easily. Gadgets are mini-applications built in HTML, JavaScript,"}
+ ```
+
+ ### Summary
+ Summary is a convenience function that returns the introductory text of a page (everything before the first section), or only the first few characters or sentences (at most 10) of the page, either as plain text or as basic HTML.
+
+ ```ruby
+ WikipediaWrapper.summary('Ruby') # returns the intro text in plaintext
+ WikipediaWrapper.summary('Ruby', chars: 100) # returns the first 100 characters in plaintext
+ WikipediaWrapper.summary('Ruby', sentences: 10) # returns the first 10 sentences in plaintext
+ WikipediaWrapper.summary('Ruby', html: true) # returns the intro text in HTML
+ ```
+
+ ### Complete page
+ To get all of the information on a particular page, use the `page` convenience wrapper or use the `Page` class directly.
+
+ ```ruby
+ page = WikipediaWrapper.page('Ruby')
+ # or directly (skips the search-based auto-suggestion of the title):
+ page = WikipediaWrapper::Page.new('Ruby')
+
+ page.title # => "Ruby"
+ page.url # => "http://en.wikipedia.org/wiki/Ruby"
+ page.extract # => "<p>A <b>ruby</b> is a pink to blood-red colored gemstone, a variety of the mineral corundum (aluminium oxide). The red color is caused mainly by the presence of the element chromium. Its name comes from <i>ruber</i>, Latin for red. Other varieties of gem-quality corundum are called sapphires. Ruby is considered one of the four precious stones, together with sapphire, emerald and diamond.</p>\n<p>Prices of rubies are primarily determined by color. The brightest and most valuable \"red\" called blood-red, commands a large premium over other rubies of similar quality. After color follows clarity: similar to diamonds, a clear stone will command a premium, but a ruby without any needle-like rutile inclusions may indicate that the stone has been treated. Cut and carat (weight) are also an important factor in determining the price. Ruby is the traditional birthstone for July and is always lighter red or pink than garnet.</p>\n<p></p>"
+ ```
+
+ ## Tests
+ WikipediaWrapper comes with a test suite; run it with `rake test`.
+
+ ## License
+ Copyright (c) 2015 Sybil Ehrensberger, released under the MIT license
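The autocomplete example in the README maps each suggested title to a short description. The MediaWiki `opensearch` action answers with a four-element array `[search term, titles, descriptions, URLs]`, and the gem pairs the second and third elements into a Hash. A minimal standalone sketch of that conversion, assuming a hypothetical helper name `opensearch_to_hash` and a hand-written sample response (made-up values, not real API output):

```ruby
# Sketch: convert a MediaWiki opensearch response into a Hash of title => description.
# `opensearch_to_hash` is a hypothetical helper, not part of the gem's API.
def opensearch_to_hash(response)
  # opensearch responses have the shape [search term, titles, descriptions, urls]
  unless response.is_a?(Array) && response.length == 4
    raise ArgumentError, "expected a 4-element array, got #{response.inspect}"
  end

  _term, titles, descriptions, _urls = response
  # pair each title with the description at the same index
  titles.zip(descriptions).to_h
end

# Illustrative input shaped like an opensearch reply (made-up values)
sample = ['api',
          ['Application programming interface', 'Apia'],
          ['Description one', 'Description two'],
          ['https://en.wikipedia.org/wiki/Application_programming_interface',
           'https://en.wikipedia.org/wiki/Apia']]

p opensearch_to_hash(sample)
```

Raising on any other shape mirrors the gem's own length check, which reports a `FormatError` instead.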
@@ -0,0 +1,216 @@
+ require 'cache'
+ require 'wikipedia_wrapper/util'
+ require 'wikipedia_wrapper/exception'
+ require 'wikipedia_wrapper/configuration'
+ require 'wikipedia_wrapper/page'
+
+ # @author Sybil Ehrensberger <contact@sybil-ehrensberger.com>
+ module WikipediaWrapper
+
+   extend self
+
+   # @!attribute [r]
+   # @return [WikipediaWrapper::Configuration] the configuration for this module
+   def config
+     @config ||= Configuration.new
+   end
+
+   # Set up configuration options
+   #
+   # @yieldparam config [WikipediaWrapper::Configuration] the configuration instance
+   # @example
+   #   WikipediaWrapper.configure do |config|
+   #     config.lang = 'en' # also sets config.api_url to https://en.wikipedia.org/w/api.php
+   #     config.user_agent = 'WikipediaWrapper/0.0.1 (http://sykaeh.github.com/wikipedia_wrapper/) Ruby/2.2.1'
+   #     config.default_ttl = 604800
+   #   end
+   def configure
+     yield(config)
+   end
+
+   # Retrieve the cache for this module if it is already defined, otherwise
+   # create a new Cache, defaulting to an in-memory cache
+   # @return [Cache] the cache
+   def cache
+     if @cache.nil?
+       @cache = Cache.new
+       @cache.config.default_ttl = config.default_ttl
+     end
+     @cache
+   end
+
+
+   # Define the caching client
+   # @example
+   #   WikipediaWrapper.cache = Memcached.new('127.0.0.1:11211', :binary_protocol => true)
+   #   WikipediaWrapper.cache = Dalli::Client.new
+   #   WikipediaWrapper.cache = Redis.new
+   #   WikipediaWrapper.cache = Rails.cache
+   # @see https://github.com/seamusabshere/cache Cache Gem Configuration
+   # @param raw_client [Memcached, Dalli::Client, Redis, memcache-client] a caching client (Memcached, Dalli, memcache-client, redis)
+   # @param timeout [Integer] default timeout for the cache entries [in seconds]
+   def cache=(raw_client, timeout: config.default_ttl)
+     @cache = Cache.wrap(raw_client)
+     @cache.config.default_ttl = timeout
+   end
+
+
+   # Convenience function to retrieve a Wikipedia page
+   #
+   # @param term [String] the title of the page
+   # @param auto_suggest [Boolean] whether the search and autocorrect suggestion should be used to find a valid term (default: true)
+   # @param redirect [Boolean] whether redirects should be followed automatically (default: true)
+   # @return [WikipediaWrapper::Page] the wiki page
+   def page(term, auto_suggest: true, redirect: true)
+
+     if auto_suggest
+       term = check_page(term)
+     end
+
+     return WikipediaWrapper::Page.new(term, redirect: redirect)
+
+   end
+
+   # Plain text or basic HTML summary of the page. Redirects are always followed
+   # automatically.
+   #
+   # @note This is a convenience wrapper - auto_suggest and redirect are always enabled
+   #
+   # @raise [WikipediaWrapper::PageError] if no page with that term was found
+   # @raise [WikipediaWrapper::MultiplePagesError] if more than one page with that term was found
+   # @raise [WikipediaWrapper::DisambiguationError] if the term resolves to a disambiguation page
+   # @param term [String] the title of the page
+   # @param html [Boolean] if true, return basic HTML instead of plain text
+   # @param sentences [Integer] if greater than 0, return the first `sentences` sentences (can be no greater than 10).
+   # @param chars [Integer] if greater than 0, return only the first `chars` characters (actual text returned may be slightly longer).
+   # @return [String] the plain text or basic HTML summary of that page
+   def summary(term, html: false, sentences: 0, chars: 0)
+
+     # resolve the term via search and autocorrect suggestions (auto_suggest)
+     term = check_page(term)
+
+     query_params = {
+       'redirects': '',
+       'prop': 'extracts|pageprops',
+       'titles': term,
+       'ppprop': 'disambiguation',
+     }
+
+     if !html
+       query_params[:explaintext] = ''
+     end
+
+     # 0 is truthy in Ruby, so compare against 0 explicitly
+     if sentences > 0
+       query_params[:exsentences] = [sentences, 10].min.to_s
+     elsif chars > 0
+       query_params[:exchars] = chars.to_s
+     else
+       query_params[:exintro] = ''
+     end
+
+     raw_results = fetch(query_params)
+     check_results(term, raw_results)
+
+     _id, info = raw_results['query']['pages'].first
+     info['extract']
+   end
+
+
+   # Do a Wikipedia search for the given term
+   #
+   # @param term [String] the search term
+   # @param limit [Integer] the maximum number of results returned
+   # @param suggestion [Boolean] set to true if you want an autocorrect suggestion
+   # @return [{String => String}] if suggestion is false, a Hash of the matching page
+   #   titles (as keys) and a snippet of the search result as values
+   # @return [Array<{String => String}, <String, nil>>] if suggestion is true,
+   #   an Array with that Hash in the first position and, in the second position,
+   #   a proposed suggestion or nil if there was no suggestion
+   def search(term, limit: 10, suggestion: false)
+
+     search_params = {
+       'list': 'search',
+       'srprop': 'snippet',
+       'srlimit': limit.to_s,
+       'srsearch': term
+     }
+
+     raw_results = fetch(search_params)
+
+     results = {}
+
+     raw_results['query']['search'].each do |sr|
+       # strip the <span ...> highlighting around matched terms; [^>]* keeps each
+       # match inside a single tag instead of running across several spans
+       results[sr['title']] = sr['snippet'].gsub(/<span [^>]*>(?<term>[^<]*)<\/span>/, '\k<term>')
+     end
+
+     if suggestion
+       s = raw_results['query']['searchinfo']['suggestion'] # nil if there is no suggestion
+       return [results, s]
+     else
+       return results
+     end
+
+   end
+
+   # Get autocomplete suggestions for the given term. The term will be used
+   # as a prefix.
+   #
+   # @see https://www.mediawiki.org/wiki/API:Opensearch MediaWiki API
+   # @raise [WikipediaWrapper::FormatError] if an unknown format is encountered in the response
+   # @param term [String] the term to get the autocompletions for (used as a prefix)
+   # @param limit [Integer] the maximum number of results to return (may not exceed 100)
+   # @param redirect [Boolean] whether redirects should be followed for suggestions
+   # @return [Hash{String=>String}] a hash where the keys are the titles of the articles and
+   #   the values are a short description of the page
+   def autocomplete(term, limit: 10, redirect: true)
+
+     query_params = {
+       'action': 'opensearch',
+       'search': term,
+       'redirects': redirect ? 'resolve' : 'return',
+       'limit': [limit, 100].min.to_s
+     }
+
+     raw_results = fetch(query_params)
+
+     # opensearch returns [search term, titles, descriptions, URLs]
+     if raw_results.length != 4
+       raise WikipediaWrapper::FormatError.new("autocomplete", "array had length of #{raw_results.length} instead of 4")
+     end
+
+     # pair each title with its description
+     raw_results[1].zip(raw_results[2]).to_h
+
+   end
+
+
+   # Function to determine whether there is a page with that term. It uses
+   # the search and suggestion functionality to find a possible match
+   # and raises a PageError if no page could be found.
+   #
+   # @raise [WikipediaWrapper::PageError] if no page with that term could be found
+   # @param term [String] the term for which we want a page
+   # @return [String] the actual title of the page
+   def check_page(term)
+
+     results, suggestion = search(term, limit: 1, suggestion: true)
+     if !suggestion.nil?
+       return suggestion
+     elsif results.length == 1
+       title, _snippet = results.first
+       return title
+     else
+       raise WikipediaWrapper::PageError.new(term)
+     end
+
+   end
+
+ end
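In `summary`, the MediaWiki TextExtracts parameters `exsentences`, `exchars` and `exintro` decide how much text comes back. Because `0` is truthy in Ruby, a guard such as `if sentences` is always taken when the default is `0`, so the checks have to compare against zero. A standalone sketch of the parameter selection, assuming a hypothetical helper `extract_params` that mirrors the method's keyword arguments:

```ruby
# Sketch: choose TextExtracts parameters for a summary request.
# `extract_params` is a hypothetical helper mirroring summary's keyword arguments.
def extract_params(html: false, sentences: 0, chars: 0)
  params = { prop: 'extracts' }
  params[:explaintext] = '' unless html # plain text instead of basic HTML

  # 0 is truthy in Ruby, so compare with 0 instead of testing `if sentences`
  if sentences > 0
    params[:exsentences] = [sentences, 10].min.to_s # the gem caps sentences at 10
  elsif chars > 0
    params[:exchars] = chars.to_s
  else
    params[:exintro] = '' # only the text before the first section
  end
  params
end

p extract_params                          # intro text, plain text
p extract_params(sentences: 25)           # capped at 10 sentences
p extract_params(html: true, chars: 100)  # first 100 characters as HTML
```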
@@ -0,0 +1,104 @@
+ require 'yaml'
+
+ module WikipediaWrapper
+
+   class Configuration
+     attr_accessor :user_agent, :default_ttl, :img_width, :img_height
+     attr_reader :lang, :api_url
+
+     # Initialize the configuration with some sensible defaults
+     def initialize
+       set_defaults
+     end
+
+     # Reset the configuration to the initial state with the default parameters
+     def reset
+       set_defaults
+       @image_restrictions = nil
+     end
+
+     # Set the language code; the API URL is derived from it
+     def lang=(lang_code)
+       @lang = lang_code
+       @api_url = "https://#{@lang}.wikipedia.org/w/api.php"
+     end
+
+     # The parsed image restrictions, loaded lazily from the default YAML file
+     def image_restrictions
+       if @image_restrictions.nil?
+         self.image_restrictions = File.expand_path(File.dirname(__FILE__)) + '/default_image_restrictions.yaml'
+       end
+       @image_restrictions
+     end
+
+     # Load image restrictions from the YAML file at `path`
+     def image_restrictions=(path)
+       begin
+         @image_restrictions = YAML.load_file(path)
+       rescue Errno::ENOENT # No such file
+         raise WikipediaWrapper::ConfigurationError.new("The file #{path} does not exist")
+       rescue Psych::SyntaxError => e
+         raise WikipediaWrapper::ConfigurationError.new("SyntaxError in the file #{path}: #{e}")
+       end
+     end
+
+     # Whether an image file passes all of the configured image restrictions
+     def image_allowed?(filename)
+       allowed_ending?(filename) && !blocked_filename?(filename) && !blocked_partial?(filename)
+     end
+
+     private
+
+     def set_defaults
+       self.lang = 'en'
+       @user_agent = 'WikipediaWrapper/0.1 (http://sykaeh.github.com/wikipedia_wrapper/; wikipedia_wrapper@sybil-ehrensberger.com) Ruby/2.2.1'
+       @default_ttl = 7*24*60*60
+       @img_height = nil
+       @img_width = nil
+     end
+
+     def allowed_ending?(filename)
+       if self.image_restrictions['allowed_endings'].nil? # if there are no allowed_endings
+         if self.image_restrictions['exclude_endings'].nil?
+           return true
+         end
+
+         self.image_restrictions['exclude_endings'].each do |e|
+           if filename.downcase.end_with?(e.downcase)
+             return false
+           end
+         end
+         return true
+       else # if allowed_endings is specified
+         self.image_restrictions['allowed_endings'].each do |e|
+           if filename.downcase.end_with?(e.downcase)
+             return true
+           end
+         end
+         return false
+       end
+     end
+
+     def blocked_partial?(filename)
+
+       if self.image_restrictions['exclude_partials'].nil?
+         return false
+       end
+
+       self.image_restrictions['exclude_partials'].each do |p|
+         if filename.downcase.include? p.downcase
+           return true
+         end
+       end
+
+       return false
+     end
+
+     def blocked_filename?(filename)
+
+       if self.image_restrictions['exclude_files'].nil?
+         false
+       else
+         self.image_restrictions['exclude_files'].include? filename
+       end
+     end
+
+   end
+
+ end
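The configuration object derives `api_url` from `lang` inside its custom setter, and `WikipediaWrapper.configure` simply yields the memoized instance returned by `config`. A reduced, self-contained sketch of that pattern, using hypothetical names (`DemoConfig`, `DemoWrapper`) so it runs without the gem:

```ruby
# Sketch of the memoized-configuration pattern; DemoConfig and DemoWrapper are hypothetical.
class DemoConfig
  attr_accessor :user_agent
  attr_reader :lang, :api_url

  def initialize
    self.lang = 'en' # go through the setter so api_url is derived too
    @user_agent = 'DemoWrapper/0.1'
  end

  # setting the language also updates the API endpoint
  def lang=(lang_code)
    @lang = lang_code
    @api_url = "https://#{lang_code}.wikipedia.org/w/api.php"
  end
end

module DemoWrapper
  extend self

  def config
    @config ||= DemoConfig.new # created lazily, then reused
  end

  def configure
    yield(config)
  end
end

DemoWrapper.configure do |config|
  config.lang = 'de'
end
puts DemoWrapper.config.api_url # => https://de.wikipedia.org/w/api.php
```

Keeping `api_url` read-only and deriving it in `lang=` means the two values cannot drift apart.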
@@ -0,0 +1,27 @@
+ ---
+ allowed_endings: # leave blank to allow all endings
+   - .png
+   - .jpeg
+   - .jpg
+
+ exclude_endings: # only considered if allowed_endings is blank
+   - .ogg
+
+ exclude_files:
+   - File:Commons-logo.svg
+   - File:Wiktionary-logo-en.svg
+   - File:Edit-clear.svg
+   - File:Disambig gray.svg
+
+ exclude_partials:
+   - arms
+   - blason
+   - icon
+   - wappen
+   - projection
+   - bandera
+   - flag
+   - map
+   - karte
+   - coa
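These restrictions are evaluated by `Configuration#image_allowed?`: when `allowed_endings` is present only those endings pass, otherwise `exclude_endings` is applied; after that, exact `exclude_files` entries and case-insensitive `exclude_partials` substrings block a file. A standalone sketch of the same rules applied to a small YAML string, assuming a hypothetical top-level `image_allowed?` function that receives the parsed rules as an argument:

```ruby
require 'yaml'

# Sketch of the image restriction rules; this `image_allowed?` is a hypothetical
# standalone function, not the gem's Configuration#image_allowed? method.
RULES = YAML.safe_load(<<~YML)
  allowed_endings:
    - .png
    - .jpg
  exclude_files:
    - File:Commons-logo.svg
  exclude_partials:
    - flag
    - map
YML

def image_allowed?(filename, rules)
  name = filename.downcase
  allowed = rules['allowed_endings']
  excluded = rules['exclude_endings']

  ending_ok =
    if allowed # a list of allowed endings wins when present
      allowed.any? { |e| name.end_with?(e.downcase) }
    else       # otherwise only the listed endings are rejected
      (excluded || []).none? { |e| name.end_with?(e.downcase) }
    end

  ending_ok &&
    !(rules['exclude_files'] || []).include?(filename) && # exact, case-sensitive match
    (rules['exclude_partials'] || []).none? { |p| name.include?(p.downcase) }
end

%w[File:Ruby_gem.jpg File:Flag_of_Samoa.png File:Notes.svg].each do |f|
  puts "#{f}: #{image_allowed?(f, RULES)}"
end
```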
@@ -0,0 +1,142 @@
+
+ module WikipediaWrapper
+
+   # Base Wikipedia error class
+   class WikipediaError < StandardError
+
+     def initialize(error)
+       @error = error
+     end
+
+     def message
+       "An unexpected error occurred: \"#{@error}\". Please report it on GitHub (github.com/sykaeh/wikipedia_wrapper)!"
+     end
+
+   end
+
+   # Exception raised when there is a problem with the configuration
+   class ConfigurationError < WikipediaError
+
+     def initialize(msg)
+       @msg = msg
+     end
+
+     def message
+       @msg.to_s
+     end
+
+   end
+
+   # Exception raised when the expected return format does not match what was received
+   class FormatError < WikipediaError
+
+     def initialize(function, msg)
+       @func = function
+       @msg = msg
+     end
+
+     def message
+       "Format Error calling \"#{@func}\": #{@msg}"
+     end
+
+   end
+
+   # Exception raised when no Wikipedia page matched a query.
+   class PageError < WikipediaError
+
+     def initialize(term, pageid: false)
+       @pageid = pageid
+       @term = term
+     end
+
+     def message
+       if @pageid
+         "Page id \"#{@term}\" does not match any pages. Try another id!"
+       else
+         "\"#{@term}\" does not match any pages. Try another query!"
+       end
+     end
+
+   end
+
+   # Exception raised when more than one Wikipedia article matched a query.
+   class MultiplePagesError < WikipediaError
+
+     def initialize(page_titles, term, pageid: false)
+       @pages = page_titles
+       @pageid = pageid
+       @term = term
+     end
+
+     def message
+       if @pageid
+         "Page id \"#{@term}\" matches #{@pages.length} pages: \n#{@pages.join(', ')}"
+       else
+         "\"#{@term}\" matches #{@pages.length} pages: \n#{@pages.join(', ')}"
+       end
+     end
+
+   end
+
+   # Exception raised when the request is invalid and the server replies with an error
+   class InvalidRequestError < WikipediaError
+
+     def initialize(url, msg)
+       @url = url
+       @msg = msg
+     end
+
+     def message
+       "Invalid Request for URL #{@url}: #{@msg}"
+     end
+
+   end
+
+   # Exception raised when a page resolves to a disambiguation page.
+   #
+   # @note Only the ambiguous title is stored; the pages it may refer to are not included.
+   class DisambiguationError < WikipediaError
+
+     def initialize(title)
+       @title = title
+     end
+
+     def message
+       "\"#{@title}\" may refer to several different things."
+     end
+
+   end
+
+   # Exception raised when a page title unexpectedly resolves to a redirect.
+   class RedirectError < WikipediaError
+
+     def initialize(title)
+       @title = title
+     end
+
+     def message
+       "\"#{@title}\" resulted in a redirect. Set the redirect option to true to allow automatic redirects."
+     end
+
+   end
+
+   # Exception raised when a request to the MediaWiki servers times out.
+   class HTTPTimeoutError < WikipediaError
+
+     def initialize(query)
+       @query = query
+     end
+
+     def message
+       "Searching for \"#{@query}\" resulted in a timeout. Try again in a few seconds."
+     end
+
+   end
+
+ end
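All of these errors inherit from `WikipediaError`, so callers can rescue specific cases first and fall back to the base class. The classes override `message` rather than passing a message to `super`; handing the text to `StandardError#initialize` is an alternative that also makes `to_s` return it. A sketch of that alternative, with hypothetical class names (`DemoError`, `DemoPageError`) and a hypothetical `lookup` method:

```ruby
# Sketch: an error hierarchy whose subclass passes its text to StandardError.
# DemoError, DemoPageError and lookup are hypothetical, not the gem's classes.
class DemoError < StandardError; end

class DemoPageError < DemoError
  attr_reader :term

  def initialize(term)
    @term = term
    super("\"#{term}\" does not match any pages. Try another query!")
  end
end

def lookup(term)
  raise DemoPageError.new(term) if term.empty?
  "page for #{term}"
end

begin
  lookup('')
rescue DemoPageError => e   # most specific class first
  puts "not found: #{e.message}"
rescue DemoError => e       # any other error from the same hierarchy
  puts "other error: #{e.message}"
end
```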
@@ -0,0 +1,69 @@
+ require 'wikipedia_wrapper/exception'
+
+ module WikipediaWrapper
+
+   class WikiImage
+
+     attr_accessor :small, :normal, :description_url
+
+     def initialize(raw_info)
+
+       @small = nil
+       @normal = nil
+
+       if !raw_info.key?('imageinfo') || raw_info['imageinfo'].length != 1
+         raise WikipediaWrapper::FormatError.new('WikiImage initialize', "Unknown format for imageinfo: #{raw_info}")
+       end
+
+       @filename = (raw_info.key? 'title') ? raw_info['title'].sub('File:', '') : 'No name'
+       info = raw_info['imageinfo'][0]
+
+       data = {
+         'name': @filename,
+         'mime': info['mime'],
+       }
+
+       @description_url = info['descriptionurl']
+
+       @normal = Image.new(info['url'], info['width'].to_i, info['height'].to_i, data)
+
+       # use the thumbnail if the API returned one, otherwise fall back to the full-size image
+       if info.key?('thumburl')
+         @small = Image.new(info['thumburl'], info['thumbwidth'].to_i, info['thumbheight'].to_i, data)
+       else
+         @small = @normal
+       end
+
+     end
+
+     def to_s
+       "WikiImage #{@filename}"
+     end
+
+   end
+
+   class Image
+
+     attr_reader :name, :url, :width, :height, :mime
+
+     def initialize(url, width, height, data)
+
+       @url = url
+       @name = data[:name] || ''
+       @mime = data[:mime] || ''
+       # width and height come from the arguments; data only carries :name and :mime
+       @width = width || 0
+       @height = height || 0
+
+     end
+
+     def to_s
+       "Image: #{@name} (#{@url})"
+     end
+
+   end
+
+ end
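`WikiImage` reads a single `imageinfo` entry and uses `thumburl`, `thumbwidth` and `thumbheight` for the small version when the API returned a thumbnail, falling back to the full-size image otherwise. A self-contained sketch of that mapping with a `Struct`, assuming a hypothetical `parse_imageinfo` helper and a made-up sample hash that uses the same field names as the gem's code:

```ruby
# Sketch of the imageinfo-to-image mapping; ImageRef and parse_imageinfo are hypothetical.
ImageRef = Struct.new(:url, :width, :height, :mime)

def parse_imageinfo(page)
  infos = page['imageinfo']
  unless infos.is_a?(Array) && infos.length == 1
    raise ArgumentError, 'expected exactly one imageinfo entry'
  end

  info = infos.first
  normal = ImageRef.new(info['url'], info['width'].to_i, info['height'].to_i, info['mime'])
  # a thumbnail is only present when a width or height was requested
  small =
    if info.key?('thumburl')
      ImageRef.new(info['thumburl'], info['thumbwidth'].to_i, info['thumbheight'].to_i, info['mime'])
    else
      normal
    end
  { name: page.fetch('title', 'No name').sub('File:', ''), normal: normal, small: small }
end

# Made-up sample shaped like one entry of query.pages
sample = {
  'title' => 'File:Example.jpg',
  'imageinfo' => [{ 'url' => 'https://example.org/Example.jpg', 'width' => 800, 'height' => 600,
                    'mime' => 'image/jpeg', 'thumburl' => 'https://example.org/200px-Example.jpg',
                    'thumbwidth' => 200, 'thumbheight' => 150 }]
}
img = parse_imageinfo(sample)
puts "#{img[:name]}: #{img[:small].width}x#{img[:small].height} thumbnail of #{img[:normal].width}x#{img[:normal].height}"
```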
@@ -0,0 +1,135 @@
+ require 'wikipedia_wrapper/exception'
+ require 'wikipedia_wrapper/image'
+ require 'wikipedia_wrapper/util'
+
+ module WikipediaWrapper
+
+   class Page
+
+     attr_reader :title, :revision_time, :url, :extract
+
+     def initialize(term, redirect: true)
+
+       @term = term
+       @redirect = redirect
+       @images = nil
+       @img_width = WikipediaWrapper.config.img_width
+       @img_height = WikipediaWrapper.config.img_height
+
+       # FIXME: Deal with continuation
+
+       # FIXME: deal with redirect false!
+       load_page
+
+     end
+
+     def to_s
+       "Wikipedia Page: #{@title} (#{@url}), revid: #{@revision_id}, revtime: #{@revision_time}"
+     end
+
+     def categories
+       # FIXME: Implement!
+     end
+
+     # Retrieve image info for all images on the page, except for the images blocked
+     # by the configured image restrictions
+     # See {http://www.mediawiki.org/wiki/API:Imageinfo}
+     #
+     # @param width [Integer] optional width of the smaller image (in px)
+     # @param height [Integer] optional height of the smaller image (in px)
+     # @note Only one of width and height can be used at the same time. If both are defined, only width is used.
+     # @return [Array<WikipediaWrapper::WikiImage>] list of images
+     def images(width: nil, height: nil)
+
+       # if we haven't retrieved any images or the width or height have changed, re-fetch
+       if @images.nil? || (!width.nil? && @img_width != width) || (!height.nil? && @img_height != height)
+
+         unless width.nil?
+           @img_width = width
+         end
+
+         unless height.nil?
+           @img_height = height
+         end
+
+         @images = []
+
+         # deal with the case that a page has no images
+         if @raw['images'].nil?
+           return @images
+         end
+
+         filenames = @raw['images'].map { |img_info| img_info['title'] }.compact
+
+         # exclude filenames blocked by the image restrictions
+         filenames = filenames.select { |f| WikipediaWrapper.config.image_allowed?(f) }
+
+         if filenames.empty? # there are no (allowed) filenames, return an empty array
+           return @images
+         end
+
+         query_parameters = {
+           'titles': filenames.join('|'),
+           'redirects': '',
+           'prop': 'imageinfo',
+           'iiprop': 'url|size|mime',
+         }
+
+         if !@img_width.nil?
+           query_parameters[:iiurlwidth] = @img_width.to_s
+         elsif !@img_height.nil?
+           query_parameters[:iiurlheight] = @img_height.to_s
+         end
+
+         raw_results = WikipediaWrapper.fetch(query_parameters)
+
+         # check if the proper format is there
+         if raw_results.key?('query') && raw_results['query'].key?('pages')
+           raw_results['query']['pages'].each do |_id, main_info|
+             begin
+               @images.push(WikiImage.new(main_info))
+             rescue WikipediaWrapper::FormatError => e
+               puts e.message
+             end
+           end
+         end
+
+       end
+
+       return @images
+
+     end
+
+
+     private
+
+     def load_page
+
+       query_parameters = {
+         'prop': 'revisions|info|extracts|images|pageprops',
+         'titles': @term,
+         'redirects': '',
+         'rvprop': 'content',
+         'inprop': 'url',
+         'exintro': '',
+         'ppprop': 'disambiguation',
+       }
+
+       raw_results = WikipediaWrapper.fetch(query_parameters)
+       WikipediaWrapper.check_results(@term, raw_results)
+
+       _key, page_info = raw_results['query']['pages'].first
+       @raw = page_info
+
+       @page_id = page_info['pageid']
+       @title = page_info['title']
+       @revision_time = page_info['touched'] # FIXME: parse into date & time, format: 2015-04-23T07:20:47Z
+       @revision_id = page_info['lastrevid']
+       @url = page_info['fullurl']
+       @extract = page_info.fetch('extract', '')
+
+     end
+
+   end
+
+ end
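`Page#load_page` (like `summary`) passes the response to `WikipediaWrapper.check_results`. In the JSON response, `query.pages` is a Hash keyed by page id, so iterating it yields `[id, info]` pairs, and a title that does not exist is reported under the id `"-1"`. A standalone sketch of that classification that returns a value instead of raising; `classify_pages` and the sample hashes are hypothetical:

```ruby
# Sketch: classify the `query.pages` Hash of a MediaWiki response.
# `classify_pages` is hypothetical; the gem raises exceptions instead of returning symbols.
def classify_pages(pages)
  return :not_found if pages.empty?
  # `pages` maps page id => page info, so map over [id, info] pairs
  return [:multiple, pages.map { |_id, info| info['title'] }] if pages.length > 1

  id, info = pages.first
  return :not_found if id == '-1' # missing pages are reported under the id "-1"

  props = info['pageprops'] || {}
  return :disambiguation if props.key?('disambiguation')

  [:ok, info['title']]
end

p classify_pages('123' => { 'title' => 'Ruby' })
p classify_pages('-1' => { 'title' => 'Nonexistent page', 'missing' => '' })
p classify_pages('1' => { 'title' => 'A' }, '2' => { 'title' => 'B' })
```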
@@ -0,0 +1,66 @@
+ require 'uri'
+ require 'open-uri'
+ require 'json'
+ require 'wikipedia_wrapper/exception'
+
+
+ module WikipediaWrapper
+
+
+   # Given the request parameters, params, fetch the response from the API URL
+   # and parse it as JSON. Raise an InvalidRequestError if the API reports an error.
+   #
+   # @raise [WikipediaWrapper::InvalidRequestError] if the request was invalid
+   #   and the server replied with an error
+   # @param params [Hash{Symbol => String}] hash of the properties that should
+   #   be added to the request URL
+   # @return [Hash, Array] the JSON response of the server converted into a Hash
+   #   (an Array for opensearch requests)
+   def self.fetch(params)
+
+     # if no action is defined, set it to 'query'
+     if !params.key?(:action)
+       params[:action] = 'query'
+     end
+
+     params[:format] = 'json' # always return json format
+
+     # FIXME: deal with continuation
+     #params[:continue] = '' # does not work for autocomplete
+
+     # escape each value separately so that characters such as '&' in a search
+     # term cannot break the query string; empty values become bare flags
+     query_part = params.map { |k, v| v.empty? ? k.to_s : "#{k}=#{URI.encode_www_form_component(v)}" }.join('&')
+     endpoint_url = "#{WikipediaWrapper.config.api_url}?#{query_part}"
+
+     raw_results = cache.fetch(endpoint_url) {
+       f = open(endpoint_url, "User-Agent" => config.user_agent)
+       JSON.parse(f.read)
+     }
+
+     if params[:action] != 'opensearch' && raw_results.key?('error')
+       raise WikipediaWrapper::InvalidRequestError.new(endpoint_url, raw_results['error']['info'])
+     end
+
+     return raw_results
+
+   end
+
+   # Check that a query response contains exactly one existing page that is
+   # not a disambiguation page.
+   #
+   # @raise [WikipediaWrapper::MultiplePagesError] if more than one page was returned
+   # @raise [WikipediaWrapper::PageError] if no page was returned or the page does not exist
+   # @raise [WikipediaWrapper::DisambiguationError] if the page is a disambiguation page
+   # @param term [String] the queried term (used in the error messages)
+   # @param raw_results [Hash] the parsed JSON response of a query
+   def self.check_results(term, raw_results)
+
+     pages = raw_results['query']['pages'] # Hash of page id => page info
+
+     if pages.length > 1
+       raise WikipediaWrapper::MultiplePagesError.new(pages.map { |_id, p| p['title'] }, term)
+     elsif pages.length < 1
+       raise WikipediaWrapper::PageError.new(term)
+     end
+
+     key, page_info = pages.first
+     if key == '-1' # the API reports a missing page under the id "-1"
+       raise WikipediaWrapper::PageError.new(term)
+     end
+
+     # Check for disambiguation pages
+     if page_info['pageprops'] && page_info['pageprops']['disambiguation']
+       raise WikipediaWrapper::DisambiguationError.new(term)
+     end
+
+   end
+
+ end
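`fetch` turns the parameter Hash into a query string in which empty values become bare flags (for example `redirects`). Escaping each value on its own, rather than escaping the finished URL, keeps characters such as `&` in a search term from splitting the query. A sketch using only the standard library; `build_query` is a hypothetical helper and the endpoint is the default English API URL:

```ruby
require 'uri'

# Sketch: build a MediaWiki API URL from a params Hash.
# `build_query` is a hypothetical helper, not the gem's fetch method.
def build_query(api_url, params)
  # default action first, caller's params next, format=json last
  params = { action: 'query' }.merge(params).merge(format: 'json')
  parts = params.map do |key, value|
    value = value.to_s
    # empty values are sent as bare flags such as `redirects`
    value.empty? ? key.to_s : "#{key}=#{URI.encode_www_form_component(value)}"
  end
  "#{api_url}?#{parts.join('&')}"
end

url = build_query('https://en.wikipedia.org/w/api.php',
                  list: 'search', srsearch: 'AT&T', srlimit: 1, redirects: '')
puts url
# https://en.wikipedia.org/w/api.php?action=query&list=search&srsearch=AT%26T&srlimit=1&redirects&format=json
```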
@@ -0,0 +1,3 @@
+ module WikipediaWrapper
+   VERSION = "0.1.0"
+ end
metadata ADDED
@@ -0,0 +1,70 @@
+ --- !ruby/object:Gem::Specification
+ name: wikipedia_wrapper
+ version: !ruby/object:Gem::Version
+   version: 0.1.0
+ platform: ruby
+ authors:
+ - Sybil Ehrensberger
+ autorequire:
+ bindir: bin
+ cert_chain: []
+ date: 2015-07-10 00:00:00.000000000 Z
+ dependencies:
+ - !ruby/object:Gem::Dependency
+   name: cache
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '0.4'
+   type: :runtime
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '0.4'
+ description: WikipediaWrapper is a ruby gem that extracts information from Wikipedia
+   and makes the information available in an easy-to-use API. All information is extracted
+   using the Wikipedia/MediaWiki API.
+ email: contact@sybil-ehrensberger.com
+ executables: []
+ extensions: []
+ extra_rdoc_files: []
+ files:
+ - CHANGELOG.md
+ - LICENSE
+ - README.md
+ - lib/wikipedia_wrapper.rb
+ - lib/wikipedia_wrapper/configuration.rb
+ - lib/wikipedia_wrapper/default_image_restrictions.yaml
+ - lib/wikipedia_wrapper/exception.rb
+ - lib/wikipedia_wrapper/image.rb
+ - lib/wikipedia_wrapper/page.rb
+ - lib/wikipedia_wrapper/util.rb
+ - lib/wikipedia_wrapper/version.rb
+ homepage: http://sybil-ehrensberger.com
+ licenses:
+ - MIT
+ metadata: {}
+ post_install_message:
+ rdoc_options: []
+ require_paths:
+ - lib
+ required_ruby_version: !ruby/object:Gem::Requirement
+   requirements:
+   - - ">="
+     - !ruby/object:Gem::Version
+       version: '0'
+ required_rubygems_version: !ruby/object:Gem::Requirement
+   requirements:
+   - - ">="
+     - !ruby/object:Gem::Version
+       version: '0'
+ requirements: []
+ rubyforge_project:
+ rubygems_version: 2.4.5
+ signing_key:
+ specification_version: 4
+ summary: WikipediaWrapper retrieves information about a place from Wikipedia
+ test_files: []