wiki-api 0.0.1

data/.gitignore ADDED
@@ -0,0 +1,20 @@
1
+ *.gem
2
+ *.rbc
3
+ .bundle
4
+ .config
5
+ .yardoc
6
+ Gemfile.lock
7
+ InstalledFiles
8
+ _yardoc
9
+ coverage
10
+ doc/
11
+ lib/bundler/man
12
+ pkg
13
+ rdoc
14
+ spec/reports
15
+ test/tmp
16
+ test/version_tmp
17
+ tmp
18
+
19
+ *.DS_Store
20
+ *.swp
data/Gemfile ADDED
@@ -0,0 +1,4 @@
1
+ source 'https://rubygems.org'
2
+
3
+ # Specify your gem's dependencies in wiki-api.gemspec
4
+ gemspec
data/LICENSE.txt ADDED
@@ -0,0 +1,22 @@
1
+ Copyright (c) 2013 Dennis Blommesteijn
2
+
3
+ MIT License
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining
6
+ a copy of this software and associated documentation files (the
7
+ "Software"), to deal in the Software without restriction, including
8
+ without limitation the rights to use, copy, modify, merge, publish,
9
+ distribute, sublicense, and/or sell copies of the Software, and to
10
+ permit persons to whom the Software is furnished to do so, subject to
11
+ the following conditions:
12
+
13
+ The above copyright notice and this permission notice shall be
14
+ included in all copies or substantial portions of the Software.
15
+
16
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
17
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
18
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
19
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
20
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
21
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
22
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,166 @@
1
+ # Wiki::Api
2
+
3
+ Wiki API is a gem (Ruby on Rails) that interfaces with the MediaWiki API (https://www.mediawiki.org/wiki/API:Main_page). This gem is more than an interface: it provides abstractions such as Page, from which you can request parts of a page (like headlines, and the text blocks within headlines).
4
+
5
+ NOTE: Nokogiri is used to parse the HTML in the background. Because I see no point in wrapping its internals (composition) for this purpose, Nokogiri nodes, elements, etc. are exposed through wiki-api (http://nokogiri.org/Nokogiri.html).
6
+
7
+ Requests to the MediaWiki API use the following URI structure:
8
+
9
+ http(s)://somemediawiki.org/w/api.php?action=parse&format=json&page="anypage"
10
+
11
+
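The URI structure quoted above can be sketched with Ruby's standard library alone. This is only an illustration: the helper name `build_api_uri` is hypothetical and not part of wiki-api, and the default path `/w/api.php` and the query parameters are taken from the URI shown above.

```ruby
require 'uri'

# Hypothetical helper, not part of wiki-api: builds the action=parse request
# URI for a page on a given MediaWiki host.
def build_api_uri(host, page_name, api_path = "/w/api.php")
  uri = URI("#{host}#{api_path}")
  # encode_www_form percent-encodes reserved characters such as ":" and ","
  uri.query = URI.encode_www_form({ action: "parse", format: "json", page: page_name })
  uri
end

uri = build_api_uri("https://en.wiktionary.org", "Wiktionary:Welcome,_newcomers")
puts uri
```

Building the query with `URI.encode_www_form` rather than string interpolation keeps page names that contain reserved characters valid in the request.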
12
+ ### Dependencies (production)
13
+
14
+ * json
15
+ * nokogiri
16
+
17
+
18
+ ### Roadmap
19
+
20
+ * Version (0.0.1) (current)
21
+
22
+ Initial project.
23
+
24
+ * Version (0.0.2)
25
+
26
+ Index important words per block, page, list item;
27
+
28
+ Parse objects for more elements within a Page.
29
+
30
+
31
+
32
+ ### Known Issues
33
+
34
+ None discovered thus far.
35
+
36
+
37
+ ## Installation
38
+
39
+ Add this line to your application's Gemfile (bundler):
40
+
41
+ gem 'wiki-api', git: "git://github.com/dblommesteijn/wiki-api.git"
42
+
43
+ And then execute:
44
+
45
+ $ bundle
46
+
47
+ Or install it yourself (RubyGems):
48
+
49
+ $ gem install wiki-api
50
+
51
+
52
+ ## Setup
53
+
54
+ Define a configuration for your connection (in an initializer script). This example uses wiktionary.org.
55
+ NOTE: both HTTP and HTTPS MediaWiki instances are supported.
56
+
57
+ ```ruby
58
+ CONFIG = { uri: "http://en.wiktionary.org" }
59
+ ```
60
+
61
+ Set up the default configuration (in an initializer script):
62
+
63
+ ```ruby
64
+ Wiki::Api::Connect.config = CONFIG
65
+ ```
66
+
67
+
68
+ ## Usage
69
+
70
+ ### Query a Page
71
+
72
+ Requesting headlines from a given page.
73
+
74
+ ```ruby
75
+ page = Wiki::Api::Page.new name: "Wiktionary:Welcome,_newcomers"
76
+ page.headlines.each do |headline|
77
+ # printing headline name (PageHeadline)
78
+ puts headline.name
79
+ end
80
+ ```
81
+
82
+ Getting headlines for a given name.
83
+
84
+ ```ruby
85
+ page = Wiki::Api::Page.new name: "Wiktionary:Welcome,_newcomers"
86
+ page.headline("Wiktionary:Welcome,_newcomers").each do |headline|
87
+ # printing headline name (PageHeadline)
88
+ puts headline.name
89
+ end
90
+ ```
91
+
92
+ ### Basic Page structure
93
+
94
+ ```ruby
95
+ page = Wiki::Api::Page.new name: "Wiktionary:Welcome,_newcomers"
96
+
97
+ # iterate PageHeadline objects
98
+ page.headlines.each do |headline|
99
+
100
+ # exposing nokogiri internal elements
101
+ elements = headline.elements.flatten
102
+ elements.each do |element|
103
+ # access Nokogiri::XML::*
104
+ end
105
+
106
+ # string representation of all nested text
107
+ headline.block.to_texts
108
+
109
+ # iterate PageListItem objects
110
+ headline.block.list_items.each do |list_item|
111
+ # string representation of nested text
112
+ list_item.to_text
113
+ # iterate PageLink objects
114
+ list_item.links.each do |link|
115
+ # see the part 'iterate PageLink objects' below
116
+ end
117
+ end
118
+
119
+ # iterate PageLink objects
120
+ headline.block.links.each do |link|
121
+ # absolute URI object
122
+ link.uri
123
+ # html link
124
+ link.html
125
+ # link name
126
+ link.title
127
+ # string representation of nested text
128
+ link.to_text
129
+ end
130
+
131
+ end
132
+ ```
133
+
134
+
135
+ ### Example (https://en.wikipedia.org/wiki/Ruby_on_rails)
136
+
137
+ This is an example of querying wikipedia.org for the page "Ruby_on_rails" and printing the links of each list item under the References headline.
138
+
139
+ ```ruby
140
+ # setting a target config
141
+ CONFIG = { uri: "https://en.wikipedia.org" }
142
+ Wiki::Api::Connect.config = CONFIG
143
+
144
+ # querying the page
145
+ page = Wiki::Api::Page.new name: "Ruby_on_rails"
146
+
147
+ # get headlines named "References" (there can be multiple headlines with the same name!)
148
+ headlines = page.headline "References"
149
+
150
+ # iterate headlines
151
+ headlines.each do |headline|
152
+ # iterate list items on the given headline
153
+ headline.block.list_items.each do |list_item|
154
+
155
+ # print the uri of all links
156
+ puts list_item.links.map{ |l| l.uri }
157
+
158
+ end
159
+ end
160
+ ```
161
+
162
+
163
+
164
+
165
+
166
+
data/Rakefile ADDED
@@ -0,0 +1 @@
1
+ require "bundler/gem_tasks"
data/lib/wiki/api.rb ADDED
@@ -0,0 +1,15 @@
1
+ require File.expand_path(File.dirname(__FILE__) + "/api/version")
2
+ require File.expand_path(File.dirname(__FILE__) + "/api/connect")
3
+ require File.expand_path(File.dirname(__FILE__) + "/api/page")
4
+ require File.expand_path(File.dirname(__FILE__) + "/api/page_headline")
5
+ require File.expand_path(File.dirname(__FILE__) + "/api/page_block")
6
+ require File.expand_path(File.dirname(__FILE__) + "/api/page_list_item")
7
+ require File.expand_path(File.dirname(__FILE__) + "/api/page_link")
8
+ require File.expand_path(File.dirname(__FILE__) + "/api/util")
9
+
10
+
11
+ module Wiki
12
+ module Api
13
+ # Your code goes here...
14
+ end
15
+ end
@@ -0,0 +1,70 @@
1
+ require 'net/http'
2
+ require 'json'
3
+ require 'nokogiri'
4
+
5
+ module Wiki
6
+ module Api
7
+
8
+ class Connect
9
+
10
+ attr_accessor :uri, :api_path, :api_options, :http, :request, :response, :html, :parsed
11
+
12
+ def initialize(options={})
13
+ options = self.class.config.to_h.merge(options) # explicit options override the global config
14
+ self.uri = options[:uri] if options.include? :uri
15
+ self.api_path = options[:api_path] if options.include? :api_path
16
+ self.api_options = options[:api_options] if options.include? :api_options
17
+
18
+ # defaults
19
+ self.api_path ||= "/w/api.php"
20
+ self.api_options ||= {action: "parse", format: "json", page: ""}
21
+
22
+ # errors
23
+ raise "no uri given" if self.uri.nil?
24
+ end
25
+
26
+ def connect
27
+ uri = URI("#{self.uri}#{self.api_path}")
28
+ uri.query = URI.encode_www_form self.api_options
29
+ self.http = Net::HTTP.new(uri.host, uri.port)
30
+ if uri.scheme == "https"
31
+ self.http.use_ssl = true
32
+ #self.http.verify_mode = OpenSSL::SSL::VERIFY_NONE
33
+ end
34
+ self.request = Net::HTTP::Get.new(uri.request_uri)
35
+ self.response = self.http.request(request)
36
+ end
37
+
38
+ def page page_name
39
+ self.api_options[:page] = page_name
40
+ self.connect
41
+ response = self.response
42
+ json = JSON.parse response.body, {symbolize_names: true}
43
+ raise json[:error][:code] unless valid? json, response
44
+ self.html = json[:parse][:text]
45
+ self.parsed = Nokogiri::HTML self.html[:*]
46
+ end
47
+
48
+ class << self
49
+ def config=(config = {})
50
+ @@config = config
51
+ end
52
+ def config
53
+ @@config ||= {}
54
+ end
55
+ end
56
+
57
+ protected
58
+ def valid? json, response
59
+ b = []
60
+ # valid http response
61
+ b << (response.is_a? Net::HTTPOK)
62
+ # not an invalid api response handle
63
+ b << (!json.include? :error)
64
+ !b.include?(false)
65
+ end
66
+
67
+ end
68
+
69
+ end
70
+ end
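How `Connect#page` treats the API response can be illustrated without network access. The following is a minimal sketch, assuming response bodies shaped the way the code above reads them (`parse.text.*` on success, `error.code` on failure). The function `extract_html` and both sample bodies are hypothetical, and unlike `valid?` the sketch checks only the JSON payload, not the HTTP status.

```ruby
require 'json'

# Hypothetical helper, not part of wiki-api: returns the rendered HTML from an
# action=parse JSON body, or raises the API error code as a RuntimeError.
def extract_html(body)
  json = JSON.parse(body, symbolize_names: true)
  raise json[:error][:code] if json.key?(:error)
  # the HTML lives under the "*" key, which symbolize_names turns into :*
  json[:parse][:text][:*]
end

# Made-up sample bodies for illustration only
ok_body  = '{"parse":{"title":"Example","text":{"*":"<p>Hello</p>"}}}'
bad_body = '{"error":{"code":"missingtitle","info":"No such page"}}'

puts extract_html(ok_body)
begin
  extract_html(bad_body)
rescue RuntimeError => e
  puts "API error: #{e.message}"
end
```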
@@ -0,0 +1,126 @@
1
+ module Wiki
2
+ module Api
3
+
4
+ class Page
5
+
6
+ attr_accessor :name, :parsed_page
7
+
8
+ def initialize(options={})
9
+ self.name = options[:name] if options.include? :name
10
+ @@config ||= nil
11
+ if @@config.nil?
12
+ # use the connection to collect HTML pages for parsing
13
+ @connect = Wiki::Api::Connect.new
14
+ else
15
+ # using a local HTML file for parsing
16
+ end
17
+ end
18
+
19
+ def headlines
20
+ headlines = []
21
+ self.parse_blocks.each do |headline_name, elements|
22
+ headline = PageHeadline.new name: headline_name
23
+ elements.each do |element|
24
+ # nokogiri element
25
+ headline.block << element
26
+ end
27
+ headlines << headline
28
+ end
29
+ headlines
30
+ end
31
+
32
+ def headline headline_name
33
+ headlines = []
34
+ self.parse_blocks(headline_name).each do |headline_name, elements|
35
+ headline = PageHeadline.new name: headline_name
36
+ elements.each do |element|
37
+ # nokogiri element
38
+ headline.block << element
39
+ end
40
+ headlines << headline
41
+ end
42
+ headlines
43
+ end
44
+
45
+
46
+
47
+ def to_html
48
+ self.load_page!
49
+ self.parsed_page.to_xhtml indent: 3, indent_text: " "
50
+ end
51
+
52
+ def reset!
53
+ self.parsed_page = nil
54
+ end
55
+
56
+ class << self
57
+ def config=(config = {})
58
+ @@config = config
59
+ end
60
+ end
61
+
62
+ protected
63
+
64
+ def load_page!
65
+ if @@config.nil?
66
+ self.parsed_page ||= @connect.page self.name
67
+ elsif self.parsed_page.nil?
68
+ f = File.open(@@config[:file])
69
+ self.parsed_page = Nokogiri::HTML(f)
70
+ f.close
71
+ end
72
+ end
73
+
74
+
75
+ # parse blocks
76
+ def parse_blocks headline_name = nil
77
+ self.load_page!
78
+ result = {}
79
+
80
+ # get headline nodes by span class
81
+ xs = self.parsed_page.xpath("//span[@class='mw-headline']")
82
+ # filter single headline by name
83
+ xs = xs.reject{|t| t.attributes["id"].value != headline_name } unless headline_name.nil?
84
+
85
+ # NOTE: first_part has no id attribute and thus cannot be filtered or processed within xpath (xs)
86
+ if headline_name == self.name || headline_name.nil?
87
+ x = self.first_part
88
+ result[self.name] ||= []
89
+ result[self.name] << (self.collect_elements(x.parent))
90
+ end
91
+
92
+ # append all blocks
93
+ xs.each do |x|
94
+ headline = x.attributes["id"].value
95
+ elements = self.collect_elements x.parent.next
96
+ result[headline] ||= []
97
+ result[headline] << elements
98
+ end
99
+
100
+ result
101
+ end
102
+
103
+ # harvest first part of the page (missing heading and class="mw-headline")
104
+ def first_part
105
+ self.parsed_page ||= @connect.page self.name
106
+ self.parsed_page.search("p").first.children.first
107
+ end
108
+
109
+ # collect elements within headlines (not nested properties, but next elements)
110
+ def collect_elements element
111
+ # capture first element name
112
+ elements = []
113
+ # iterate text until next headline
114
+ while true do
115
+ elements << element
116
+ element = element.next
117
+ break if element.nil? || element.to_html.include?("class=\"mw-headline\"")
118
+ end
119
+ elements
120
+ end
121
+
122
+
123
+ end
124
+
125
+ end
126
+ end
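The grouping done by `parse_blocks` and `collect_elements` (every node belongs to the headline that precedes it, and content before the first headline is filed under the page name) can be sketched without Nokogiri. This is a simplified illustration, not the gem's implementation: `Node` and `group_by_headline` are hypothetical, plain Structs stand in for parsed nodes, and headlines sharing a name are merged instead of being kept as separate lists.

```ruby
# Stand-in for a parsed node: `tag` is the element name, `text` its content.
Node = Struct.new(:tag, :text)

# Group a flat list of sibling nodes into { headline => [nodes] }.
# Nodes that appear before the first headline are filed under `page_name`.
def group_by_headline(nodes, page_name)
  result = Hash.new { |hash, key| hash[key] = [] }
  current = page_name
  nodes.each do |node|
    if node.tag == "h2"
      current = node.text
      result[current] # touch the key so that empty sections still appear
    else
      result[current] << node
    end
  end
  result
end

nodes = [
  Node.new("p", "intro"),
  Node.new("h2", "Editing"),
  Node.new("p", "first paragraph"),
  Node.new("ul", "a list"),
  Node.new("h2", "More"),
  Node.new("p", "last paragraph")
]

group_by_headline(nodes, "Page").each do |headline, members|
  puts "#{headline}: #{members.map(&:text).join(', ')}"
end
```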
@@ -0,0 +1,53 @@
1
+ module Wiki
2
+ module Api
3
+
4
+ class PageBlock
5
+
6
+ attr_accessor :elements
7
+
8
+ def initialize options={}
9
+ self.elements = []
10
+ end
11
+
12
+ def << value
13
+ self.elements << value
14
+ end
15
+
16
+ def to_texts
17
+ # TODO: perhaps we should wrap the elements with objects??
18
+ texts = []
19
+ self.elements.flatten.each do |element|
20
+ text = Wiki::Api::Util.element_to_text element if element.is_a? Nokogiri::XML::Element
21
+ next if text.nil?
22
+ next if text.empty?
23
+ texts << text
24
+ end
25
+ texts
26
+ end
27
+
28
+ def list_items
29
+ # TODO: perhaps we should wrap the elements with objects, and request a li per element??
30
+ self.search("li").map do |list_item|
31
+ PageListItem.new element: list_item
32
+ end
33
+ end
34
+
35
+ def links
36
+ # TODO: perhaps we should wrap the elements with objects, and request a li per element??
37
+ self.search("a").map do |a|
38
+ PageLink.new element: a
39
+ end
40
+ end
41
+
42
+ protected
43
+
44
+ def search *paths
45
+ self.elements.flatten.flat_map do |element|
46
+ element.search(*paths)
47
+ end.reject{|t| t.nil?}
48
+ end
49
+
50
+ end
51
+
52
+ end
53
+ end
@@ -0,0 +1,22 @@
1
+ module Wiki
2
+ module Api
3
+
4
+ class PageHeadline
5
+
6
+ attr_accessor :name, :block
7
+
8
+ def initialize options={}
9
+ self.name = options[:name] if options.include? :name
10
+ self.block = PageBlock.new
11
+ end
12
+
13
+ def elements
14
+ self.block.elements
15
+ end
16
+
17
+
18
+
19
+ end
20
+
21
+ end
22
+ end
@@ -0,0 +1,33 @@
1
+ module Wiki
2
+ module Api
3
+
4
+ class PageLink
5
+
6
+ attr_accessor :element
7
+
8
+ def initialize options={}
9
+ self.element = options[:element] if options.include? :element
10
+ end
11
+
12
+ def to_text
13
+ Wiki::Api::Util.element_to_text self.element
14
+ end
15
+
16
+ def uri
17
+ host = Wiki::Api::Connect.config[:uri]
18
+ element_value = self.element.attributes["href"].value
19
+ URI.parse "#{host}#{element_value}"
20
+ end
21
+
22
+ def title
23
+ self.element.attributes["title"].value
24
+ end
25
+
26
+ def html
27
+ "<a href=\"#{self.uri}\" alt=\"#{self.title}\">#{self.title}</a>"
28
+ end
29
+
30
+ end
31
+
32
+ end
33
+ end
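`PageLink#uri` concatenates the configured host and the link's `href`. That works for root-relative links such as `/wiki/free`, but the fixture further down also contains protocol-relative links such as `//en.wikipedia.org/wiki/Main_Page`, for which plain concatenation yields `http://en.wiktionary.org//en.wikipedia.org/wiki/Main_Page`. One possible alternative, shown here only as a sketch, is to resolve the reference with the standard library's `URI.join`; the helper name `resolve_link` is hypothetical and not part of the gem.

```ruby
require 'uri'

# Hypothetical helper, not part of wiki-api: resolves an href against the
# configured host using standard relative-reference resolution.
def resolve_link(host, href)
  URI.join(host, href)
end

host = "http://en.wiktionary.org"
puts resolve_link(host, "/wiki/free")
puts resolve_link(host, "//en.wikipedia.org/wiki/Main_Page")
```

With this approach a protocol-relative link keeps the scheme of the configured host and takes the host named in the link itself.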
@@ -0,0 +1,32 @@
1
+ module Wiki
2
+ module Api
3
+
4
+ class PageListItem
5
+
6
+ attr_accessor :element
7
+
8
+ def initialize options={}
9
+ self.element = options[:element] if options.include? :element
10
+ end
11
+
12
+ def to_text
13
+ Wiki::Api::Util.element_to_text self.element
14
+ end
15
+
16
+ def links
17
+ self.search("a").map do |a|
18
+ PageLink.new element: a
19
+ end
20
+ end
21
+
22
+ protected
23
+
24
+ def search *paths
25
+ self.element.search(*paths)
26
+ end
27
+
28
+
29
+ end
30
+
31
+ end
32
+ end
@@ -0,0 +1,34 @@
1
+ module Wiki
2
+ module Api
3
+
4
+ class Util
5
+
6
+ class << self
7
+
8
+ def element_to_text element
9
+ raise "not an element" unless element.is_a? Nokogiri::XML::Element
10
+ self.clean_text element.text
11
+ end
12
+
13
+ def element_filter_lists element
14
+ raise "not an element" unless element.is_a? Nokogiri::XML::Element
15
+ result = {}
16
+ element.search("li").each_with_index do |li, i|
17
+ li.children.each do |child|
18
+ result[i] ||= []
19
+ result[i] << self.clean_text(child.text)
20
+ end
21
+ end
22
+ result.map{|k,v| v.join("")}
23
+ end
24
+
25
+ protected
26
+ def clean_text text
27
+ text.gsub(/\n/, " ").squeeze(" ").gsub(/\s(\W)/, '\1').gsub(/(\W)\s/, '\1 ').strip
28
+ end
29
+
30
+ end
31
+
32
+ end
33
+ end
34
+ end
@@ -0,0 +1,5 @@
1
+ module Wiki
2
+ module Api
3
+ VERSION = "0.0.1"
4
+ end
5
+ end
@@ -0,0 +1,8 @@
1
+
2
+ class ActiveSupport::TestCase
3
+ setup :global_setup
4
+
5
+ def global_setup
6
+ end
7
+ end
8
+
@@ -0,0 +1,152 @@
1
+
2
+ <html>
3
+ <head>
4
+ </head>
5
+ <body>
6
+ <p>Hello, and welcome! Wiktionary is a <a href="/wiki/multilingual" title="multilingual">multilingual</a>
7
+ <a href="/wiki/free" title="free">free</a>
8
+ <a href="/wiki/dictionary" title="dictionary">dictionary</a>
9
+ , being written <a href="/wiki/collaborative" title="collaborative">collaboratively</a>
10
+ on this website by people from around the world. <a href="/wiki/entry" title="entry">Entries</a>
11
+ may be <a href="/wiki/Help:How_to_edit_a_page" title="Help:How to edit a page">edited</a>
12
+ by anyone!</p>
13
+
14
+ <p>Designed as the lexical companion to <a class="extiw" href="//en.wikipedia.org/wiki/Main_Page" title="w:Main Page">Wikipedia</a>
15
+ , the encyclopedia project, Wiktionary has grown beyond a standard dictionary and now includes a <a href="/wiki/Wiktionary:Wikisaurus" title="Wiktionary:Wikisaurus">thesaurus</a>
16
+ , a rhyme guide, phrase books, language statistics and extensive appendices. We aim to include not only the definition of a word, but also enough information to really understand it. Thus <a href="/wiki/etymology" title="etymology">etymologies</a>
17
+ , pronunciations, sample quotations, synonyms, antonyms and translations are included.</p>
18
+
19
+ <p>Wiktionary is a <a href="/wiki/wiki" title="wiki">wiki</a>
20
+ , which means that you can edit it, and all the content is dual-licensed under both the <a href="/wiki/Wiktionary:Text_of_Creative_Commons_Attribution-ShareAlike_3.0_Unported_License" title="Wiktionary:Text of Creative Commons Attribution-ShareAlike 3.0 Unported License">Creative Commons Attribution-ShareAlike 3.0 Unported License</a>
21
+ as well as the <a href="/wiki/Wiktionary:GNU_Free_Documentation_License" title="Wiktionary:GNU Free Documentation License">GNU Free Documentation License</a>
22
+ . Before you contribute, you may wish to read through some of our <a class="mw-redirect" href="/wiki/Wiktionary:Help" title="Wiktionary:Help">Help</a>
23
+ pages, and bear in mind that we do things quite differently from other wikis. In particular, we have strict <a href="/wiki/Wiktionary:Entry_layout_explained" title="Wiktionary:Entry layout explained">layout conventions</a>
24
+ and <a href="/wiki/Wiktionary:Criteria_for_inclusion" title="Wiktionary:Criteria for inclusion">inclusion criteria</a>
25
+ . Learn how to <a href="/wiki/Help:Starting_a_new_page" title="Help:Starting a new page">start a page</a>
26
+ , how to <a href="/wiki/Help:How_to_edit_a_page" title="Help:How to edit a page">edit entries</a>
27
+ , experiment in the <a href="/wiki/Wiktionary:Sandbox" title="Wiktionary:Sandbox">sandbox</a>
28
+ and visit our <a href="/wiki/Wiktionary:Community_Portal" title="Wiktionary:Community Portal">Community Portal</a>
29
+ to see how you can participate in the development of Wiktionary.</p>
30
+
31
+ <p>We have created <b>3,311,698</b>
32
+ articles since starting in <a href="/wiki/December" title="December">December</a>
33
+ , 2002, and we’re growing rapidly.</p>
34
+
35
+ <h2>
36
+ <span class="editsection">[ <a href="/w/index.php?title=Wiktionary:Welcome,_newcomers&amp;action=edit&amp;section=1" title="Edit section: Editing Wiktionary">edit</a>
37
+ ]</span>
38
+ <span class="mw-headline" id="Editing_Wiktionary">Editing Wiktionary</span>
39
+ </h2>
40
+
41
+ <p>People like yourself are very active in building this <a href="/wiki/project" title="project">project</a>
42
+ . While you are reading this, it is likely someone is editing one of our entries. Many <a href="/wiki/knowledgeable" title="knowledgeable">knowledgeable</a>
43
+ people are already at work, but everybody is <a href="/wiki/welcome" title="welcome">welcome</a>
44
+ !</p>
45
+
46
+ <p>Contributing does not require <a class="external text" href="//en.wiktionary.org/w/index.php?title=Special:Userlogin&amp;type=signup">logging in</a>
47
+ , but we would <a href="/wiki/prefer" title="prefer">prefer</a>
48
+ that you do, as it <a href="/wiki/facilitate" title="facilitate">facilitates</a>
49
+ the <a href="/wiki/administration" title="administration">administration</a>
50
+ of this site. (Note that logging in also prevents the <a href="/wiki/IP" title="IP">IP</a>
51
+ address of your computer from being displayed in the <a class="external text" href="//en.wiktionary.org/w/index.php?title=Wiktionary:Welcome,_newcomers&amp;action=history">page history</a>
52
+ .)</p>
53
+
54
+ <p>You can <a href="/wiki/dive_in" title="dive in">dive in</a>
55
+ right now and add or alter a definition, add example sentences, or help us to properly format or categorize entries. You can even <a href="/wiki/Help:Starting_a_new_page" title="Help:Starting a new page">create a page</a>
56
+ for a term we’re missing. Please feel free to <a href="/wiki/Wiktionary:Be_bold_in_updating_pages" title="Wiktionary:Be bold in updating pages">be bold</a>
57
+ in editing pages!</p>
58
+
59
+ <p>How could allowing everyone to edit produce a high‐quality product instead of total disorder? Because most people want to help, and keeping it open to everyone creates the potential for making many good and ever-improving entries. Records are kept of all changes, so even unhelpful edits can easily be <a href="/wiki/revert" title="revert">reverted</a>
60
+ by other users. To use a now‐famous <a class="extiw" href="//en.wikipedia.org/wiki/Linus%27_Law" title="w:Linus' Law">catchphrase</a>
61
+ , in essence: “Given enough eyeballs, all errors are shallow.”</p>
62
+
63
+ <p>To start out, users might want to use the ‘ <a href="/wiki/Special:RecentChanges" title="Special:RecentChanges">Recent changes</a>
64
+ ’ or ‘ <a href="/wiki/Special:Random" title="Special:Random">Random page</a>
65
+ ’ link (found in the navigation box elsewhere on this page), to get an idea of the kinds of pages you can find here. (It might be surprising how many non-English words are entered here!)</p>
66
+
67
+ <h2>
68
+ <span class="editsection">[ <a href="/w/index.php?title=Wiktionary:Welcome,_newcomers&amp;action=edit&amp;section=2" title="Edit section: Norms and etiquette">edit</a>
69
+ ]</span>
70
+ <span class="mw-headline" id="Norms_and_etiquette">Norms and etiquette</span>
71
+ </h2>
72
+
73
+ <div class="disambig-see-also-2">
74
+ <i>See also</i>
75
+ <b>
76
+ <a href="/wiki/Help:Interacting_with_humans" title="Help:Interacting with humans">Help:Interacting with humans</a>
77
+ </b>
78
+ </div>
79
+
80
+ <p>One important thing you should know is that we have borrowed from our sister project <a class="extiw" href="//en.wikipedia.org/wiki/" title="wikipedia:">Wikipedia</a>
81
+ some <a class="extiw" href="//en.wikipedia.org/wiki/Norm_(sociology)" title="wikipedia:Norm (sociology)">cultural norms</a>
82
+ you should respect:</p>
83
+
84
+ <ol>
85
+ <li>We try not to argue pointlessly. This isn’t a debate forum. After <a href="/wiki/civilized" title="civilized">civilized</a>
86
+ and reasonable discussion, we try to reach broad <a href="/wiki/consensus" title="consensus">consensus</a>
87
+ in order to present an accurate, neutral summary of all relevant facts for future readers.</li>
88
+
89
+ <li>We try to make the entries as unbiased as we can, meaning that definitions or descriptions — even of controversial topics — are not meant to be platforms for preaching of any kind.</li>
90
+
91
+ <li>Bear in mind this is a <i>dictionary</i>
92
+ , which means there are many <a href="/wiki/Wiktionary:What_Wiktionary_is_not" title="Wiktionary:What Wiktionary is not">things it is not</a>
93
+ .</li>
94
+
95
+ <li>At any point, if you are uncomfortable changing someone else’s work, and you want to add a thought (or question or comment) about an entry or other page, the place is its <a class="mw-redirect" href="/wiki/Wiktionary:Talk_page" title="Wiktionary:Talk page">talk page</a>
96
+ (click on the "discussion" tab at the top or the "Discuss this page" link in the sidebar or elsewhere, depending on your preference skin). Note, though, that we try to keep discussion focused on improving this dictionary.</li>
97
+
98
+ </ol>
99
+
100
+ <p>However, there are also some differences between Wikipedia and Wiktionary. If you already have some experience with editing Wikipedia, then you may find our <a href="/wiki/Wiktionary:Wiktionary_for_Wikipedians" title="Wiktionary:Wiktionary for Wikipedians">guide to Wikipedia users</a>
101
+ useful as a quick introduction.</p>
102
+
103
+ <h2>
104
+ <span class="editsection">[ <a href="/w/index.php?title=Wiktionary:Welcome,_newcomers&amp;action=edit&amp;section=3" title="Edit section: For more information">edit</a>
105
+ ]</span>
106
+ <span class="mw-headline" id="For_more_information">For more information</span>
107
+ </h2>
108
+
109
+ <p>More introductory information and descriptions of community norms are on the following pages:</p>
110
+
111
+ <ul>
112
+ <li>
113
+ <a href="/wiki/Help:Starting_a_new_page" title="Help:Starting a new page">How to start a page</a>
114
+ </li>
115
+
116
+ <li>
117
+ <a href="/wiki/Help:How_to_edit_a_page" title="Help:How to edit a page">How to edit a page</a>
118
+ </li>
119
+
120
+ <li>
121
+ <a href="/wiki/Wiktionary:Staying_cool_when_the_editing_gets_hot" title="Wiktionary:Staying cool when the editing gets hot">Staying cool when editing gets hot</a>
122
+ </li>
123
+
124
+ <li>
125
+ <a href="/wiki/Help:FAQ" title="Help:FAQ">Wiktionary FAQ</a>
126
+ </li>
127
+
128
+ <li>
129
+ <a href="/wiki/Wiktionary:Wiktionary_for_Wikipedians" title="Wiktionary:Wiktionary for Wikipedians">Wiktionary for Wikipedians</a>
130
+ </li>
131
+
132
+ </ul>
133
+
134
+ <p>For more policy and style guidelines or guidance, see our <a href="/wiki/Wiktionary:Community_Portal" title="Wiktionary:Community Portal">Community Portal</a>
135
+ or <a href="/wiki/Help:Contents" title="Help:Contents">Help:Contents</a>
136
+ .</p>
137
+
138
+
139
+
140
+ <!--
141
+ NewPP limit report
142
+ Preprocessor visited node count: 34/1000000
143
+ Preprocessor generated node count: 534/1500000
144
+ Post-expand include size: 250/2048000 bytes
145
+ Template argument size: 32/2048000 bytes
146
+ Highest expansion depth: 3/40
147
+ Expensive parser function count: 0/500
148
+ -->
149
+
150
+ <!-- Saved in parser cache with key enwiktionary:pcache:idhash:6-0!*!0!!*!*!* and timestamp 20130327133438 -->
151
+ </body>
152
+ </html>
@@ -0,0 +1,51 @@
1
+ require 'rubygems'
2
+ require 'test/unit'
3
+ require File.expand_path(File.dirname(__FILE__) + "/../../lib/wiki/api")
4
+
5
+ #
6
+ # Testing the connection to https://www.mediawiki.org/wiki/API:Main_page
7
+ #
8
+
9
+ class WikiConnect < Test::Unit::TestCase
10
+
11
+ CONFIG = { uri: "http://en.wiktionary.org" }
12
+
13
+ def setup
14
+ Wiki::Api::Connect.config = CONFIG
15
+ end
16
+
17
+ def teardown
18
+ end
19
+
20
+ def test_connection_wiktionary
21
+ c = Wiki::Api::Connect.new uri: "http://en.wiktionary.org"
22
+ ret = c.connect
23
+ assert ret.is_a?(Net::HTTPOK), "invalid response http"
24
+ end
25
+
26
+ def test_connection_https_wiktionary
27
+ c = Wiki::Api::Connect.new uri: "https://en.wiktionary.org"
28
+ ret = c.connect
29
+ assert ret.is_a?(Net::HTTPOK), "invalid response https"
30
+ end
31
+
32
+ def test_page_get
33
+ begin
34
+ c = Wiki::Api::Connect.new
35
+ c.page "Wiktionary:Welcome,_newcomers"
36
+ rescue Exception => e
37
+ assert false, "expected valid page #{e.message}"
38
+ end
39
+ end
40
+
41
+ def test_page_get_non_exist
42
+ begin
43
+ c = Wiki::Api::Connect.new
44
+ response = c.page "asfsldkfjjlkanv98yhok"
45
+ rescue Exception => e
46
+ assert (e.message == "missingtitle"), "expected invalid page #{e.message}"
47
+ end
48
+ end
49
+
50
+
51
+ end
@@ -0,0 +1,171 @@
1
+ # encoding: utf-8
2
+
3
+ require 'rubygems'
4
+ require 'test/unit'
5
+ require File.expand_path(File.dirname(__FILE__) + "/../../lib/wiki/api")
6
+
7
+ #
8
+ # Testing the parsing of URI (with a predownloaded HTML file):
9
+ # /files/Wiktionary_Welcome,_newcomers.html (2013-03-27)
10
+ #
11
+ # Online equivalent:
12
+ # https://en.wiktionary.org/wiki/Wiktionary:Welcome,_newcomers
13
+ #
14
+
15
+ class WikiPageObject < Test::Unit::TestCase
16
+
17
+ # this global is required to resolve URIs (MediaWiki uses relative paths in their links)
18
+ GLB_CONFIG = { uri: "http://en.wiktionary.org" }
19
+
20
+ # use local file for test loading
21
+ PAGE_CONFIG = { file: File.expand_path(File.dirname(__FILE__) + "/files/Wiktionary_Welcome,_newcomers.html") }
22
+
23
+ def setup
24
+ # NOTE: comment Page.config, to use the online MediaWiki instance
25
+ Wiki::Api::Page.config = PAGE_CONFIG
26
+ Wiki::Api::Connect.config = GLB_CONFIG
27
+ @page_name = "Wiktionary:Welcome,_newcomers"
28
+ end
29
+
30
+ def teardown
31
+ end
32
+
33
+ # test simple page invocation
34
+ def test_page_invocation
35
+ page = Wiki::Api::Page.new name: @page_name
36
+ headlines = page.headlines
37
+ assert !headlines.empty?, "expected headlines"
38
+ headlines.each do |headline|
39
+ assert headline.is_a?(Wiki::Api::PageHeadline), "expected headline object"
40
+ end
41
+ end
42
+
43
+ # test nokogiri elements per headline
44
+ def test_page_elements
45
+ page = Wiki::Api::Page.new name: @page_name
46
+ headlines = page.headlines
47
+ assert !headlines.empty?, "expected headlines"
48
+ headlines.each do |headline|
49
+ assert headline.is_a?(Wiki::Api::PageHeadline), "expected headline object"
50
+ elements = headline.elements.flatten
51
+ assert !elements.empty?, "expected elements"
52
+ elements.each do |element|
53
+ assert element.is_a?(Nokogiri::XML::Element) ||
54
+ element.is_a?(Nokogiri::XML::Text) ||
55
+ element.is_a?(Nokogiri::XML::Comment), "expected nokogiri internals"
56
+ end
57
+ end
58
+ end
59
+
60
+ # test pageblocks for each headline
61
+ def test_page_blocks
62
+ page = Wiki::Api::Page.new name: @page_name
63
+ headlines = page.headlines
64
+ assert !headlines.empty?, "expected headlines"
65
+ headlines.each do |headline|
66
+ assert headline.is_a?(Wiki::Api::PageHeadline), "expected headline object"
67
+ block = headline.block
68
+ assert block.is_a?(Wiki::Api::PageBlock), "expected block object"
69
+ end
70
+ end
71
+
72
+ # test string text from page block
73
+ def test_page_block_string_text
74
+ page = Wiki::Api::Page.new name: @page_name
75
+ headlines = page.headlines
76
+ assert !headlines.empty?, "expected headlines"
77
+ headlines.each do |headline|
78
+ assert headline.is_a?(Wiki::Api::PageHeadline), "expected headline object"
79
+ block = headline.block
80
+ assert block.is_a?(Wiki::Api::PageBlock), "expected block object"
81
+ texts = block.to_texts
82
+ assert texts.is_a?(Array) && !texts.empty?, "expected array"
83
+ texts.each do |text|
84
+ assert text.is_a?(String), "expected string"
85
+ end
86
+ end
87
+ end
88
+
89
+ # test list items from page blocks
90
+ def test_page_block_list_items
91
+ page = Wiki::Api::Page.new name: @page_name
92
+ headlines = page.headlines
93
+ assert !headlines.empty?, "expected headlines"
94
+ headlines.each do |headline|
95
+ assert headline.is_a?(Wiki::Api::PageHeadline), "expected headline object"
96
+ block = headline.block
97
+ assert block.is_a?(Wiki::Api::PageBlock), "expected block object"
98
+ list_items = block.list_items
99
+ assert list_items.is_a?(Array), "expected array"
100
+ list_items.each do |list_item|
101
+ assert list_item.is_a?(Wiki::Api::PageListItem), "expected list item object"
102
+ end
103
+ end
104
+ end
105
+
106
+ # test links within page blocks
107
+ def test_page_block_links
108
+ page = Wiki::Api::Page.new name: @page_name
109
+ headlines = page.headlines
110
+ assert !headlines.empty?, "expected headlines"
111
+ headlines.each do |headline|
112
+ assert headline.is_a?(Wiki::Api::PageHeadline), "expected headline object"
113
+ block = headline.block
114
+ assert block.is_a?(Wiki::Api::PageBlock), "expected block object"
115
+ links = block.links
116
+ assert links.is_a?(Array), "expected array"
117
+ links.each do |link|
118
+ assert link.is_a?(Wiki::Api::PageLink), "expected link object"
119
+ assert link.uri.is_a?(URI), "expected uri object"
120
+ end
121
+ end
122
+ end
123
+
+   # test links within list items
+   def test_page_block_list_inner_links
+     page = Wiki::Api::Page.new name: @page_name
+     headlines = page.headlines
+     assert !headlines.empty?, "expected headlines"
+     headlines.each do |headline|
+       assert headline.is_a?(Wiki::Api::PageHeadline), "expected headline object"
+       block = headline.block
+       assert block.is_a?(Wiki::Api::PageBlock), "expected block object"
+       list_items = block.list_items
+       assert list_items.is_a?(Array), "expected array"
+       list_items.each do |list_item|
+         assert list_item.is_a?(Wiki::Api::PageListItem), "expected list item object"
+         links = list_item.links
+         links.each do |link|
+           assert link.is_a?(Wiki::Api::PageLink), "expected link object"
+           assert link.uri.is_a?(URI), "expected uri object"
+         end
+       end
+     end
+   end
+
+   # test single headline invocation
+   def test_page_invocation_single
+     page = Wiki::Api::Page.new name: @page_name
+     headlines = page.headlines
+     assert !headlines.empty?, "expected headlines"
+
+     # collect headline names
+     hs = []
+     headlines.each do |headline|
+       assert headline.is_a?(Wiki::Api::PageHeadline), "expected headline object"
+       hs << headline.name
+     end
+
+     # query every headline manually
+     hs.each do |h|
+       # test headline query
+       headlines = page.headline h
+       # test for at least one (many indicates multiple headlines with the same name)
+       assert !headlines.empty?, "expected a list of headlines"
+       headlines.each do |headline|
+         assert headline.is_a?(Wiki::Api::PageHeadline), "expected headline object"
+       end
+     end
+   end
+
+ end
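The tests above all walk the same object graph: headlines, then each headline's block, then the block's list items, then the links inside each item. As a rough illustration of that traversal, here is a minimal sketch that runs without network access. The `Struct` classes and the example URL are hypothetical stand-ins chosen for this sketch; they only borrow the method names used in the tests (`block`, `list_items`, `links`, `uri`) and are not the gem's real classes.

```ruby
require "uri"

# Hypothetical stand-ins that mimic the method names exercised by the tests.
# They are NOT the gem's implementation, only a shape for illustration.
PageLink     = Struct.new(:uri)
PageListItem = Struct.new(:links)
PageBlock    = Struct.new(:list_items)
PageHeadline = Struct.new(:name, :block)

# Build a tiny object graph: one headline, one list item, one link.
# The URL is an assumed example value.
link      = PageLink.new(URI.parse("https://en.wiktionary.org/wiki/Help:Contents"))
item      = PageListItem.new([link])
block     = PageBlock.new([item])
headlines = [PageHeadline.new("Getting started", block)]

# Walk headlines -> block -> list items -> links, collecting every URI.
uris = headlines.flat_map do |headline|
  headline.block.list_items.flat_map do |list_item|
    list_item.links.map(&:uri)
  end
end

puts uris.map(&:to_s)
```

Against the real gem, the same nesting would presumably start from `Wiki::Api::Page.new(name: ...).headlines`, as the tests do, and would need a reachable wiki.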
data/wiki-api.gemspec ADDED
@@ -0,0 +1,29 @@
+ # coding: utf-8
+ lib = File.expand_path('../lib', __FILE__)
+ $LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
+ require 'wiki/api/version'
+
+ Gem::Specification.new do |spec|
+   spec.name = "wiki-api"
+   spec.version = Wiki::Api::VERSION
+   spec.authors = ["Dennis Blommesteijn"]
+   spec.email = ["dennis@blommesteijn.com"]
+   spec.description = %q{MediaWiki API and parser}
+   spec.summary = %q{MediaWiki API and parser}
+   spec.homepage = ""
+   spec.license = "MIT"
+
+   spec.files = `git ls-files`.split($/)
+   spec.executables = spec.files.grep(%r{^bin/}) { |f| File.basename(f) }
+   spec.test_files = spec.files.grep(%r{^(test|spec|features)/})
+   spec.require_paths = ["lib"]
+
+   spec.add_development_dependency "bundler", "~> 1.3"
+   spec.add_development_dependency "rake"
+
+   # dependencies
+   spec.add_dependency 'nokogiri'
+   spec.add_dependency 'json'
+   spec.add_development_dependency "test-unit"
+
+ end
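The `spec.executables` and `spec.test_files` lines in the gemspec above both filter the file list with `Array#grep`. A small sketch of what those two expressions compute is given here; the file list is a hypothetical stand-in for the output of `git ls-files` (the `bin/wiki-api` entry in particular is invented for illustration, since the packaged gem lists no executables).

```ruby
# Hypothetical file list standing in for `git ls-files`.split($/).
files = [
  "bin/wiki-api",
  "lib/wiki/api.rb",
  "lib/wiki/api/page.rb",
  "test/test_helper.rb",
  "test/unit/wiki_connect.rb",
  "README.md"
]

# With a block, grep passes each matching element through the block,
# so paths under bin/ are reduced to their base names.
executables = files.grep(%r{^bin/}) { |f| File.basename(f) }

# Without a block, grep returns the matching elements unchanged.
test_files = files.grep(%r{^(test|spec|features)/})

puts executables.inspect
puts test_files.inspect
```

With no files under `bin/`, the first expression yields an empty array, which is consistent with the `executables: []` entry in the gem metadata.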
metadata ADDED
@@ -0,0 +1,165 @@
+ --- !ruby/object:Gem::Specification
+ name: wiki-api
+ version: !ruby/object:Gem::Version
+   prerelease:
+   version: 0.0.1
+ platform: ruby
+ authors:
+ - Dennis Blommesteijn
+ autorequire:
+ bindir: bin
+ cert_chain: []
+ date: 2013-03-28 00:00:00.000000000 Z
+ dependencies:
+ - !ruby/object:Gem::Dependency
+   name: bundler
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '1.3'
+     none: false
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '1.3'
+     none: false
+   prerelease: false
+   type: :development
+ - !ruby/object:Gem::Dependency
+   name: rake
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: !binary |-
+           MA==
+     none: false
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: !binary |-
+           MA==
+     none: false
+   prerelease: false
+   type: :development
+ - !ruby/object:Gem::Dependency
+   name: nokogiri
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: !binary |-
+           MA==
+     none: false
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: !binary |-
+           MA==
+     none: false
+   prerelease: false
+   type: :runtime
+ - !ruby/object:Gem::Dependency
+   name: json
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: !binary |-
+           MA==
+     none: false
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: !binary |-
+           MA==
+     none: false
+   prerelease: false
+   type: :runtime
+ - !ruby/object:Gem::Dependency
+   name: test-unit
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: !binary |-
+           MA==
+     none: false
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: !binary |-
+           MA==
+     none: false
+   prerelease: false
+   type: :development
+ description: MediaWiki API and parser
+ email:
+ - dennis@blommesteijn.com
+ executables: []
+ extensions: []
+ extra_rdoc_files: []
+ files:
+ - ".gitignore"
+ - Gemfile
+ - LICENSE.txt
+ - README.md
+ - Rakefile
+ - lib/wiki/api.rb
+ - lib/wiki/api/connect.rb
+ - lib/wiki/api/page.rb
+ - lib/wiki/api/page_block.rb
+ - lib/wiki/api/page_headline.rb
+ - lib/wiki/api/page_link.rb
+ - lib/wiki/api/page_list_item.rb
+ - lib/wiki/api/util.rb
+ - lib/wiki/api/version.rb
+ - test/test_helper.rb
+ - test/unit/files/Wiktionary_Welcome,_newcomers.html
+ - test/unit/wiki_connect.rb
+ - test/unit/wiki_page_object.rb
+ - wiki-api.gemspec
+ homepage: ''
+ licenses:
+ - MIT
+ post_install_message:
+ rdoc_options: []
+ require_paths:
+ - lib
+ required_ruby_version: !ruby/object:Gem::Requirement
+   requirements:
+   - - ">="
+     - !ruby/object:Gem::Version
+       segments:
+       - 0
+       hash: 2
+       version: !binary |-
+         MA==
+   none: false
+ required_rubygems_version: !ruby/object:Gem::Requirement
+   requirements:
+   - - ">="
+     - !ruby/object:Gem::Version
+       segments:
+       - 0
+       hash: 2
+       version: !binary |-
+         MA==
+   none: false
+ requirements: []
+ rubyforge_project:
+ rubygems_version: 1.8.24
+ signing_key:
+ specification_version: 3
+ summary: MediaWiki API and parser
+ test_files:
+ - test/test_helper.rb
+ - test/unit/files/Wiktionary_Welcome,_newcomers.html
+ - test/unit/wiki_connect.rb
+ - test/unit/wiki_page_object.rb
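The `version: !binary |-` entries with the value `MA==` in the metadata above are YAML binary scalars, which are base64-encoded. Decoding one shows that each unversioned dependency, as well as the Ruby and RubyGems requirements, amounts to `>= 0`. The following is a minimal sketch using only core Ruby and the RubyGems classes that ship with it; the version string `1.9.3` is an arbitrary example value.

```ruby
# "MA==" is the base64 payload of the `!binary` scalars in the metadata.
encoded = "MA=="

# The "m" unpack directive decodes base64; the result is the string "0".
decoded = encoded.unpack1("m")

# Rebuild the requirement the metadata expresses: ">= 0".
requirement = Gem::Requirement.new(">= #{decoded}")

# Any version satisfies ">= 0"; "1.9.3" is just an example.
satisfied = requirement.satisfied_by?(Gem::Version.new("1.9.3"))

puts decoded
puts requirement
puts satisfied
```

In other words, declaring a dependency without a version constraint in the gemspec, as with `spec.add_dependency 'nokogiri'`, is recorded as the open-ended requirement `>= 0`.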