RubyGems - wiki-api - Versions diffs - 0.0.1 - Mend

wiki-api 0.0.1

Files changed (20)

data/.gitignore +20 -0
data/Gemfile +4 -0
data/LICENSE.txt +22 -0
data/README.md +166 -0
data/Rakefile +1 -0
data/lib/wiki/api.rb +15 -0
data/lib/wiki/api/connect.rb +70 -0
data/lib/wiki/api/page.rb +126 -0
data/lib/wiki/api/page_block.rb +53 -0
data/lib/wiki/api/page_headline.rb +22 -0
data/lib/wiki/api/page_link.rb +33 -0
data/lib/wiki/api/page_list_item.rb +32 -0
data/lib/wiki/api/util.rb +34 -0
data/lib/wiki/api/version.rb +5 -0
data/test/test_helper.rb +8 -0
data/test/unit/files/Wiktionary_Welcome,_newcomers.html +152 -0
data/test/unit/wiki_connect.rb +51 -0
data/test/unit/wiki_page_object.rb +171 -0
data/wiki-api.gemspec +29 -0
metadata +165 -0

data/.gitignore ADDED

@@ -0,0 +1,20 @@
+*.gem
+*.rbc
+.bundle
+.config
+.yardoc
+Gemfile.lock
+InstalledFiles
+_yardoc
+coverage
+doc/
+lib/bundler/man
+pkg
+rdoc
+spec/reports
+test/tmp
+test/version_tmp
+tmp
+*.DS_Store
+*.swp

data/Gemfile ADDED

@@ -0,0 +1,4 @@
+source 'https://rubygems.org'
+# Specify your gem's dependencies in wiki-api.gemspec
+gemspec

data/LICENSE.txt ADDED

@@ -0,0 +1,22 @@
+Copyright (c) 2013 Dennis Blommesteijn
+MIT License
+Permission is hereby granted, free of charge, to any person obtaining
+a copy of this software and associated documentation files (the
+"Software"), to deal in the Software without restriction, including
+without limitation the rights to use, copy, modify, merge, publish,
+distribute, sublicense, and/or sell copies of the Software, and to
+permit persons to whom the Software is furnished to do so, subject to
+the following conditions:
+The above copyright notice and this permission notice shall be
+included in all copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
+LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
+OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

data/README.md ADDED

@@ -0,0 +1,166 @@
+# Wiki::Api
+Wiki API is a Ruby gem that interfaces with the MediaWiki API (https://www.mediawiki.org/wiki/API:Main_page). It is more than an interface: it provides abstractions such as Page, from which you can request page content (for example headlines, and the text blocks under those headlines).
+NOTE: Nokogiri is used in the background to parse HTML. Because there is little point in wrapping (composing) its internals for this purpose, Nokogiri nodes, elements, etc. are exposed directly through wiki-api (http://nokogiri.org/Nokogiri.html).
+Requests to the MediaWiki API use the following URI structure:
+    http(s)://somemediawiki.org/w/api.php?action=parse&format=json&page="anypage"
+### Dependencies (production)
+* json
+* nokogiri
+### Roadmap
+* Version (0.0.1) (current)
+  Initial project.
+* Version (0.0.2)
+  Index important words per block, page, list item;
+  Parse objects for more elements within a Page.
+### Known Issues
+None discovered thus far.
+## Installation
+Add this line to your application's Gemfile (bundler):
+    gem 'wiki-api', git: "https://github.com/dblommesteijn/wiki-api.git"
+And then execute:
+    $ bundle
+Or install it yourself (RubyGems):
+    $ gem install wiki-api
+## Setup
+Define a configuration for your connection (e.g. in an initializer script); this example uses wiktionary.org.
+NOTE: it can connect to both HTTP and HTTPS MediaWikis.
+```ruby
+CONFIG = { uri: "http://en.wiktionary.org" }
+```
+Set up the default configuration (e.g. in the same initializer script):
+```ruby
+Wiki::Api::Connect.config = CONFIG
+```
+## Usage
+### Query a Page
+Requesting headlines from a given page.
+```ruby
+page = Wiki::Api::Page.new name: "Wiktionary:Welcome,_newcomers"
+page.headlines.each do |headline|
+  # printing headline name (PageHeadline)
+  puts headline.name
+end
+```
+Getting the headlines with a given name (passing the page name itself returns the introduction before the first headline).
+```ruby
+page = Wiki::Api::Page.new name: "Wiktionary:Welcome,_newcomers"
+page.headline("Wiktionary:Welcome,_newcomers").each do |headline|
+  # printing headline name (PageHeadline)
+  puts headline.name
+end
+```
+### Basic Page structure
+```ruby
+page = Wiki::Api::Page.new name: "Wiktionary:Welcome,_newcomers"
+# iterate PageHeadline objects
+page.headlines.each do |headline|
+  # exposing nokogiri internal elements
+  elements = headline.elements.flatten
+  elements.each do |element|
+    # access Nokogiri::XML::*
+  end
+  # array of strings, one per non-empty HTML element
+  headline.block.to_texts
+  # iterate PageListItem objects
+  headline.block.list_items.each do |list_item|
+    # string representation of nested text
+    list_item.to_text
+    # iterate PageLink objects
+    list_item.links.each do |link|
+      # same PageLink interface as in 'iterate PageLink objects' below
+    end
+  end
+  # iterate PageLink objects
+  headline.block.links.each do |link|
+    # absolute URI object
+    link.uri
+    # html link
+    link.html
+    # link name
+    link.title
+    # string representation of nested text
+    link.to_text
+  end
+end
+```
+### Example (https://en.wikipedia.org/wiki/Ruby_on_rails)
+This is an example of querying wikipedia.org for the page "Ruby_on_rails" and printing the links of every list item under the "References" headline.
+```ruby
+# setting a target config
+CONFIG = { uri: "https://en.wikipedia.org" }
+Wiki::Api::Connect.config = CONFIG
+# querying the page
+page = Wiki::Api::Page.new name: "Ruby_on_rails"
+# get headlines named "References" (there can be multiple headlines with the same name!)
+headlines = page.headline "References"
+# iterate headlines
+headlines.each do |headline|
+  # iterate list items on the given headline
+  headline.block.list_items.each do |list_item|
+    # print the uri of all links
+    puts list_item.links.map{ |l| l.uri }
+  end
+end
+```
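The request URI structure described in the README (`api.php?action=parse&format=json&page=...`) can be reproduced with the standard library alone. The following is a minimal sketch that mirrors what `Connect#connect` builds, assuming the gem's default `/w/api.php` path and default `api_options`. `parse_request_uri` is a hypothetical helper and is not part of the gem.

```ruby
require "uri"

# Hypothetical helper: builds the same URI that Wiki::Api::Connect#connect
# requests, using the gem's default api_path and api_options.
def parse_request_uri(base, page, api_path: "/w/api.php")
  uri = URI("#{base}#{api_path}")
  # encode_www_form percent-encodes reserved characters such as ':' and ','
  uri.query = URI.encode_www_form({ action: "parse", format: "json", page: page })
  uri
end

puts parse_request_uri("https://en.wiktionary.org", "Wiktionary:Welcome,_newcomers")
# https://en.wiktionary.org/w/api.php?action=parse&format=json&page=Wiktionary%3AWelcome%2C_newcomers
```

Page names containing `:` or `,` (as in the Wiktionary examples) are therefore sent percent-encoded.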

data/Rakefile ADDED

@@ -0,0 +1 @@
+require "bundler/gem_tasks"

data/lib/wiki/api.rb ADDED

@@ -0,0 +1,15 @@
+require File.expand_path(File.dirname(__FILE__) + "/api/version")
+require File.expand_path(File.dirname(__FILE__) + "/api/connect")
+require File.expand_path(File.dirname(__FILE__) + "/api/page")
+require File.expand_path(File.dirname(__FILE__) + "/api/page_headline")
+require File.expand_path(File.dirname(__FILE__) + "/api/page_block")
+require File.expand_path(File.dirname(__FILE__) + "/api/page_list_item")
+require File.expand_path(File.dirname(__FILE__) + "/api/page_link")
+require File.expand_path(File.dirname(__FILE__) + "/api/util")
+module Wiki
+  module Api
+    # Your code goes here...
+  end
+end

data/lib/wiki/api/connect.rb ADDED

@@ -0,0 +1,70 @@
+require 'net/http'
+require 'json'
+require 'nokogiri'
+module Wiki
+  module Api
+    class Connect
+      attr_accessor :uri, :api_path, :api_options, :http, :request, :response, :html, :parsed
+      def initialize(options={})
+        options.merge!(@@config) if defined?(@@config) && @@config
+        self.uri = options[:uri] if options.include? :uri
+        self.api_path = options[:api_path] if options.include? :api_path
+        self.api_options = options[:api_options] if options.include? :api_options
+        # defaults
+        self.api_path ||= "/w/api.php"
+        self.api_options ||= {action: "parse", format: "json", page: ""}
+        # errors
+        raise "no uri given" if self.uri.nil?
+      end
+      def connect
+        uri = URI("#{self.uri}#{self.api_path}")
+        uri.query = URI.encode_www_form self.api_options
+        self.http = Net::HTTP.new(uri.host, uri.port)
+        if uri.scheme == "https"
+          self.http.use_ssl = true
+          #self.http.verify_mode = OpenSSL::SSL::VERIFY_NONE
+        end
+        self.request = Net::HTTP::Get.new(uri.request_uri)
+        self.response = self.http.request(request)
+      end
+      def page page_name
+        self.api_options[:page] = page_name
+        self.connect
+        response = self.response
+        json = JSON.parse response.body, {symbolize_names: true}
+        raise json[:error][:code] unless valid? json, response
+        self.html = json[:parse][:text]
+        self.parsed = Nokogiri::HTML self.html[:*]
+      end
+      class << self
+        def config=(config = {})
+          @@config = config
+        end
+        def config
+          @@config ||= {}
+        end
+      end
+      protected
+      def valid? json, response
+        b = []
+        # valid http response
+        b << (response.is_a? Net::HTTPOK)
+        # not an invalid api response handle
+        b << (!json.include? :error)
+        !b.include?(false)
+      end
+    end
+  end
+end
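`Connect#page` raises the API's error code as the exception message (for example "missingtitle", which `WikiConnect#test_page_get_non_exist` checks). Otherwise it hands `json[:parse][:text][:*]` to Nokogiri. You can try that JSON handling offline with a sketch like the one below.

`extract_parse_html` and its `http_ok` flag are hypothetical. The sample bodies assume the legacy `{"*": ...}` layout that 0.0.1 reads. Unlike the gem, the sketch checks the HTTP status on its own. In 0.0.1, a non-200 response whose body has no `error` key would raise NoMethodError, because `json[:error]` is nil.

```ruby
require "json"

# Hypothetical helper mirroring Connect#page after the HTTP call.
# Assumes the legacy JSON layout that 0.0.1 reads:
#   {"parse": {"text": {"*": "<p>...</p>"}}}   on success
#   {"error": {"code": "missingtitle", ...}}  on an API error
def extract_parse_html(body, http_ok: true)
  raise "unexpected HTTP status" unless http_ok
  json = JSON.parse(body, symbolize_names: true)
  # same error surface as the gem: the API error code becomes the message
  raise json[:error][:code] if json.key?(:error)
  json[:parse][:text][:*]
end
```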

data/lib/wiki/api/page.rb ADDED

@@ -0,0 +1,126 @@
+module Wiki
+  module Api
+    class Page
+      attr_accessor :name, :parsed_page
+      def initialize(options={})
+        self.name = options[:name] if options.include? :name
+        @@config ||= nil
+        if @@config.nil?
+          # use the connection to collect HTML pages for parsing
+          @connect = Wiki::Api::Connect.new
+        else
+          # using a local HTML file for parsing
+        end
+      end
+      def headlines
+        headlines = []
+        self.parse_blocks.each do |headline_name, elements|
+          headline = PageHeadline.new name: headline_name
+          elements.each do |element|
+            # nokogiri element
+            headline.block << element
+          end
+          headlines << headline
+        end
+        headlines
+      end
+      def headline headline_name
+        headlines = []
+        self.parse_blocks(headline_name).each do |headline_name, elements|
+          headline = PageHeadline.new name: headline_name
+          elements.each do |element|
+            # nokogiri element
+            headline.block << element
+          end
+          headlines << headline
+        end
+        headlines
+      end
+      def to_html
+        self.load_page!
+        self.parsed_page.to_xhtml indent: 3, indent_text: " "
+      end
+      def reset!
+        self.parsed_page = nil
+      end
+      class << self
+        def config=(config = {})
+          @@config = config
+        end
+      end
+      protected
+      def load_page!
+        if @@config.nil?
+          self.parsed_page ||= @connect.page self.name
+        elsif self.parsed_page.nil?
+          f = File.open(@@config[:file])
+          self.parsed_page = Nokogiri::HTML(f)
+          f.close
+        end
+      end
+      # parse blocks
+      def parse_blocks headline_name = nil
+        self.load_page!
+        result = {}
+        # get headline nodes by span class
+        xs = self.parsed_page.xpath("//span[@class='mw-headline']")
+        # filter single headline by name
+        xs = xs.reject{|t| t.attributes["id"].value != headline_name } unless headline_name.nil?
+        # NOTE: first_part has no id attribute and thus cannot be filtered or processed within xpath (xs)
+        if headline_name == self.name || headline_name.nil?
+          x = self.first_part
+          result[self.name] ||= []
+          result[self.name] << (self.collect_elements(x.parent))
+        end
+        # append all blocks
+        xs.each do |x|
+          headline = x.attributes["id"].value
+          elements = self.collect_elements x.parent.next
+          result[headline] ||= []
+          result[headline] << elements
+        end
+        result
+      end
+      # harvest first part of the page (missing heading and class="mw-headline")
+      def first_part
+        self.parsed_page ||= @connect.page self.name
+        self.parsed_page.search("p").first.children.first
+      end
+      # collect elements within headlines (not nested properties, but next elements)
+      def collect_elements element
+        # capture first element name
+        elements = []
+        # iterate text until next headline
+        while true do
+          elements << element
+          element = element.next
+          break if element.nil? || element.to_html.include?("class=\"mw-headline\"")
+        end
+        elements
+      end
+    end
+  end
+end
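`Page#parse_blocks` groups the page into blocks. Everything before the first `mw-headline` span is filed under the page name. Each headline then collects the sibling nodes that follow it until the next headline (`collect_elements`). A repeated headline id appends another element list under the same key.

The sketch below mimics that grouping on a plain array instead of Nokogiri nodes. `Node` and `split_blocks` are hypothetical stand-ins, not part of the gem.

```ruby
# Hypothetical stand-in for the sibling nodes Nokogiri yields:
# kind is :headline (an mw-headline span) or :content (anything else).
Node = Struct.new(:kind, :value)

# Mimics Page#parse_blocks + #collect_elements: content before the first
# headline is filed under the page name; every headline starts a new block,
# and a repeated headline id appends another block under the same key.
def split_blocks(page_name, nodes)
  result = { page_name => [[]] }
  current = result[page_name].last
  nodes.each do |node|
    if node.kind == :headline
      (result[node.value] ||= []) << []
      current = result[node.value].last
    else
      current << node.value
    end
  end
  result
end
```

The nested arrays are why callers of the gem use `headline.elements.flatten`.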

data/lib/wiki/api/page_block.rb ADDED

@@ -0,0 +1,53 @@
+module Wiki
+  module Api
+    class PageBlock
+      attr_accessor :elements
+      def initialize options={}
+        self.elements = []
+      end
+      def << value
+        self.elements << value
+      end
+      def to_texts
+        # TODO: perhaps we should wrap the elements with objects??
+        texts = []
+        self.elements.flatten.each do |element|
+          text = Wiki::Api::Util.element_to_text element if element.is_a? Nokogiri::XML::Element
+          next if text.nil?
+          next if text.empty?
+          texts << text
+        end
+        texts
+      end
+      def list_items
+        # TODO: perhaps we should wrap the elements with objects, and request a li per element??
+        self.search("li").map do |list_item|
+          PageListItem.new element: list_item
+        end
+      end
+      def links
+        # TODO: perhaps we should wrap the elements with objects, and request a li per element??
+        self.search("a").map do |a|
+          PageLink.new element: a
+        end
+      end
+      protected
+      def search *paths
+        self.elements.flatten.flat_map do |element|
+          element.search(*paths)
+        end.reject{|t| t.nil?}
+      end
+    end
+  end
+end
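Each PageHeadline's block holds one array of nodes per matching headline (see `Page#headlines`). That is why `PageBlock#to_texts` and `#search` flatten before iterating, and why `to_texts` skips nil or empty results.

The following is a minimal plain-Ruby sketch of that filtering. It uses strings in place of Nokogiri elements and relies on `Array#filter_map` (Ruby 2.7+). `block_texts` is hypothetical.

```ruby
# Hypothetical plain-Ruby stand-in for PageBlock#to_texts: elements are
# nested (one inner array per matching headline), so flatten first, then
# keep only non-empty strings (non-strings play the role of non-Element nodes).
def block_texts(elements)
  elements.flatten.filter_map do |element|
    text = element.is_a?(String) ? element.strip : nil
    text unless text.nil? || text.empty?
  end
end
```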

data/lib/wiki/api/page_headline.rb ADDED

@@ -0,0 +1,22 @@
+module Wiki
+  module Api
+    class PageHeadline
+      attr_accessor :name, :block
+      def initialize options={}
+        self.name = options[:name] if options.include? :name
+        self.block = PageBlock.new
+      end
+      def elements
+        self.block.elements
+      end
+    end
+  end
+end

data/lib/wiki/api/page_link.rb ADDED

@@ -0,0 +1,33 @@
+module Wiki
+  module Api
+    class PageLink
+      attr_accessor :element
+      def initialize options={}
+        self.element = options[:element] if options.include? :element
+      end
+      def to_text
+        Wiki::Api::Util.element_to_text self.element
+      end
+      def uri
+        host = Wiki::Api::Connect.config[:uri]
+        element_value = self.element.attributes["href"].value
+        URI.parse "#{host}#{element_value}"
+      end
+      def title
+        self.element.attributes["title"].value
+      end
+      def html
+        "<a href=\"#{self.uri}\" alt=\"#{self.title}\">#{self.title}</a>"
+      end
+    end
+  end
+end
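`PageLink#uri` concatenates the configured host and the href (`"#{host}#{element_value}"`). That works for site-relative links such as `/wiki/free`. However, the test fixture also contains protocol-relative hrefs like `//en.wikipedia.org/wiki/Main_Page`, which concatenation turns into `http://en.wiktionary.org//en.wikipedia.org/wiki/Main_Page`.

The sketch below resolves hrefs with the standard library's `URI.join` instead. `resolve_href` is a hypothetical helper, not part of the gem.

```ruby
require "uri"

# Hypothetical helper: resolves a MediaWiki href against the configured host.
# URI.join applies reference resolution, so site-relative paths ("/wiki/free")
# and protocol-relative links ("//en.wikipedia.org/...") both resolve correctly.
def resolve_href(host, href)
  URI.join(host, href)
end

puts resolve_href("http://en.wiktionary.org", "//en.wikipedia.org/wiki/Main_Page")
# http://en.wikipedia.org/wiki/Main_Page
```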

data/lib/wiki/api/page_list_item.rb ADDED

@@ -0,0 +1,32 @@
+module Wiki
+  module Api
+    class PageListItem
+      attr_accessor :element
+      def initialize options={}
+        self.element = options[:element] if options.include? :element
+      end
+      def to_text
+        Wiki::Api::Util.element_to_text self.element
+      end
+      def links
+        self.search("a").map do |a|
+          PageLink.new element: a
+        end
+      end
+      protected
+      def search *paths
+        self.element.search(*paths)
+      end
+    end
+  end
+end

data/lib/wiki/api/util.rb ADDED

@@ -0,0 +1,34 @@
+module Wiki
+  module Api
+    class Util
+      class << self
+        def element_to_text element
+          raise "not an element" unless element.is_a? Nokogiri::XML::Element
+          self.clean_text element.text
+        end
+        def element_filter_lists element
+          raise "not an element" unless element.is_a? Nokogiri::XML::Element
+          result = {}
+          element.search("li").each_with_index do |li, i|
+            li.children.each do |child|
+              result[i] ||= []
+              result[i] << self.clean_text(child.text)
+            end
+          end
+          result.map{|k,v| v.join("")}
+        end
+        protected
+        def clean_text text
+          text.gsub(/\n/, " ").squeeze(" ").gsub(/\s(\W)/, '\1').gsub(/(\W)\s/, '\1 ').strip
+        end
+      end
+    end
+  end
+end
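`Util.clean_text` normalizes the whitespace left over from MediaWiki HTML, where links split sentences across lines (see the fixture below). Below is a standalone copy of the method as written in 0.0.1, for experimenting outside the gem; the name `clean_text` is reused only for illustration.

Note that the `\s(\W)` step also removes the space before an opening bracket, because `(` counts as a non-word character.

```ruby
# Standalone copy of Wiki::Api::Util.clean_text (0.0.1), for experimenting.
# Steps: newlines -> spaces, collapse runs of spaces, drop whitespace that
# precedes a non-word character, turn whitespace after one into a plain
# space, then strip both ends.
def clean_text(text)
  text.gsub(/\n/, " ")
      .squeeze(" ")
      .gsub(/\s(\W)/, '\1')
      .gsub(/(\W)\s/, '\1 ')
      .strip
end

puts clean_text("  dictionary\n, being written ") # dictionary, being written
puts clean_text("see (this)")                      # see(this)
```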

data/lib/wiki/api/version.rb ADDED

@@ -0,0 +1,5 @@
+module Wiki
+  module Api
+    VERSION = "0.0.1"
+  end
+end

data/test/test_helper.rb ADDED

@@ -0,0 +1,8 @@
+class ActiveSupport::TestCase
+  setup :global_setup
+  def global_setup
+  end
+end

data/test/unit/files/Wiktionary_Welcome,_newcomers.html ADDED

@@ -0,0 +1,152 @@
+<html>
+   <head />
+</head>
+   <body>
+      <p>Hello, and welcome! Wiktionary is a       <a href="/wiki/multilingual" title="multilingual">multilingual</a>
+       <a href="/wiki/free" title="free">free</a>
+       <a href="/wiki/dictionary" title="dictionary">dictionary</a>
+, being written       <a href="/wiki/collaborative" title="collaborative">collaboratively</a>
+ on this website by people from around the world.       <a href="/wiki/entry" title="entry">Entries</a>
+ may be       <a href="/wiki/Help:How_to_edit_a_page" title="Help:How to edit a page">edited</a>
+ by anyone!</p>
+      <p>Designed as the lexical companion to       <a class="extiw" href="//en.wikipedia.org/wiki/Main_Page" title="w:Main Page">Wikipedia</a>
+, the encyclopedia project, Wiktionary has grown beyond a standard dictionary and now includes a       <a href="/wiki/Wiktionary:Wikisaurus" title="Wiktionary:Wikisaurus">thesaurus</a>
+, a rhyme guide, phrase books, language statistics and extensive appendices. We aim to include not only the definition of a word, but also enough information to really understand it. Thus       <a href="/wiki/etymology" title="etymology">etymologies</a>
+, pronunciations, sample quotations, synonyms, antonyms and translations are included.</p>
+      <p>Wiktionary is a       <a href="/wiki/wiki" title="wiki">wiki</a>
+, which means that you can edit it, and all the content is dual-licensed under both the       <a href="/wiki/Wiktionary:Text_of_Creative_Commons_Attribution-ShareAlike_3.0_Unported_License" title="Wiktionary:Text of Creative Commons Attribution-ShareAlike 3.0 Unported License">Creative Commons Attribution-ShareAlike 3.0 Unported License</a>
+ as well as the       <a href="/wiki/Wiktionary:GNU_Free_Documentation_License" title="Wiktionary:GNU Free Documentation License">GNU Free Documentation License</a>
+. Before you contribute, you may wish to read through some of our       <a class="mw-redirect" href="/wiki/Wiktionary:Help" title="Wiktionary:Help">Help</a>
+ pages, and bear in mind that we do things quite differently from other wikis. In particular, we have strict       <a href="/wiki/Wiktionary:Entry_layout_explained" title="Wiktionary:Entry layout explained">layout conventions</a>
+ and       <a href="/wiki/Wiktionary:Criteria_for_inclusion" title="Wiktionary:Criteria for inclusion">inclusion criteria</a>
+. Learn how to       <a href="/wiki/Help:Starting_a_new_page" title="Help:Starting a new page">start a page</a>
+, how to       <a href="/wiki/Help:How_to_edit_a_page" title="Help:How to edit a page">edit entries</a>
+, experiment in the       <a href="/wiki/Wiktionary:Sandbox" title="Wiktionary:Sandbox">sandbox</a>
+ and visit our       <a href="/wiki/Wiktionary:Community_Portal" title="Wiktionary:Community Portal">Community Portal</a>
+ to see how you can participate in the development of Wiktionary.</p>
+      <p>We have created       <b>3,311,698</b>
+ articles since starting in       <a href="/wiki/December" title="December">December</a>
+, 2002, and we’re growing rapidly.</p>
+      <h2>
+         <span class="editsection">[         <a href="/w/index.php?title=Wiktionary:Welcome,_newcomers&amp;action=edit&amp;section=1" title="Edit section: Editing Wiktionary">edit</a>
+]</span>
+          <span class="mw-headline" id="Editing_Wiktionary">Editing Wiktionary</span>
+      </h2>
+      <p>People like yourself are very active in building this       <a href="/wiki/project" title="project">project</a>
+. While you are reading this, it is likely someone is editing one of our entries. Many       <a href="/wiki/knowledgeable" title="knowledgeable">knowledgeable</a>
+ people are already at work, but everybody is       <a href="/wiki/welcome" title="welcome">welcome</a>
+!</p>
+      <p>Contributing does not require       <a class="external text" href="//en.wiktionary.org/w/index.php?title=Special:Userlogin&amp;type=signup">logging in</a>
+, but we would       <a href="/wiki/prefer" title="prefer">prefer</a>
+ that you do, as it       <a href="/wiki/facilitate" title="facilitate">facilitates</a>
+ the       <a href="/wiki/administration" title="administration">administration</a>
+ of this site. (Note that logging in also prevents the       <a href="/wiki/IP" title="IP">IP</a>
+ address of your computer from being displayed in the       <a class="external text" href="//en.wiktionary.org/w/index.php?title=Wiktionary:Welcome,_newcomers&amp;action=history">page history</a>
+.)</p>
+      <p>You can       <a href="/wiki/dive_in" title="dive in">dive in</a>
+ right now and add or alter a definition, add example sentences, or help us to properly format or categorize entries. You can even       <a href="/wiki/Help:Starting_a_new_page" title="Help:Starting a new page">create a page</a>
+ for a term we’re missing. Please feel free to       <a href="/wiki/Wiktionary:Be_bold_in_updating_pages" title="Wiktionary:Be bold in updating pages">be bold</a>
+ in editing pages!</p>
+      <p>How could allowing everyone to edit produce a high‐quality product instead of total disorder? Because most people want to help, and keeping it open to everyone creates the potential for making many good and ever-improving entries. Records are kept of all changes, so even unhelpful edits can easily be       <a href="/wiki/revert" title="revert">reverted</a>
+ by other users. To use a now‐famous       <a class="extiw" href="//en.wikipedia.org/wiki/Linus%27_Law" title="w:Linus' Law">catchphrase</a>
+, in essence: “Given enough eyeballs, all errors are shallow.”</p>
+      <p>To start out, users might want to use the ‘      <a href="/wiki/Special:RecentChanges" title="Special:RecentChanges">Recent changes</a>
+’ or ‘      <a href="/wiki/Special:Random" title="Special:Random">Random page</a>
+’ link (found in the navigation box elsewhere on this page), to get an idea of the kinds of pages you can find here. (It might be surprising how many non-English words are entered here!)</p>
+      <h2>
+         <span class="editsection">[         <a href="/w/index.php?title=Wiktionary:Welcome,_newcomers&amp;action=edit&amp;section=2" title="Edit section: Norms and etiquette">edit</a>
+]</span>
+          <span class="mw-headline" id="Norms_and_etiquette">Norms and etiquette</span>
+      </h2>
+      <div class="disambig-see-also-2">
+         <i>See also</i>
+          <b>
+            <a href="/wiki/Help:Interacting_with_humans" title="Help:Interacting with humans">Help:Interacting with humans</a>
+         </b>
+      </div>
+      <p>One important thing you should know is that we have borrowed from our sister project       <a class="extiw" href="//en.wikipedia.org/wiki/" title="wikipedia:">Wikipedia</a>
+ some       <a class="extiw" href="//en.wikipedia.org/wiki/Norm_(sociology)" title="wikipedia:Norm (sociology)">cultural norms</a>
+ you should respect:</p>
+      <ol>
+      <li>We try not to argue pointlessly. This isn’t a debate forum. After       <a href="/wiki/civilized" title="civilized">civilized</a>
+ and reasonable discussion, we try to reach broad       <a href="/wiki/consensus" title="consensus">consensus</a>
+ in order to present an accurate, neutral summary of all relevant facts for future readers.</li>
+      <li>We try to make the entries as unbiased as we can, meaning that definitions or descriptions — even of controversial topics — are not meant to be platforms for preaching of any kind.</li>
+      <li>Bear in mind this is a       <i>dictionary</i>
+, which means there are many       <a href="/wiki/Wiktionary:What_Wiktionary_is_not" title="Wiktionary:What Wiktionary is not">things it is not</a>
+.</li>
+      <li>At any point, if you are uncomfortable changing someone else’s work, and you want to add a thought (or question or comment) about an entry or other page, the place is its       <a class="mw-redirect" href="/wiki/Wiktionary:Talk_page" title="Wiktionary:Talk page">talk page</a>
+ (click on the "discussion" tab at the top or the "Discuss this page" link in the sidebar or elsewhere, depending on your preference skin). Note, though, that we try to keep discussion focused on improving this dictionary.</li>
+</ol>
+      <p>However, there are also some differences between Wikipedia and Wiktionary. If you already have some experience with editing Wikipedia, then you may find our       <a href="/wiki/Wiktionary:Wiktionary_for_Wikipedians" title="Wiktionary:Wiktionary for Wikipedians">guide to Wikipedia users</a>
+ useful as a quick introduction.</p>
+      <h2>
+         <span class="editsection">[         <a href="/w/index.php?title=Wiktionary:Welcome,_newcomers&amp;action=edit&amp;section=3" title="Edit section: For more information">edit</a>
+]</span>
+          <span class="mw-headline" id="For_more_information">For more information</span>
+      </h2>
+      <p>More introductory information and descriptions of community norms are on the following pages:</p>
+      <ul>
+      <li>
+         <a href="/wiki/Help:Starting_a_new_page" title="Help:Starting a new page">How to start a page</a>
+      </li>
+      <li>
+         <a href="/wiki/Help:How_to_edit_a_page" title="Help:How to edit a page">How to edit a page</a>
+      </li>
+      <li>
+         <a href="/wiki/Wiktionary:Staying_cool_when_the_editing_gets_hot" title="Wiktionary:Staying cool when the editing gets hot">Staying cool when editing gets hot</a>
+      </li>
+      <li>
+         <a href="/wiki/Help:FAQ" title="Help:FAQ">Wiktionary FAQ</a>
+      </li>
+      <li>
+         <a href="/wiki/Wiktionary:Wiktionary_for_Wikipedians" title="Wiktionary:Wiktionary for Wikipedians">Wiktionary for Wikipedians</a>
+      </li>
+</ul>
+      <p>For more policy and style guidelines or guidance, see our       <a href="/wiki/Wiktionary:Community_Portal" title="Wiktionary:Community Portal">Community Portal</a>
+ or       <a href="/wiki/Help:Contents" title="Help:Contents">Help:Contents</a>
+.</p>
+<!--
+NewPP limit report
+Preprocessor visited node count: 34/1000000
+Preprocessor generated node count: 534/1500000
+Post-expand include size: 250/2048000 bytes
+Template argument size: 32/2048000 bytes
+Highest expansion depth: 3/40
+Expensive parser function count: 0/500
+-->
+<!-- Saved in parser cache with key enwiktionary:pcache:idhash:6-0!*!0!!*!*!* and timestamp 20130327133438 -->
+   </body>
+</html>

data/test/unit/wiki_connect.rb ADDED

@@ -0,0 +1,51 @@
+require 'rubygems'
+require 'test/unit'
+require File.expand_path(File.dirname(__FILE__) + "/../../lib/wiki/api")
+#
+# Testing the connection to https://www.mediawiki.org/wiki/API:Main_page
+#
+class WikiConnect < Test::Unit::TestCase
+  CONFIG = { uri: "http://en.wiktionary.org" }
+  def setup
+    Wiki::Api::Connect.config = CONFIG
+  end
+  def teardown
+  end
+  def test_connection_wiktionary
+    c = Wiki::Api::Connect.new uri: "http://en.wiktionary.org"
+    ret = c.connect
+    assert ret.is_a?(Net::HTTPOK), "invalid response http"
+  end
+  def test_connection_https_wiktionary
+    c = Wiki::Api::Connect.new uri: "https://en.wiktionary.org"
+    ret = c.connect
+    assert ret.is_a?(Net::HTTPOK), "invalid response https"
+  end
+  def test_page_get
+    begin
+      c = Wiki::Api::Connect.new
+      c.page "Wiktionary:Welcome,_newcomers"
+    rescue Exception => e
+      assert false, "expected valid page #{e.message}"
+    end
+  end
+  def test_page_get_non_exist
+    begin
+      c = Wiki::Api::Connect.new
+      response = c.page "asfsldkfjjlkanv98yhok"
+    rescue Exception => e
+      assert (e.message == "missingtitle"), "expected invalid page #{e.message}"
+    end
+  end
+end
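`Connect#initialize` merges the global config over the options passed to `Connect.new`, so keys from `Connect.config` win. In `WikiConnect`, `setup` sets the config to `http://en.wiktionary.org`. As a result, `test_connection_https_wiktionary` actually requests the plain-HTTP URI even though it passes `https://`.

The two possible merge orders are shown below. The helper names and values are hypothetical and do not come from the gem.

```ruby
# Hypothetical global config, matching WikiConnect::CONFIG.
GLOBAL = { uri: "http://en.wiktionary.org" }.freeze

# Order used by Connect#initialize in 0.0.1: global config keys win.
def config_over_options(options, config)
  options.merge(config)
end

# Alternative order: explicit per-call options win over global defaults.
def options_over_config(options, config)
  config.merge(options)
end

explicit = { uri: "https://en.wiktionary.org" }
puts config_over_options(explicit, GLOBAL)[:uri] # http://en.wiktionary.org
puts options_over_config(explicit, GLOBAL)[:uri] # https://en.wiktionary.org
```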

data/test/unit/wiki_page_object.rb ADDED

@@ -0,0 +1,171 @@
+# encoding: utf-8
+require 'rubygems'
+require 'test/unit'
+require File.expand_path(File.dirname(__FILE__) + "/../../lib/wiki/api")
+#
+# Testing the parsing of URI (with a predownloaded HTML file):
+#   /files/Wiktionary_Welcome,_newcomers.html (2013-03-27)
+#
+# Online equivalent:
+#   https://en.wiktionary.org/wiki/Wiktionary:Welcome,_newcomers
+#
+class WikiPageObject < Test::Unit::TestCase
+  # this global is required to resolve URIs (MediaWiki uses relative paths in their links)
+  GLB_CONFIG = { uri: "http://en.wiktionary.org" }
+  # use local file for test loading
+  PAGE_CONFIG = { file: File.expand_path(File.dirname(__FILE__) + "/files/Wiktionary_Welcome,_newcomers.html") }
+  def setup
+    # NOTE: comment out the Page.config line below to use the online MediaWiki instance
+    Wiki::Api::Page.config = PAGE_CONFIG
+    Wiki::Api::Connect.config = GLB_CONFIG
+    @page_name = "Wiktionary:Welcome,_newcomers"
+  end
+  def teardown
+  end
+  # test simple page invocation
+  def test_page_invocation
+    page = Wiki::Api::Page.new name: @page_name
+    headlines = page.headlines
+    assert !headlines.empty?, "expected headlines"
+    headlines.each do |headline|
+      assert headline.is_a?(Wiki::Api::PageHeadline), "expected headline object"
+    end
+  end
+  # test nokogiri elements per headline
+  def test_page_elements
+    page = Wiki::Api::Page.new name: @page_name
+    headlines = page.headlines
+    assert !headlines.empty?, "expected headlines"
+    headlines.each do |headline|
+      assert headline.is_a?(Wiki::Api::PageHeadline), "expected headline object"
+      elements = headline.elements.flatten
+      assert !elements.empty?, "expected elements"
+      elements.each do |element|
+        assert element.is_a?(Nokogiri::XML::Element) ||
+          element.is_a?(Nokogiri::XML::Text) ||
+          element.is_a?(Nokogiri::XML::Comment), "expected nokogiri internals"
+      end
+    end
+  end
+  # test pageblocks for each headline
+  def test_page_blocks
+    page = Wiki::Api::Page.new name: @page_name
+    headlines = page.headlines
+    assert !headlines.empty?, "expected headlines"
+    headlines.each do |headline|
+      assert headline.is_a?(Wiki::Api::PageHeadline), "expected headline object"
+      block = headline.block
+      assert block.is_a?(Wiki::Api::PageBlock), "expected block object"
+    end
+  end
+  # test string text from page block
+  def test_page_block_string_text
+    page = Wiki::Api::Page.new name: @page_name
+    headlines = page.headlines
+    assert !headlines.empty?, "expected headlines"
+    headlines.each do |headline|
+      assert headline.is_a?(Wiki::Api::PageHeadline), "expected headline object"
+      block = headline.block
+      assert block.is_a?(Wiki::Api::PageBlock), "expected block object"
+      texts = block.to_texts
+      assert texts.is_a?(Array) && !texts.empty?, "expected array"
+      texts.each do |text|
+        assert text.is_a?(String), "expected string"
+      end
+    end
+  end
+  # test list items from page blocks
+  def test_page_block_list_items
+    page = Wiki::Api::Page.new name: @page_name
+    headlines = page.headlines
+    assert !headlines.empty?, "expected headlines"
+    headlines.each do |headline|
+      assert headline.is_a?(Wiki::Api::PageHeadline), "expected headline object"
+      block = headline.block
+      assert block.is_a?(Wiki::Api::PageBlock), "expected block object"
+      list_items = block.list_items
+      assert list_items.is_a?(Array), "expected array"
+      list_items.each do |list_item|
+        assert list_item.is_a?(Wiki::Api::PageListItem), "expected list item object"
+      end
+    end
+  end
+  # test links within page blocks
+  def test_page_block_links
+    page = Wiki::Api::Page.new name: @page_name
+    headlines = page.headlines
+    assert !headlines.empty?, "expected headlines"
+    headlines.each do |headline|
+      assert headline.is_a?(Wiki::Api::PageHeadline), "expected headline object"
+      block = headline.block
+      assert block.is_a?(Wiki::Api::PageBlock), "expected block object"
+      links = block.links
+      assert links.is_a?(Array), "expected array"
+      links.each do |link|
+        assert link.is_a?(Wiki::Api::PageLink), "expected link object"
+        assert link.uri.is_a?(URI), "expected uri object"
+      end
+    end
+  end
+  # test links within list items
+  def test_page_block_list_inner_links
+    page = Wiki::Api::Page.new name: @page_name
+    headlines = page.headlines
+    assert !headlines.empty?, "expected headlines"
+    headlines.each do |headline|
+      assert headline.is_a?(Wiki::Api::PageHeadline), "expected headline object"
+      block = headline.block
+      assert block.is_a?(Wiki::Api::PageBlock), "expected block object"
+      list_items = block.list_items
+      assert list_items.is_a?(Array), "expected array"
+      list_items.each do |list_item|
+        assert list_item.is_a?(Wiki::Api::PageListItem), "expected list item object"
+        links = list_item.links
+        links.each do |link|
+          assert link.is_a?(Wiki::Api::PageLink), "expected link object"
+          assert link.uri.is_a?(URI), "expected uri object"
+        end
+      end
+    end
+  end
+  # test single headline invocation
+  def test_page_invocation_single
+    page = Wiki::Api::Page.new name: @page_name
+    headlines = page.headlines
+    assert !headlines.empty?, "expected headlines"
+    # collect headline names
+    hs = []
+    headlines.each do |headline|
+      assert headline.is_a?(Wiki::Api::PageHeadline), "expected headline object"
+      hs << headline.name
+    end
+    # query every headline manually
+    hs.each do |h|
+      # test headline query
+      headlines = page.headline h
+      # test for at least one (many indicates multiple headlines with the same name)
+      assert !headlines.empty?, "expected a list of headlines"
+      headlines.each do |headline|
+        assert headline.is_a?(Wiki::Api::PageHeadline), "expected headline object"
+      end
+    end
+  end
+end

data/wiki-api.gemspec ADDED

@@ -0,0 +1,29 @@
+# coding: utf-8
+lib = File.expand_path('../lib', __FILE__)
+$LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
+require 'wiki/api/version'
+Gem::Specification.new do |spec|
+  spec.name          = "wiki-api"
+  spec.version       = Wiki::Api::VERSION
+  spec.authors       = ["Dennis Blommesteijn"]
+  spec.email         = ["dennis@blommesteijn.com"]
+  spec.description   = %q{MediaWiki API and parser}
+  spec.summary       = %q{MediaWiki API and parser}
+  spec.homepage      = ""
+  spec.license       = "MIT"
+  spec.files         = `git ls-files`.split($/)
+  spec.executables   = spec.files.grep(%r{^bin/}) { |f| File.basename(f) }
+  spec.test_files    = spec.files.grep(%r{^(test|spec|features)/})
+  spec.require_paths = ["lib"]
+  spec.add_development_dependency "bundler", "~> 1.3"
+  spec.add_development_dependency "rake"
+  # dependencies
+  spec.add_dependency 'nokogiri'
+  spec.add_dependency 'json'
+  spec.add_development_dependency "test-unit"
+end
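The gemspec derives `spec.executables` and `spec.test_files` from the `git ls-files` listing using regex filters. The sketch below replays those two `grep` expressions so that it runs without a git checkout. `FILES` is an illustrative, hard-coded copy of the `files:` list recorded in the gem metadata, standing in for `` `git ls-files`.split($/) ``; it is not part of the gem.

```ruby
# Illustrative only: FILES is a hard-coded copy of the `files:` list from the
# gem metadata, standing in for `git ls-files`.split($/) so this runs without git.
FILES = [
  ".gitignore", "Gemfile", "LICENSE.txt", "README.md", "Rakefile",
  "lib/wiki/api.rb", "lib/wiki/api/connect.rb", "lib/wiki/api/page.rb",
  "lib/wiki/api/page_block.rb", "lib/wiki/api/page_headline.rb",
  "lib/wiki/api/page_link.rb", "lib/wiki/api/page_list_item.rb",
  "lib/wiki/api/util.rb", "lib/wiki/api/version.rb",
  "test/test_helper.rb", "test/unit/files/Wiktionary_Welcome,_newcomers.html",
  "test/unit/wiki_connect.rb", "test/unit/wiki_page_object.rb",
  "wiki-api.gemspec"
].freeze

# Same expressions as in the gemspec. Enumerable#grep with a block returns
# the block's result for each element that matches the pattern.
executables = FILES.grep(%r{^bin/}) { |f| File.basename(f) }
test_files  = FILES.grep(%r{^(test|spec|features)/})

puts "executables: #{executables.inspect}"
puts "test_files:  #{test_files.inspect}"
```

No path starts with `bin/`, so `executables` is empty. The four `test/` paths become `test_files`. Both results match what the generated metadata records.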

metadata ADDED

@@ -0,0 +1,165 @@
+--- !ruby/object:Gem::Specification
+name: wiki-api
+version: !ruby/object:Gem::Version
+  prerelease:
+  version: 0.0.1
+platform: ruby
+authors:
+- Dennis Blommesteijn
+autorequire:
+bindir: bin
+cert_chain: []
+date: 2013-03-28 00:00:00.000000000 Z
+dependencies:
+- !ruby/object:Gem::Dependency
+  name: bundler
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '1.3'
+    none: false
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '1.3'
+    none: false
+  prerelease: false
+  type: :development
+- !ruby/object:Gem::Dependency
+  name: rake
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: !binary |-
+          MA==
+    none: false
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: !binary |-
+          MA==
+    none: false
+  prerelease: false
+  type: :development
+- !ruby/object:Gem::Dependency
+  name: nokogiri
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: !binary |-
+          MA==
+    none: false
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: !binary |-
+          MA==
+    none: false
+  prerelease: false
+  type: :runtime
+- !ruby/object:Gem::Dependency
+  name: json
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: !binary |-
+          MA==
+    none: false
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: !binary |-
+          MA==
+    none: false
+  prerelease: false
+  type: :runtime
+- !ruby/object:Gem::Dependency
+  name: test-unit
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: !binary |-
+          MA==
+    none: false
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: !binary |-
+          MA==
+    none: false
+  prerelease: false
+  type: :development
+description: MediaWiki API and parser
+email:
+- dennis@blommesteijn.com
+executables: []
+extensions: []
+extra_rdoc_files: []
+files:
+- ".gitignore"
+- Gemfile
+- LICENSE.txt
+- README.md
+- Rakefile
+- lib/wiki/api.rb
+- lib/wiki/api/connect.rb
+- lib/wiki/api/page.rb
+- lib/wiki/api/page_block.rb
+- lib/wiki/api/page_headline.rb
+- lib/wiki/api/page_link.rb
+- lib/wiki/api/page_list_item.rb
+- lib/wiki/api/util.rb
+- lib/wiki/api/version.rb
+- test/test_helper.rb
+- test/unit/files/Wiktionary_Welcome,_newcomers.html
+- test/unit/wiki_connect.rb
+- test/unit/wiki_page_object.rb
+- wiki-api.gemspec
+homepage: ''
+licenses:
+- MIT
+post_install_message:
+rdoc_options: []
+require_paths:
+- lib
+required_ruby_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      segments:
+      - 0
+      hash: 2
+      version: !binary |-
+        MA==
+  none: false
+required_rubygems_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      segments:
+      - 0
+      hash: 2
+      version: !binary |-
+        MA==
+  none: false
+requirements: []
+rubyforge_project:
+rubygems_version: 1.8.24
+signing_key:
+specification_version: 3
+summary: MediaWiki API and parser
+test_files:
+- test/test_helper.rb
+- test/unit/files/Wiktionary_Welcome,_newcomers.html
+- test/unit/wiki_connect.rb
+- test/unit/wiki_page_object.rb
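Most version constraints in this metadata appear as `version: !binary |- MA==`. A YAML `!binary` scalar is Base64-encoded, and `MA==` decodes to the string `"0"`. As a result, `rake`, `nokogiri`, `json` and `test-unit` are effectively unconstrained (`>= 0`), while `bundler` uses the pessimistic `~> 1.3` constraint. The sketch below decodes the value and checks a few sample versions against both requirements using RubyGems' own `Gem::Requirement`. It assumes a Ruby with RubyGems available (the default since 1.9) and `String#unpack1` (Ruby 2.4+). The sample version strings are arbitrary.

```ruby
require "rubygems" # loaded by default in modern Ruby; explicit for clarity

# `!binary` YAML scalars are Base64; unpack1("m") decodes Base64.
decoded = "MA==".unpack1("m")
puts decoded # prints 0

# The two kinds of constraint recorded in the metadata above.
any_version = Gem::Requirement.new(">= #{decoded}")
bundler_req = Gem::Requirement.new("~> 1.3")

# "~> 1.3" means ">= 1.3" and "< 2.0".
["0.1", "1.3", "1.3.7", "1.9", "2.0"].each do |v|
  version = Gem::Version.new(v)
  puts format("%-6s  >= 0: %-5s  ~> 1.3: %s",
              v, any_version.satisfied_by?(version), bundler_req.satisfied_by?(version))
end
```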