RubyGems - wiki-api - Versions diffs - 0.1.0 → 0.1.2 - Mend

wiki-api 0.1.0 → 0.1.2

Files changed (21)

checksums.yaml +5 -13
data/.rubocop.yml +24 -0
data/.travis.yml +12 -0
data/Gemfile +2 -0
data/README.md +60 -62
data/Rakefile +13 -1
data/bin/console +8 -0
data/lib/wiki/api/connect.rb +48 -38
data/lib/wiki/api/page.rb +35 -42
data/lib/wiki/api/page_block.rb +16 -17
data/lib/wiki/api/page_headline.rb +51 -50
data/lib/wiki/api/page_link.rb +13 -14
data/lib/wiki/api/page_list_item.rb +10 -13
data/lib/wiki/api/util.rb +18 -20
data/lib/wiki/api/version.rb +3 -1
data/lib/wiki/api.rb +9 -8
data/test/test_helper.rb +4 -7
data/test/unit/wiki_connect.rb +18 -25
data/test/unit/wiki_page_offline.rb +144 -111
data/wiki-api.gemspec +20 -17
metadata +53 -34

checksums.yaml CHANGED

@@ -1,15 +1,7 @@
 ---
-SHA1:
-  metadata.gz: !binary |-
-    NjQ3MjZkMDdmNTg2YjdhZDRmM2E3MjU4ZjA1Y2IwOGYzODEwZTFkMA==
-  data.tar.gz: !binary |-
-    YWE4Mzc4ZjRlYTBjNGE4MTkyYmE0OGFkOTJkMDViZTI0MjQ5MGFiMw==
+SHA256:
+  metadata.gz: cd978cd4dad89ddc8098d6abafcd6325ec6c0c4a4a5e5b8e93855bc118314b27
+  data.tar.gz: c5ead46deb2d10310823d4b639046058cf087a29cb6a0413a5e3addc64037b92
 SHA512:
-  metadata.gz: !binary |-
-    OTNhMTZkNjMwNzJiMzU5YWE0ZDZiNzRlZWU5ZDJjM2Q1NTA5ZWRiN2IzY2Mw
-    MmU1ZDk0ODZhN2U4ODYwNjY0ZjdmY2U5ZTFkMDk4ZDA2MzIyODUzNjE0YzVl
-    OGE2ZmFmOTYyOWY2MWIyNGNlNmU5NjYwOTNkMGNhNjllOWM0YzQ=
-  data.tar.gz: !binary |-
-    YjgzZGEzYzhhOWFmNzZhMjRlMWFiYmJiY2Q3N2EwOGQwZTBjY2Q0NzYxNWE2
-    ODc5NmMyNmYyODMyNmVmMjFmYzhhOTAzMTUzZTBmODU2OTMwY2RhYjg0Mjkz
-    Yjk3NjMzNGFlZGViYzQyOGQ5YzVjM2MzMjIyNWVlOWRhOTU0MDk=
+  metadata.gz: fcb6e3991c12a415a79b4c109091a41dbe45bff7ee3040a1a4283ddc2625522cfca767c65cba45e0f29bb13d410f082b78337de25d0bfd2bd9e0bd1591a36c24
+  data.tar.gz: 3a78fa474766c4cc10c44eb3e8a90ed95c1ddac1f306afa878da2ccf7b75e4fd179fc7933499f261c408cdd2f396d3613a6d74361bdad160cb3c13727aaa135c
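
Reviewer's sketch, not part of the gem: the removed `!binary` scalars appear to be Base64-wrapped hex digests, while the 0.1.2 file stores plain hex and swaps SHA1 for SHA256. The snippet below decodes one of the removed values and shows how digests of the new shape could be produced with Ruby's standard `digest` library. The helper name `checksums_for` and the sample input are assumptions for illustration only.

```ruby
require 'digest'

# Old-style value copied from the removed SHA1 metadata.gz entry above.
old_sha1_metadata = 'NjQ3MjZkMDdmNTg2YjdhZDRmM2E3MjU4ZjA1Y2IwOGYzODEwZTFkMA=='

# The YAML !binary tag marks a Base64 scalar; unpack1('m') decodes Base64.
decoded_sha1 = old_sha1_metadata.unpack1('m')

# Hypothetical helper: hex digests in the shape the 0.1.2 checksums.yaml uses.
def checksums_for(bytes)
  {
    'SHA256' => Digest::SHA256.hexdigest(bytes),
    'SHA512' => Digest::SHA512.hexdigest(bytes)
  }
end

# Illustrative input only; the real file hashes metadata.gz and data.tar.gz.
sample = checksums_for('example bytes')

puts decoded_sha1.length     # length of the decoded hex string
puts sample['SHA256'].length # hex characters in a SHA256 digest
```

The decoded value is a 40-character hex string, which matches the length of a SHA1 digest.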

data/.rubocop.yml ADDED

@@ -0,0 +1,24 @@
+AllCops:
+  SuggestExtensions: false
+Style/ClassVars:
+  Enabled: false
+Style/Documentation:
+  Enabled: false
+Style/MethodCallWithArgsParentheses:
+  Enabled: true
+Metrics/AbcSize:
+  Enabled: false
+Metrics/ClassLength:
+  Enabled: false
+Metrics/CyclomaticComplexity:
+  Enabled: false
+Metrics/PerceivedComplexity:
+  Enabled: false
+Metrics/MethodLength:
+  Enabled: false
+Naming/MethodParameterName:
+  Enabled: false
+Naming/PredicateName:
+  Enabled: false
+Lint/RescueException:
+  Enabled: false

data/.travis.yml ADDED

@@ -0,0 +1,12 @@
+language: ruby
+rvm:
+  - 1.9.3
+  - 2.1.0
+  - jruby-19mode
+  - ruby-head
+  - jruby-head
+jdk:
+  - oraclejdk7
+before_install:
+  - gem update --system
+  - gem --version

data/Gemfile CHANGED

@@ -1,3 +1,5 @@
+# frozen_string_literal: true
 source 'https://rubygems.org'
 # Specify your gem's dependencies in wiki-api.gemspec

data/README.md CHANGED

@@ -1,47 +1,20 @@
 # Wiki::Api
-Wiki API is a gem (Ruby on Rails) that interfaces with the MediaWiki API (https://www.mediawiki.org/wiki/API:Main_page). This gem is more than a interface, it has abstract classes for Page and Headline parsing. You're able to iterate through these headlines, and access data accordingly.
+[![Build Status](https://travis-ci.org/dblommesteijn/wiki-api.svg?branch=master)](https://travis-ci.org/dblommesteijn/wiki-api) [![Code Climate](https://codeclimate.com/github/dblommesteijn/wiki-api.png)](https://codeclimate.com/github/dblommesteijn/wiki-api)
-NOTE: This gem has a nokogiri (http://nokogiri.org/Nokogiri.html) backend (for HTML parsing). Major components: Page, Headline, Block, ListItem, and Link are wrappers for easy data access, however it's still possible to retreive the raw HTML within these objects.
+Wiki API is a gem (Ruby on Rails) that interfaces with the MediaWiki API (https://www.mediawiki.org/wiki/API:Main_page). This gem is more than a interface, it has abstract classes for Page and Headline parsing. You're able to iterate through these headlines, and access data accordingly.
+NOTE: This gem has a nokogiri (http://nokogiri.org/Nokogiri.html) backend (for HTML parsing). Major components: `Page`, `Headline`, `Block`, `ListItem`, and `Link` are wrappers for easy data access, however it's still possible to retreive the raw HTML within these objects.
 Requests to the MediaWiki API use the following URI structure:
     http(s)://somemediawiki.org/w/api.php?action=parse&format=json&page="anypage"
-# RDoc (rdoc.info)
-    http://rdoc.info/github/dblommesteijn/wiki-api/frames/file/README.md
+### Dependencies
-### Dependencies (production)
-* json
 * nokogiri
-### Feature Roadmap
-* Version (0.1.0)
-  Major current release with several core changes.
-* Version (0.1.1)
-  No features determined yet (please drop me a line if you're interested in additions).
-### Changelog
-* Version (0.0.2) -> (current)
-  PageLink URI without global config Exception resolved
-  Reverse (parent) object lookup
-  Nested PageHeadline objects
 ## Installation
 Add this line to your application's Gemfile (bundler):
@@ -56,23 +29,29 @@ Or install it yourself (RubyGems):
     $ gem install wiki-api
+Or try it from this repository (local) in a console:
+    $ bin/console
 ## Setup
 Define a configuration for your connection (initialize script), this example uses wiktionary.org.
-NOTE: it can connect to both HTTP and HTTPS MediaWikis.
-```ruby
-CONFIG = { uri: "http://en.wiktionary.org" }
-```
+NOTE: it can connect to both HTTP and HTTPS MediaWikis (however you'll get a 302 response from MediaWiki)
 Setup default configuration (initialize script)
 ```ruby
-Wiki::Api::Connect.config = CONFIG
+Wiki::Api::Connect.config = { uri: 'https://en.wiktionary.org' }
 ```
+## Running tests
+```bash
+$ rake test
+```
 ## Usage
 ### Query a Page and Headline
@@ -80,7 +59,7 @@ Wiki::Api::Connect.config = CONFIG
 Requesting headlines from a given page.
 ```ruby
-page = Wiki::Api::Page.new name: "Wiktionary:Welcome,_newcomers"
+page = Wiki::Api::Page.new(name: 'Wiktionary:Welcome,_newcomers')
 # the root headline equals the pagename
 puts page.root_headline.name
 # iterate next level of headlines
@@ -93,9 +72,9 @@ end
 Getting headlines for a given name.
 ```ruby
-page = Wiki::Api::Page.new name: "Wiktionary:Welcome,_newcomers"
+page = Wiki::Api::Page.new(name: 'Wiktionary:Welcome,_newcomers')
 # lookup headline by name (underscore and case are ignored)
-headline = page.root_headline.headline("editing wiktionary").first
+headline = page.root_headline.headline('editing wiktionary').first
 # printing headline name (PageHeadline)
 puts headline.name
 # get the type of nested headline (html h1,2,3,4 etc.)
@@ -105,7 +84,7 @@ puts headline.type
 ### Basic Page structure
 ```ruby
-page = Wiki::Api::Page.new name: "Wiktionary:Welcome,_newcomers"
+page = Wiki::Api::Page.new(name: 'Wiktionary:Welcome,_newcomers')
 # iterate PageHeadline objects
 page.root_headline.headlines.each do |headline_name, headline|
   # exposing nokogiri internal elements
@@ -114,6 +93,7 @@ page.root_headline.headlines.each do |headline_name, headline|
     # print will result in: Nokogiri::XML::Text or Nokogiri::XML::Element
     puts element.class
   end
   # string representation of all nested text
   block.to_texts
   # iterate PageListItem objects
@@ -137,7 +117,6 @@ page.root_headline.headlines.each do |headline_name, headline|
     # string representation of nested text
     link.to_text
   end
 end
 ```
@@ -148,21 +127,20 @@ This is a example of querying wikipedia.org on the page: "Ruby_on_rails", and pr
 ```ruby
 # setting a target config
-CONFIG = { uri: "https://en.wikipedia.org" }
-Wiki::Api::Connect.config = CONFIG
+Wiki::Api::Connect.config = { uri: 'https://en.wikipedia.org' }
 # querying the page
-page = Wiki::Api::Page.new name: "Ruby_on_Rails"
+page = Wiki::Api::Page.new(name: 'Ruby_on_Rails')
 # get headlines with name Reference (there can be multiple headlines with the same name!)
-headlines = page.root_headline.headline "References"
+headlines = page.root_headline.headline('References')
 # iterate headlines
 headlines.each do |headline|
   # iterate list items on the given headline
   headline.block.list_items.each do |list_item|
     # print the uri of all links
-    puts list_item.links.map{ |l| l.uri }
+    puts list_item.links.map(&:uri)
   end
 end
 ```
@@ -174,19 +152,17 @@ This is the same example as the one above, except for setting a global config to
 ```ruby
 # querying the page
-page = Wiki::Api::Page.new name: "Ruby_on_Rails", uri: "https://en.wikipedia.org"
+page = Wiki::Api::Page.new(name: 'Ruby_on_Rails', uri: 'https://en.wikipedia.org')
 # get headlines with name Reference (there can be multiple headlines with the same name!)
-headlines = page.root_headline.headline "References"
+headlines = page.root_headline.headline('References')
 # iterate headlines
 headlines.each do |headline|
   # iterate list items on the given headline
   headline.block.list_items.each do |list_item|
     # print the uri of all links
-    puts list_item.links.map{ |l| l.uri }
+    puts list_item.links.map(&:uri)
   end
 end
 ```
@@ -199,25 +175,47 @@ This example shows how the headlines can be searched. For more info check: https
 ```ruby
 # querying the page
-page = Wiki::Api::Page.new name: "Ruby_on_Rails", uri: "https://en.wikipedia.org"
+page = Wiki::Api::Page.new(name: 'Ruby_on_Rails', uri: 'https://en.wikipedia.org')
 # NOTE: the following are all valid headline names:
 # request headline (by literal name)
-headlines = page.root_headline.headline "Philosophy_and_design"
-puts headlines.map{|h| h.name}
+headlines = page.root_headline.headline('Philosophy_and_design')
+puts headlines.map(&:name)
 # request headline (by downcase name)
-headlines = page.root_headline.headline "philosophy_and_design"
-puts headlines.map{|h| h.name}
+headlines = page.root_headline.headline('philosophy_and_design')
+puts headlines.map(&:name)
 # request headline (by human name)
-headlines = page.root_headline.headline "philosophy and design"
-puts headlines.map{|h| h.name}
+headlines = page.root_headline.headline('philosophy and design')
+puts headlines.map(&:name)
 # NOTE2: headlines are matched on headline.start_with?(requested_headline)
 # because of start_with? compare this should work as well!
-headlines = page.root_headline.headline "philosophy"
-puts headlines.map{|h| h.name}
+headlines = page.root_headline.headline('philosophy')
+puts headlines.map(&:name)
 ```
+### Example searching headlines in depth
+Recursive search on all nested headlines, including in depth searches.
+```ruby
+# querying the page
+page = Wiki::Api::Page.new(name: 'Ruby_on_Rails', uri: 'https://en.wikipedia.org')
+# get root
+root_headline = page.root_headline
+# lookup 'ramework structure' on current level
+headline = root_headline.headline_in_depth('framework structure').first
+puts headline.name
+# NOTE: lookup of nested headlines does not work with the headline function (because 'Framework_structure' is nested within 'Technical_overview')
+headline = root_headline.headline('framework structure').first
+# depth can be limited adding the depth parameter
+# NOTE: the example below will return nil, 'Framework_structure' is nested beyond depth = 0!
+depth = 0
+headline = root_headline.headline_in_depth('framework structure', depth).first
+# increasing depth search will show the requested headline
+depth = 5
+headline = root_headline.headline_in_depth('framework structure', depth).first
+puts headline.name
+```
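
Reviewer's sketch, not part of the gem: the depth-limited lookup that the new README section describes can be modelled on a plain tree. `SketchHeadline` is a hypothetical stand-in, not the gem's `PageHeadline`, and the meaning of `depth` (0 searches direct children only) is an assumption inferred from the README comments rather than from the implementation.

```ruby
# Hypothetical stand-in for a nested headline; this is not the gem's PageHeadline.
class SketchHeadline
  attr_reader :name, :children

  def initialize(name, children = [])
    @name = name
    @children = children
  end

  # Direct children whose name starts with the requested one
  # (case-insensitive, spaces treated as underscores).
  def headline(wanted)
    key = wanted.downcase.tr(' ', '_')
    children.select { |child| child.name.downcase.start_with?(key) }
  end

  # Assumed semantics: depth 0 looks at direct children only,
  # each extra level of depth descends one level further.
  def headline_in_depth(wanted, depth = Float::INFINITY)
    found = headline(wanted)
    return found if depth <= 0

    found + children.flat_map { |child| child.headline_in_depth(wanted, depth - 1) }
  end
end

# Mirrors the nesting described in the README:
# Framework_structure sits under Technical_overview.
root = SketchHeadline.new(
  'Ruby_on_Rails',
  [SketchHeadline.new('Technical_overview', [SketchHeadline.new('Framework_structure')])]
)

p root.headline('framework structure').first                     # nil
p root.headline_in_depth('framework structure', 0).first         # nil
puts root.headline_in_depth('framework structure', 5).first.name # Framework_structure
```

Leaving `depth` unbounded by default keeps the call without a second argument equivalent to a full recursive search, which is how the first README call reads.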

data/Rakefile CHANGED

@@ -1 +1,13 @@
-require "bundler/gem_tasks"
+# frozen_string_literal: true
+require 'bundler/gem_tasks'
+require 'rake/testtask'
+Rake::TestTask.new do |t|
+  t.libs << 'test'
+  tfs = FileList['test/unit/*.rb']
+  t.test_files = tfs
+  t.verbose = true
+end
+task default: %i[build install]

data/bin/console ADDED

@@ -0,0 +1,8 @@
+#!/usr/bin/env ruby
+# frozen_string_literal: true
+require 'bundler/setup'
+require 'wiki/api'
+require 'pry'
+Pry.start

data/lib/wiki/api/connect.rb CHANGED

@@ -1,85 +1,95 @@
+# frozen_string_literal: true
 require 'net/http'
 require 'json'
 require 'nokogiri'
 module Wiki
   module Api
     class Connect
       attr_accessor :uri, :api_path, :api_options, :http, :request, :response, :html, :parsed, :file
-      def initialize(options={})
-        @@config ||= nil
-        options.merge! @@config unless @@config.nil?
-        self.uri = options[:uri] if options.include? :uri
-        self.file = options[:file] if options.include? :file
-        self.api_path = options[:api_path] if options.include? :api_path
-        self.api_options = options[:api_options] if options.include? :api_options
+      def initialize(options = {})
+        @@config ||= {}
+        self.uri = options[:uri] || @@config[:uri]
+        self.file = options[:file] || @@config[:file]
+        self.api_path = options[:api_path] || @@config[:api_path]
+        self.api_options = options[:api_options] || @@config[:api_options]
         # defaults
-        self.api_path ||= "/w/api.php"
-        self.api_options ||= {action: "parse", format: "json", page: ""}
+        self.api_path ||= '/w/api.php'
+        self.api_options ||= { action: 'parse', format: 'json', page: '' }
         # errors
-        raise "no uri given" if self.uri.nil?
+        raise('no uri given') if uri.nil?
       end
       def connect
         uri = URI("#{self.uri}#{self.api_path}")
-        uri.query = URI.encode_www_form self.api_options
+        uri.query = URI.encode_www_form(self.api_options)
         self.http = Net::HTTP.new(uri.host, uri.port)
-        if uri.scheme == "https"
-          self.http.use_ssl = true
-          #self.http.verify_mode = OpenSSL::SSL::VERIFY_NONE
+        if uri.scheme == 'https'
+          http.use_ssl = true
+          # self.http.verify_mode = OpenSSL::SSL::VERIFY_NONE
         end
         self.request = Net::HTTP::Get.new(uri.request_uri)
-        self.response = self.http.request(request)
+        self.response = http.request(request)
       end
-      def page page_name
+      def page(page_name)
         self.api_options[:page] = page_name
         # parse page by uri
-        if !self.uri.nil? && self.file.nil?
-          self.connect
-          response = self.response
-          json = JSON.parse response.body, {symbolize_names: true}
-          raise json[:error][:code] unless valid? json, response
-          self.html = json[:parse][:text]
-          self.parsed = Nokogiri::HTML self.html[:*]
+        if !uri.nil? && file.nil?
+          self.parsed = parse_from_uri(response)
         # parse page by file
-        elsif !self.file.nil?
-          f = File.open(self.file)
-          # self.parsed = Nokogiri::HTML self.html[:*]
-          self.parsed = Nokogiri::HTML(f)
-          f.close
+        elsif !file.nil?
+          self.parsed = parse_from_file(file)
         # invalid config, raise exception
         else
-          raise "no :uri or :file config found!"
+          raise('no :uri or :file config found!')
         end
-        self.parsed
+        parsed
+      end
+      def parse_from_uri(response)
+        connect
+        # rubocop:disable Lint/ShadowedArgument
+        response = self.response
+        # rubocop:enable Lint/ShadowedArgument
+        json = JSON.parse(response.body, { symbolize_names: true })
+        raise(json[:error][:code]) unless valid?(json, response)
+        self.html = json[:parse][:text]
+        self.parsed = Nokogiri::HTML(html[:*])
+      end
+      def parse_from_file(file)
+        f = File.open(file)
+        ret = Nokogiri::HTML(f)
+        f.close
+        ret
       end
       class << self
         def config=(config = {})
           @@config = config
         end
         def config
           @@config ||= []
         end
       end
       protected
-      def valid? json, response
+      def valid?(json, response)
         b = []
         # valid http response
-        b << (response.is_a? Net::HTTPOK)
+        b << (response.is_a?(Net::HTTPOK))
         # not an invalid api response handle
-        b << (!json.include? :error)
+        b << (!json.include?(:error))
         !b.include?(false)
       end
     end
   end
-end
+end
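
Reviewer's sketch, not part of the gem: the rewritten `initialize` appears to change which source wins when both a per-instance option and the global config supply a `:uri`. In 0.1.0 the global config was merged over the options; in 0.1.2 the options are read first and the global config is only a fallback. The method names `resolve_uri_010` and `resolve_uri_012` and the `GLOBAL` constant are hypothetical, used here only to isolate that precedence rule with plain hashes.

```ruby
# Hypothetical global config, standing in for Wiki::Api::Connect.config.
GLOBAL = { uri: 'https://en.wiktionary.org' }.freeze

# 0.1.0 reading: the global config is merged over the options, so it wins.
# (The original used merge!, which also mutated the caller's hash.)
def resolve_uri_010(options, global)
  merged = options.merge(global)
  merged[:uri]
end

# 0.1.2 reading: per-instance options win, the global config is the fallback.
def resolve_uri_012(options, global)
  options[:uri] || global[:uri]
end

per_page = { uri: 'https://en.wikipedia.org' }

puts resolve_uri_010(per_page, GLOBAL) # https://en.wiktionary.org
puts resolve_uri_012(per_page, GLOBAL) # https://en.wikipedia.org
puts resolve_uri_012({ uri: nil }, GLOBAL) # https://en.wiktionary.org
```

If this reading is right, a `uri:` passed to `Wiki::Api::Page.new` now overrides a previously set global config, which matches the README example that queries Wikipedia without touching the global setting.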

data/lib/wiki/api/page.rb CHANGED

@@ -1,25 +1,22 @@
+# frozen_string_literal: true
 module Wiki
   module Api
     # MediaWiki Page, collection of all html information plus it's page title
     class Page
       attr_accessor :name, :parsed_page, :uri, :parent
-      def initialize(options={})
-        self.name = options[:name] if options.include? :name
-        self.uri = options[:uri] if options.include? :uri
-        @connect = Wiki::Api::Connect.new uri: uri
-      end
-      def connect
-        @connect
+      def initialize(options = {})
+        self.name = options[:name] if options.include?(:name)
+        self.uri = options[:uri] if options.include?(:uri)
+        @connect = Wiki::Api::Connect.new(uri:)
       end
+      attr_reader :connect
       # collect all headlines, keep original page formatting
       def root_headline
-        self.parse_blocks
+        parse_blocks
       end
       # # collect headlines by given name, this will flatten the nested headlines
@@ -30,10 +27,9 @@ module Wiki
       #   self.parse_blocks(headline_name)
       # end
       def to_html
-        self.load_page!
-        self.parsed_page.to_xhtml indent: 3, indent_text: " "
+        load_page!
+        parsed_page.to_xhtml(indent: 3, indent_text: ' ')
       end
       def reset!
@@ -41,69 +37,66 @@ module Wiki
       end
       def load_page!
-        self.parsed_page ||= @connect.page self.name
+        self.parsed_page ||= @connect.page(name)
       end
       # parse blocks
-      def parse_blocks headline_name = nil
-        self.load_page!
+      def parse_blocks(headline_name = nil)
+        load_page!
         result = {}
         # get headline nodes by span class
-        xs = self.parsed_page.xpath("//span[@class='mw-headline']")
+        headlines = self.parsed_page.xpath("//span[@class='mw-headline']")
         # filter single headline by name (ignore case)
-        xs = self.filter_headline xs, headline_name unless headline_name.nil?
+        headlines = filter_headline(headlines, headline_name) unless headline_name.nil?
         # NOTE: first_part has no id attribute and thus cannot be filtered or processed within xpath (xs)
-        if headline_name.nil? || headline_name.start_with?(self.name.downcase)
-          x = self.first_part
-          result[self.name] ||= []
-          result[self.name] << (self.collect_elements(x.parent))
+        if headline_name.nil? || headline_name.start_with?(name.downcase)
+          x = first_part
+          result[name] ||= []
+          result[name] << (collect_elements(x.parent))
         end
         # append all blocks
-        xs.each do |x|
-          headline = x.attributes["id"].value
-          elements = self.collect_elements x.parent.next
-          result[headline] ||= []
-          result[headline] << elements
+        headlines.each do |headline|
+          headline_value = headline.attributes['id'].value
+          elements = collect_elements(headline.parent.next)
+          result[headline_value] ||= []
+          result[headline_value] << elements
         end
         # create root object
-        PageHeadline.new parent: self, name: result.first[0], headlines: result, level: 0
+        PageHeadline.new(parent: self, name: result.first[0], headlines: result, level: 0)
       end
       # harvest first part of the page (missing heading and class="mw-headline")
       def first_part
-        self.parsed_page ||= @connect.page self.name
-        self.parsed_page.search("p").first.children.first
+        self.parsed_page ||= @connect.page(name)
+        self.parsed_page.search('p').first.children.first
       end
       # collect elements within headlines (not nested properties, but next elements)
-      def collect_elements element
+      def collect_elements(element)
         # capture first element name
         elements = []
         # iterate text until next headline
-        while true do
+        loop do
           elements << element
           element = element.next
-          break if element.nil? || element.to_html.include?("class=\"mw-headline\"")
+          break if element.nil? || element.to_html.include?('class="mw-headline"')
         end
         elements
       end
-      def filter_headline xs, headline_name
+      def filter_headline(xs, headline_name)
         # transform name to a wiki_id (downcase and space replace with underscore)
-        headline_name = headline_name.downcase.gsub(" ", "_")
+        headline_name = headline_name.downcase.gsub(' ', '_')
         # reject not matching id's
-        xs.reject do |t|
-          !t.attributes["id"].value.downcase.start_with?(headline_name)
+        xs.select do |t|
+          t.attributes['id'].value.downcase.start_with?(headline_name)
         end
       end
     end
   end
-end
+end
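
Reviewer's sketch, not part of the gem: the `filter_headline` change swaps a negated `reject` for a `select`, which should be behaviour-preserving. The snippet checks that equivalence on plain strings; the `ids` array is a hypothetical stand-in for the `id` attributes of the `mw-headline` spans.

```ruby
# Hypothetical headline ids, standing in for the id attributes of mw-headline spans.
ids = %w[Philosophy_and_design Technical_overview Framework_structure]

# Same normalisation as filter_headline: downcase, spaces to underscores.
wanted = 'philosophy and design'.downcase.gsub(' ', '_')

# 0.1.0 style: drop everything that does not match.
old_style = ids.reject { |id| !id.downcase.start_with?(wanted) }

# 0.1.2 style: keep everything that matches.
new_style = ids.select { |id| id.downcase.start_with?(wanted) }

p old_style == new_style # true
p new_style              # ["Philosophy_and_design"]
```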