RubyGems - wiki-api - Versions diffs - 0.1.0 → 0.1.2 - Mend

wiki-api 0.1.0 → 0.1.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (21)

checksums.yaml +5 -13
data/.rubocop.yml +24 -0
data/.travis.yml +12 -0
data/Gemfile +2 -0
data/README.md +60 -62
data/Rakefile +13 -1
data/bin/console +8 -0
data/lib/wiki/api/connect.rb +48 -38
data/lib/wiki/api/page.rb +35 -42
data/lib/wiki/api/page_block.rb +16 -17
data/lib/wiki/api/page_headline.rb +51 -50
data/lib/wiki/api/page_link.rb +13 -14
data/lib/wiki/api/page_list_item.rb +10 -13
data/lib/wiki/api/util.rb +18 -20
data/lib/wiki/api/version.rb +3 -1
data/lib/wiki/api.rb +9 -8
data/test/test_helper.rb +4 -7
data/test/unit/wiki_connect.rb +18 -25
data/test/unit/wiki_page_offline.rb +144 -111
data/wiki-api.gemspec +20 -17
metadata +53 -34

checksums.yaml CHANGED

@@ -1,15 +1,7 @@
 ---
-SHA1:
-  metadata.gz: !binary |-
-    NjQ3MjZkMDdmNTg2YjdhZDRmM2E3MjU4ZjA1Y2IwOGYzODEwZTFkMA==
-  data.tar.gz: !binary |-
-    YWE4Mzc4ZjRlYTBjNGE4MTkyYmE0OGFkOTJkMDViZTI0MjQ5MGFiMw==
+SHA256:
+  metadata.gz: cd978cd4dad89ddc8098d6abafcd6325ec6c0c4a4a5e5b8e93855bc118314b27
+  data.tar.gz: c5ead46deb2d10310823d4b639046058cf087a29cb6a0413a5e3addc64037b92
 SHA512:
-  metadata.gz: !binary |-
-    OTNhMTZkNjMwNzJiMzU5YWE0ZDZiNzRlZWU5ZDJjM2Q1NTA5ZWRiN2IzY2Mw
-    MmU1ZDk0ODZhN2U4ODYwNjY0ZjdmY2U5ZTFkMDk4ZDA2MzIyODUzNjE0YzVl
-    OGE2ZmFmOTYyOWY2MWIyNGNlNmU5NjYwOTNkMGNhNjllOWM0YzQ=
-  data.tar.gz: !binary |-
-    YjgzZGEzYzhhOWFmNzZhMjRlMWFiYmJiY2Q3N2EwOGQwZTBjY2Q0NzYxNWE2
-    ODc5NmMyNmYyODMyNmVmMjFmYzhhOTAzMTUzZTBmODU2OTMwY2RhYjg0Mjkz
-    Yjk3NjMzNGFlZGViYzQyOGQ5YzVjM2MzMjIyNWVlOWRhOTU0MDk=
+  metadata.gz: fcb6e3991c12a415a79b4c109091a41dbe45bff7ee3040a1a4283ddc2625522cfca767c65cba45e0f29bb13d410f082b78337de25d0bfd2bd9e0bd1591a36c24
+  data.tar.gz: 3a78fa474766c4cc10c44eb3e8a90ed95c1ddac1f306afa878da2ccf7b75e4fd179fc7933499f261c408cdd2f396d3613a6d74361bdad160cb3c13727aaa135c
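
Note on the hunk above: the removed `!binary` values look like Base64-wrapped hex digest strings, while the new entries store the hex digest directly. The sketch below decodes one removed entry and shows how a SHA256 entry could be checked against a file. `decode_binary_checksum` and `sha256_matches?` are hypothetical helper names for illustration and are not part of the gem or of RubyGems.

```ruby
require 'digest'

# Value copied from the removed SHA1 metadata.gz entry in the hunk above.
OLD_SHA1_METADATA = 'NjQ3MjZkMDdmNTg2YjdhZDRmM2E3MjU4ZjA1Y2IwOGYzODEwZTFkMA=='

# Decode a `!binary` (Base64) checksum entry into the string it wraps.
def decode_binary_checksum(encoded)
  encoded.unpack1('m')
end

# Hypothetical helper: compare a file's SHA256 hex digest with an expected value.
def sha256_matches?(path, expected_hex)
  Digest::SHA256.file(path).hexdigest == expected_hex
end

# prints 64726d07f586b7ad4f3a7258f05cb08f3810e1d0 (a 40-character SHA1 hex digest)
puts decode_binary_checksum(OLD_SHA1_METADATA)
```

To check a real package, one would unpack the `.gem` archive and pass the path of `metadata.gz` or `data.tar.gz` together with the matching value from `checksums.yaml`.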

data/.rubocop.yml ADDED

@@ -0,0 +1,24 @@
+AllCops:
+  SuggestExtensions: false
+Style/ClassVars:
+  Enabled: false
+Style/Documentation:
+  Enabled: false
+Style/MethodCallWithArgsParentheses:
+  Enabled: true
+Metrics/AbcSize:
+  Enabled: false
+Metrics/ClassLength:
+  Enabled: false
+Metrics/CyclomaticComplexity:
+  Enabled: false
+Metrics/PerceivedComplexity:
+  Enabled: false
+Metrics/MethodLength:
+  Enabled: false
+Naming/MethodParameterName:
+  Enabled: false
+Naming/PredicateName:
+  Enabled: false
+Lint/RescueException:
+  Enabled: false
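
Note on the config above: enabling `Style/MethodCallWithArgsParentheses` likely accounts for much of the churn in the Ruby files of this diff, where calls such as `Page.new name: "..."` become `Page.new(name: '...')`. The sketch below only contrasts the two call styles. `page_title` is a hypothetical method that does not exist in the gem.

```ruby
# Hypothetical method, used only to contrast the two call styles.
def page_title(name)
  name.tr('_', ' ')
end

# 0.1.0 style: arguments passed without parentheses (flagged by the cop).
def title_without_parens
  page_title 'Ruby_on_Rails'
end

# 0.1.2 style: arguments wrapped in parentheses (accepted by the cop).
def title_with_parens
  page_title('Ruby_on_Rails')
end

puts title_without_parens == title_with_parens
```

Both forms behave identically at runtime; the cop is purely stylistic.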

data/.travis.yml ADDED

@@ -0,0 +1,12 @@
+language: ruby
+rvm:
+  - 1.9.3
+  - 2.1.0
+  - jruby-19mode
+  - ruby-head
+  - jruby-head
+jdk:
+  - oraclejdk7
+before_install:
+  - gem update --system
+  - gem --version

data/Gemfile CHANGED

@@ -1,3 +1,5 @@
+# frozen_string_literal: true
 source 'https://rubygems.org'
 # Specify your gem's dependencies in wiki-api.gemspec

data/README.md CHANGED

@@ -1,47 +1,20 @@
 # Wiki::Api
-Wiki API is a gem (Ruby on Rails) that interfaces with the MediaWiki API (https://www.mediawiki.org/wiki/API:Main_page). This gem is more than a interface, it has abstract classes for Page and Headline parsing. You're able to iterate through these headlines, and access data accordingly.
+[![Build Status](https://travis-ci.org/dblommesteijn/wiki-api.svg?branch=master)](https://travis-ci.org/dblommesteijn/wiki-api) [![Code Climate](https://codeclimate.com/github/dblommesteijn/wiki-api.png)](https://codeclimate.com/github/dblommesteijn/wiki-api)
-NOTE: This gem has a nokogiri (http://nokogiri.org/Nokogiri.html) backend (for HTML parsing). Major components: Page, Headline, Block, ListItem, and Link are wrappers for easy data access, however it's still possible to retreive the raw HTML within these objects.
+Wiki API is a gem (Ruby on Rails) that interfaces with the MediaWiki API (https://www.mediawiki.org/wiki/API:Main_page). This gem is more than a interface, it has abstract classes for Page and Headline parsing. You're able to iterate through these headlines, and access data accordingly.
+NOTE: This gem has a nokogiri (http://nokogiri.org/Nokogiri.html) backend (for HTML parsing). Major components: `Page`, `Headline`, `Block`, `ListItem`, and `Link` are wrappers for easy data access, however it's still possible to retreive the raw HTML within these objects.
 Requests to the MediaWiki API use the following URI structure:
     http(s)://somemediawiki.org/w/api.php?action=parse&format=json&page="anypage"
-# RDoc (rdoc.info)
-    http://rdoc.info/github/dblommesteijn/wiki-api/frames/file/README.md
+### Dependencies
-### Dependencies (production)
-* json
 * nokogiri
-### Feature Roadmap
-* Version (0.1.0)
-  Major current release with several core changes.
-* Version (0.1.1)
-  No features determined yet (please drop me a line if you're interested in additions).
-### Changelog
-* Version (0.0.2) -> (current)
-  PageLink URI without global config Exception resolved
-  Reverse (parent) object lookup
-  Nested PageHeadline objects
 ## Installation
 Add this line to your application's Gemfile (bundler):
@@ -56,23 +29,29 @@ Or install it yourself (RubyGems):
     $ gem install wiki-api
+Or try it from this repository (local) in a console:
+    $ bin/console
 ## Setup
 Define a configuration for your connection (initialize script), this example uses wiktionary.org.
-NOTE: it can connect to both HTTP and HTTPS MediaWikis.
-```ruby
-CONFIG = { uri: "http://en.wiktionary.org" }
-```
+NOTE: it can connect to both HTTP and HTTPS MediaWikis (however you'll get a 302 response from MediaWiki)
 Setup default configuration (initialize script)
 ```ruby
-Wiki::Api::Connect.config = CONFIG
+Wiki::Api::Connect.config = { uri: 'https://en.wiktionary.org' }
 ```
+## Running tests
+```bash
+$ rake test
+```
 ## Usage
 ### Query a Page and Headline
@@ -80,7 +59,7 @@ Wiki::Api::Connect.config = CONFIG
 Requesting headlines from a given page.
 ```ruby
-page = Wiki::Api::Page.new name: "Wiktionary:Welcome,_newcomers"
+page = Wiki::Api::Page.new(name: 'Wiktionary:Welcome,_newcomers')
 # the root headline equals the pagename
 puts page.root_headline.name
 # iterate next level of headlines
@@ -93,9 +72,9 @@ end
 Getting headlines for a given name.
 ```ruby
-page = Wiki::Api::Page.new name: "Wiktionary:Welcome,_newcomers"
+page = Wiki::Api::Page.new(name: 'Wiktionary:Welcome,_newcomers')
 # lookup headline by name (underscore and case are ignored)
-headline = page.root_headline.headline("editing wiktionary").first
+headline = page.root_headline.headline('editing wiktionary').first
 # printing headline name (PageHeadline)
 puts headline.name
 # get the type of nested headline (html h1,2,3,4 etc.)
@@ -105,7 +84,7 @@ puts headline.type
 ### Basic Page structure
 ```ruby
-page = Wiki::Api::Page.new name: "Wiktionary:Welcome,_newcomers"
+page = Wiki::Api::Page.new(name: 'Wiktionary:Welcome,_newcomers')
 # iterate PageHeadline objects
 page.root_headline.headlines.each do |headline_name, headline|
   # exposing nokogiri internal elements
@@ -114,6 +93,7 @@ page.root_headline.headlines.each do |headline_name, headline|
     # print will result in: Nokogiri::XML::Text or Nokogiri::XML::Element
     puts element.class
   end
   # string representation of all nested text
   block.to_texts
   # iterate PageListItem objects
@@ -137,7 +117,6 @@ page.root_headline.headlines.each do |headline_name, headline|
     # string representation of nested text
     link.to_text
   end
 end
 ```
@@ -148,21 +127,20 @@ This is a example of querying wikipedia.org on the page: "Ruby_on_rails", and pr
 ```ruby
 # setting a target config
-CONFIG = { uri: "https://en.wikipedia.org" }
-Wiki::Api::Connect.config = CONFIG
+Wiki::Api::Connect.config = { uri: 'https://en.wikipedia.org' }
 # querying the page
-page = Wiki::Api::Page.new name: "Ruby_on_Rails"
+page = Wiki::Api::Page.new(name: 'Ruby_on_Rails')
 # get headlines with name Reference (there can be multiple headlines with the same name!)
-headlines = page.root_headline.headline "References"
+headlines = page.root_headline.headline('References')
 # iterate headlines
 headlines.each do |headline|
   # iterate list items on the given headline
   headline.block.list_items.each do |list_item|
     # print the uri of all links
-    puts list_item.links.map{ |l| l.uri }
+    puts list_item.links.map(&:uri)
   end
 end
 ```
@@ -174,19 +152,17 @@ This is the same example as the one above, except for setting a global config to
 ```ruby
 # querying the page
-page = Wiki::Api::Page.new name: "Ruby_on_Rails", uri: "https://en.wikipedia.org"
+page = Wiki::Api::Page.new(name: 'Ruby_on_Rails', uri: 'https://en.wikipedia.org')
 # get headlines with name Reference (there can be multiple headlines with the same name!)
-headlines = page.root_headline.headline "References"
+headlines = page.root_headline.headline('References')
 # iterate headlines
 headlines.each do |headline|
   # iterate list items on the given headline
   headline.block.list_items.each do |list_item|
     # print the uri of all links
-    puts list_item.links.map{ |l| l.uri }
+    puts list_item.links.map(&:uri)
   end
 end
 ```
@@ -199,25 +175,47 @@ This example shows how the headlines can be searched. For more info check: https
 ```ruby
 # querying the page
-page = Wiki::Api::Page.new name: "Ruby_on_Rails", uri: "https://en.wikipedia.org"
+page = Wiki::Api::Page.new(name: 'Ruby_on_Rails', uri: 'https://en.wikipedia.org')
 # NOTE: the following are all valid headline names:
 # request headline (by literal name)
-headlines = page.root_headline.headline "Philosophy_and_design"
-puts headlines.map{|h| h.name}
+headlines = page.root_headline.headline('Philosophy_and_design')
+puts headlines.map(&:name)
 # request headline (by downcase name)
-headlines = page.root_headline.headline "philosophy_and_design"
-puts headlines.map{|h| h.name}
+headlines = page.root_headline.headline('philosophy_and_design')
+puts headlines.map(&:name)
 # request headline (by human name)
-headlines = page.root_headline.headline "philosophy and design"
-puts headlines.map{|h| h.name}
+headlines = page.root_headline.headline('philosophy and design')
+puts headlines.map(&:name)
 # NOTE2: headlines are matched on headline.start_with?(requested_headline)
 # because of start_with? compare this should work as well!
-headlines = page.root_headline.headline "philosophy"
-puts headlines.map{|h| h.name}
+headlines = page.root_headline.headline('philosophy')
+puts headlines.map(&:name)
 ```
+### Example searching headlines in depth
+Recursive search on all nested headlines, including in depth searches.
+```ruby
+# querying the page
+page = Wiki::Api::Page.new(name: 'Ruby_on_Rails', uri: 'https://en.wikipedia.org')
+# get root
+root_headline = page.root_headline
+# lookup 'ramework structure' on current level
+headline = root_headline.headline_in_depth('framework structure').first
+puts headline.name
+# NOTE: lookup of nested headlines does not work with the headline function (because 'Framework_structure' is nested within 'Technical_overview')
+headline = root_headline.headline('framework structure').first
+# depth can be limited adding the depth parameter
+# NOTE: the example below will return nil, 'Framework_structure' is nested beyond depth = 0!
+depth = 0
+headline = root_headline.headline_in_depth('framework structure', depth).first
+# increasing depth search will show the requested headline
+depth = 5
+headline = root_headline.headline_in_depth('framework structure', depth).first
+puts headline.name
+```
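
Note on the new README section above: `page_headline.rb` is not shown in this part of the diff, so the following is only a sketch of how a depth-limited recursive headline search could behave, based on the README's description. `Node` and `search_in_depth` are hypothetical stand-ins and not the gem's `PageHeadline` or `headline_in_depth`. The unlimited default depth is also an assumption.

```ruby
# Hypothetical stand-in for a headline tree; not the gem's PageHeadline class.
Node = Struct.new(:name, :children)

# Depth-limited recursive prefix search (assumed default: unlimited depth).
# depth = 0 inspects only the direct children of the given node.
def search_in_depth(node, wanted, depth = Float::INFINITY)
  key = wanted.downcase.gsub(' ', '_')
  node.children.flat_map do |child|
    hits = child.name.downcase.start_with?(key) ? [child] : []
    hits += search_in_depth(child, wanted, depth - 1) if depth.positive?
    hits
  end
end

# Small tree mirroring the nesting described in the README example.
def sample_tree
  Node.new('Ruby_on_Rails', [
    Node.new('History', []),
    Node.new('Technical_overview', [Node.new('Framework_structure', [])])
  ])
end

puts search_in_depth(sample_tree, 'framework structure').map(&:name).inspect
puts search_in_depth(sample_tree, 'framework structure', 0).first.inspect
```

With `depth = 0` the nested headline is not reached, so `.first` yields `nil`, which matches the README's note about calling `.name` on the result only after increasing the depth.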

data/Rakefile CHANGED

@@ -1 +1,13 @@
-require "bundler/gem_tasks"
+# frozen_string_literal: true
+require 'bundler/gem_tasks'
+require 'rake/testtask'
+Rake::TestTask.new do |t|
+  t.libs << 'test'
+  tfs = FileList['test/unit/*.rb']
+  t.test_files = tfs
+  t.verbose = true
+end
+task default: %i[build install]

data/bin/console ADDED

@@ -0,0 +1,8 @@
+#!/usr/bin/env ruby
+# frozen_string_literal: true
+require 'bundler/setup'
+require 'wiki/api'
+require 'pry'
+Pry.start

data/lib/wiki/api/connect.rb CHANGED

@@ -1,85 +1,95 @@
+# frozen_string_literal: true
 require 'net/http'
 require 'json'
 require 'nokogiri'
 module Wiki
   module Api
     class Connect
       attr_accessor :uri, :api_path, :api_options, :http, :request, :response, :html, :parsed, :file
-      def initialize(options={})
-        @@config ||= nil
-        options.merge! @@config unless @@config.nil?
-        self.uri = options[:uri] if options.include? :uri
-        self.file = options[:file] if options.include? :file
-        self.api_path = options[:api_path] if options.include? :api_path
-        self.api_options = options[:api_options] if options.include? :api_options
+      def initialize(options = {})
+        @@config ||= {}
+        self.uri = options[:uri] || @@config[:uri]
+        self.file = options[:file] || @@config[:file]
+        self.api_path = options[:api_path] || @@config[:api_path]
+        self.api_options = options[:api_options] || @@config[:api_options]
         # defaults
-        self.api_path ||= "/w/api.php"
-        self.api_options ||= {action: "parse", format: "json", page: ""}
+        self.api_path ||= '/w/api.php'
+        self.api_options ||= { action: 'parse', format: 'json', page: '' }
         # errors
-        raise "no uri given" if self.uri.nil?
+        raise('no uri given') if uri.nil?
       end
       def connect
         uri = URI("#{self.uri}#{self.api_path}")
-        uri.query = URI.encode_www_form self.api_options
+        uri.query = URI.encode_www_form(self.api_options)
         self.http = Net::HTTP.new(uri.host, uri.port)
-        if uri.scheme == "https"
-          self.http.use_ssl = true
-          #self.http.verify_mode = OpenSSL::SSL::VERIFY_NONE
+        if uri.scheme == 'https'
+          http.use_ssl = true
+          # self.http.verify_mode = OpenSSL::SSL::VERIFY_NONE
         end
         self.request = Net::HTTP::Get.new(uri.request_uri)
-        self.response = self.http.request(request)
+        self.response = http.request(request)
       end
-      def page page_name
+      def page(page_name)
         self.api_options[:page] = page_name
         # parse page by uri
-        if !self.uri.nil? && self.file.nil?
-          self.connect
-          response = self.response
-          json = JSON.parse response.body, {symbolize_names: true}
-          raise json[:error][:code] unless valid? json, response
-          self.html = json[:parse][:text]
-          self.parsed = Nokogiri::HTML self.html[:*]
+        if !uri.nil? && file.nil?
+          self.parsed = parse_from_uri(response)
         # parse page by file
-        elsif !self.file.nil?
-          f = File.open(self.file)
-          # self.parsed = Nokogiri::HTML self.html[:*]
-          self.parsed = Nokogiri::HTML(f)
-          f.close
+        elsif !file.nil?
+          self.parsed = parse_from_file(file)
         # invalid config, raise exception
         else
-          raise "no :uri or :file config found!"
+          raise('no :uri or :file config found!')
         end
-        self.parsed
+        parsed
+      end
+      def parse_from_uri(response)
+        connect
+        # rubocop:disable Lint/ShadowedArgument
+        response = self.response
+        # rubocop:enable Lint/ShadowedArgument
+        json = JSON.parse(response.body, { symbolize_names: true })
+        raise(json[:error][:code]) unless valid?(json, response)
+        self.html = json[:parse][:text]
+        self.parsed = Nokogiri::HTML(html[:*])
+      end
+      def parse_from_file(file)
+        f = File.open(file)
+        ret = Nokogiri::HTML(f)
+        f.close
+        ret
       end
       class << self
         def config=(config = {})
           @@config = config
         end
         def config
           @@config ||= []
         end
       end
       protected
-      def valid? json, response
+      def valid?(json, response)
         b = []
         # valid http response
-        b << (response.is_a? Net::HTTPOK)
+        b << (response.is_a?(Net::HTTPOK))
         # not an invalid api response handle
-        b << (!json.include? :error)
+        b << (!json.include?(:error))
         !b.include?(false)
       end
     end
   end
-end
+end
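
Note on the `initialize` change above: the old `options.merge! @@config` let the global config overwrite per-call options, whereas the new `options[:x] || @@config[:x]` form treats the global config as a fallback only. The sketch below reproduces that difference with plain hashes. `GLOBAL_CONFIG`, `resolve_uri_old` and `resolve_uri_new` are hypothetical names, and the sketch uses the non-mutating `merge` to stay side-effect free.

```ruby
# Hypothetical stand-in for the class-level @@config.
GLOBAL_CONFIG = { uri: 'https://en.wiktionary.org' }.freeze

# 0.1.0 behaviour: merging the global config over the options, so global wins.
def resolve_uri_old(options)
  options.merge(GLOBAL_CONFIG)[:uri]
end

# 0.1.2 behaviour: per-call option wins, global config is only a fallback.
def resolve_uri_new(options)
  options[:uri] || GLOBAL_CONFIG[:uri]
end

per_call = { uri: 'https://en.wikipedia.org' }
puts resolve_uri_old(per_call) # prints https://en.wiktionary.org
puts resolve_uri_new(per_call) # prints https://en.wikipedia.org
puts resolve_uri_new({})       # prints https://en.wiktionary.org
```

Under this reading, passing `uri:` to `Wiki::Api::Page.new` while a global config is set would only take effect from 0.1.2 onward.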

data/lib/wiki/api/page.rb CHANGED

@@ -1,25 +1,22 @@
+# frozen_string_literal: true
 module Wiki
   module Api
     # MediaWiki Page, collection of all html information plus it's page title
     class Page
       attr_accessor :name, :parsed_page, :uri, :parent
-      def initialize(options={})
-        self.name = options[:name] if options.include? :name
-        self.uri = options[:uri] if options.include? :uri
-        @connect = Wiki::Api::Connect.new uri: uri
-      end
-      def connect
-        @connect
+      def initialize(options = {})
+        self.name = options[:name] if options.include?(:name)
+        self.uri = options[:uri] if options.include?(:uri)
+        @connect = Wiki::Api::Connect.new(uri:)
       end
+      attr_reader :connect
       # collect all headlines, keep original page formatting
       def root_headline
-        self.parse_blocks
+        parse_blocks
       end
       # # collect headlines by given name, this will flatten the nested headlines
@@ -30,10 +27,9 @@ module Wiki
       #   self.parse_blocks(headline_name)
       # end
       def to_html
-        self.load_page!
-        self.parsed_page.to_xhtml indent: 3, indent_text: " "
+        load_page!
+        parsed_page.to_xhtml(indent: 3, indent_text: ' ')
       end
       def reset!
@@ -41,69 +37,66 @@ module Wiki
       end
       def load_page!
-        self.parsed_page ||= @connect.page self.name
+        self.parsed_page ||= @connect.page(name)
       end
       # parse blocks
-      def parse_blocks headline_name = nil
-        self.load_page!
+      def parse_blocks(headline_name = nil)
+        load_page!
         result = {}
         # get headline nodes by span class
-        xs = self.parsed_page.xpath("//span[@class='mw-headline']")
+        headlines = self.parsed_page.xpath("//span[@class='mw-headline']")
         # filter single headline by name (ignore case)
-        xs = self.filter_headline xs, headline_name unless headline_name.nil?
+        headlines = filter_headline(headlines, headline_name) unless headline_name.nil?
         # NOTE: first_part has no id attribute and thus cannot be filtered or processed within xpath (xs)
-        if headline_name.nil? || headline_name.start_with?(self.name.downcase)
-          x = self.first_part
-          result[self.name] ||= []
-          result[self.name] << (self.collect_elements(x.parent))
+        if headline_name.nil? || headline_name.start_with?(name.downcase)
+          x = first_part
+          result[name] ||= []
+          result[name] << (collect_elements(x.parent))
         end
         # append all blocks
-        xs.each do |x|
-          headline = x.attributes["id"].value
-          elements = self.collect_elements x.parent.next
-          result[headline] ||= []
-          result[headline] << elements
+        headlines.each do |headline|
+          headline_value = headline.attributes['id'].value
+          elements = collect_elements(headline.parent.next)
+          result[headline_value] ||= []
+          result[headline_value] << elements
         end
         # create root object
-        PageHeadline.new parent: self, name: result.first[0], headlines: result, level: 0
+        PageHeadline.new(parent: self, name: result.first[0], headlines: result, level: 0)
       end
       # harvest first part of the page (missing heading and class="mw-headline")
       def first_part
-        self.parsed_page ||= @connect.page self.name
-        self.parsed_page.search("p").first.children.first
+        self.parsed_page ||= @connect.page(name)
+        self.parsed_page.search('p').first.children.first
       end
       # collect elements within headlines (not nested properties, but next elements)
-      def collect_elements element
+      def collect_elements(element)
         # capture first element name
         elements = []
         # iterate text until next headline
-        while true do
+        loop do
           elements << element
           element = element.next
-          break if element.nil? || element.to_html.include?("class=\"mw-headline\"")
+          break if element.nil? || element.to_html.include?('class="mw-headline"')
         end
         elements
       end
-      def filter_headline xs, headline_name
+      def filter_headline(xs, headline_name)
         # transform name to a wiki_id (downcase and space replace with underscore)
-        headline_name = headline_name.downcase.gsub(" ", "_")
+        headline_name = headline_name.downcase.gsub(' ', '_')
         # reject not matching id's
-        xs.reject do |t|
-          !t.attributes["id"].value.downcase.start_with?(headline_name)
+        xs.select do |t|
+          t.attributes['id'].value.downcase.start_with?(headline_name)
         end
       end
     end
   end
-end
+end
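
Note on the `filter_headline` change above: replacing `reject` plus a negated condition with `select` should not alter behaviour. The sketch below checks that on plain id strings instead of Nokogiri nodes. `HEADLINE_IDS`, `filter_ids_old` and `filter_ids_new` are hypothetical names for illustration.

```ruby
# Hypothetical headline ids, standing in for the `id` attributes of mw-headline spans.
HEADLINE_IDS = %w[History Philosophy_and_design Technical_overview Framework_structure].freeze

# Normalise a requested name the way the diff does: downcase, spaces to underscores.
def wiki_id(headline_name)
  headline_name.downcase.gsub(' ', '_')
end

# 0.1.0 form: reject everything that does not match the prefix.
def filter_ids_old(ids, headline_name)
  wanted = wiki_id(headline_name)
  ids.reject { |id| !id.downcase.start_with?(wanted) }
end

# 0.1.2 form: select everything that matches the prefix.
def filter_ids_new(ids, headline_name)
  wanted = wiki_id(headline_name)
  ids.select { |id| id.downcase.start_with?(wanted) }
end

puts filter_ids_new(HEADLINE_IDS, 'philosophy and design').inspect
puts filter_ids_old(HEADLINE_IDS, 'philosophy') == filter_ids_new(HEADLINE_IDS, 'philosophy')
```

Because matching uses `start_with?`, a short prefix such as `'philosophy'` is enough to find `Philosophy_and_design`, as the README examples also point out.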