RubyGems - wiki-api - Versions diffs - 0.0.2 → 0.1.2 - Mend

wiki-api 0.0.2 → 0.1.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (24)

checksums.yaml +5 -13
data/.rubocop.yml +24 -0
data/.travis.yml +12 -0
data/Gemfile +2 -0
data/README.md +93 -64
data/Rakefile +13 -1
data/bin/console +8 -0
data/lib/wiki/api/connect.rb +52 -28
data/lib/wiki/api/page.rb +48 -82
data/lib/wiki/api/page_block.rb +19 -18
data/lib/wiki/api/page_headline.rb +104 -8
data/lib/wiki/api/page_link.rb +18 -14
data/lib/wiki/api/page_list_item.rb +12 -13
data/lib/wiki/api/util.rb +24 -15
data/lib/wiki/api/version.rb +3 -1
data/lib/wiki/api.rb +9 -8
data/test/test_helper.rb +4 -7
data/test/unit/files/Wiktionary_program.html +4232 -0
data/test/unit/wiki_connect.rb +18 -25
data/test/unit/wiki_page_offline.rb +295 -0
data/wiki-api.gemspec +20 -17
metadata +57 -38
data/test/unit/wiki_page_config.rb +0 -45
data/test/unit/wiki_page_object.rb +0 -229

checksums.yaml CHANGED

@@ -1,15 +1,7 @@
 ---
-SHA1:
-  metadata.gz: !binary |-
-    NjgxOGUxZjQ2MWQ2MjNhMDA2ZGUwMTRhOGI4MWFlOGQ3MzI4MWFjOA==
-  data.tar.gz: !binary |-
-    ZmZkNDFhMzc0ZTNmZDBlYTFmMTIwMmU5ZDgzYTQ2YjM0ZTk1ZmQzYg==
+SHA256:
+  metadata.gz: cd978cd4dad89ddc8098d6abafcd6325ec6c0c4a4a5e5b8e93855bc118314b27
+  data.tar.gz: c5ead46deb2d10310823d4b639046058cf087a29cb6a0413a5e3addc64037b92
 SHA512:
-  metadata.gz: !binary |-
-    NGM4YTU2MjQ3Njk1MzJkMDhlYjcxODYxNDFkNzRlODI5MjMwNmU5ZGEzZmJj
-    MjhjZjYxYzcxMmYzYjA0YzA3NzdlYTJhMjM0ZTllNzgyMDk0MGJiNjBiZWRl
-    N2Y5YzMwZWZjZmY3NWQ0YmJiMjdiOTkwOTU1ZmE4MDg5Njk4M2Y=
-  data.tar.gz: !binary |-
-    MGZlMTYzZTgzZWE3YmYzZmIyMjc0OTZhMGY0NDEwYzJmNmFiMTZkNDM3OGM2
-    Mjc1MDdjMzQ3MjM1NmVlODM3Mzg5ZTViMGRmOGI2NzE1NDZjODJhZTA2MjI5
-    NWE3YmI4MDYxY2I4NGM3MGUwNzAzNjQ3YjMwODU5NDBlMWYxZDM=
+  metadata.gz: fcb6e3991c12a415a79b4c109091a41dbe45bff7ee3040a1a4283ddc2625522cfca767c65cba45e0f29bb13d410f082b78337de25d0bfd2bd9e0bd1591a36c24
+  data.tar.gz: 3a78fa474766c4cc10c44eb3e8a90ed95c1ddac1f306afa878da2ccf7b75e4fd179fc7933499f261c408cdd2f396d3613a6d74361bdad160cb3c13727aaa135c
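
The hunk above drops the base64-wrapped SHA1 entries and records plain hex SHA256 and SHA512 values instead. As a rough illustration only, the sketch below computes digests in the same two algorithms and decodes one of the removed `!binary` values with Ruby's standard library. The `hex_digests` helper and the sample payload are hypothetical; a real check would read `metadata.gz` and `data.tar.gz` from the unpacked `.gem` archive.

```ruby
require 'digest'

# Hypothetical helper: hex digests for one payload, in the two algorithms
# listed in the 0.1.2 checksums.yaml.
def hex_digests(bytes)
  {
    'SHA256' => Digest::SHA256.hexdigest(bytes),
    'SHA512' => Digest::SHA512.hexdigest(bytes)
  }
end

# Placeholder payload, not the real archive contents.
digests = hex_digests('example payload')
puts digests['SHA256'].length # 64 hex characters
puts digests['SHA512'].length # 128 hex characters

# The removed 0.0.2 entries were base64 text wrapping a hex string.
# String#unpack1('m') decodes base64 without any extra require.
old_sha1_metadata = 'NjgxOGUxZjQ2MWQ2MjNhMDA2ZGUwMTRhOGI4MWFlOGQ3MzI4MWFjOA=='.unpack1('m')
puts old_sha1_metadata.length # 40 hex characters, the size of a SHA1 digest
```

Comparing the printed values against the lines in `checksums.yaml` is then a plain string comparison.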

data/.rubocop.yml ADDED

@@ -0,0 +1,24 @@
+AllCops:
+  SuggestExtensions: false
+Style/ClassVars:
+  Enabled: false
+Style/Documentation:
+  Enabled: false
+Style/MethodCallWithArgsParentheses:
+  Enabled: true
+Metrics/AbcSize:
+  Enabled: false
+Metrics/ClassLength:
+  Enabled: false
+Metrics/CyclomaticComplexity:
+  Enabled: false
+Metrics/PerceivedComplexity:
+  Enabled: false
+Metrics/MethodLength:
+  Enabled: false
+Naming/MethodParameterName:
+  Enabled: false
+Naming/PredicateName:
+  Enabled: false
+Lint/RescueException:
+  Enabled: false

data/.travis.yml ADDED

@@ -0,0 +1,12 @@
+language: ruby
+rvm:
+  - 1.9.3
+  - 2.1.0
+  - jruby-19mode
+  - ruby-head
+  - jruby-head
+jdk:
+  - oraclejdk7
+before_install:
+  - gem update --system
+  - gem --version

data/Gemfile CHANGED

@@ -1,3 +1,5 @@
+# frozen_string_literal: true
 source 'https://rubygems.org'
 # Specify your gem's dependencies in wiki-api.gemspec

data/README.md CHANGED

@@ -1,43 +1,20 @@
 # Wiki::Api
-Wiki API is a gem (Ruby on Rails) that interfaces with the MediaWiki API (https://www.mediawiki.org/wiki/API:Main_page). This gem is more than a interface, it has abstract classes like: Page on which you can request page parameters (like headlines, and text blocks within headlines).
+[![Build Status](https://travis-ci.org/dblommesteijn/wiki-api.svg?branch=master)](https://travis-ci.org/dblommesteijn/wiki-api) [![Code Climate](https://codeclimate.com/github/dblommesteijn/wiki-api.png)](https://codeclimate.com/github/dblommesteijn/wiki-api)
-NOTE: nokogiri is used for background parsing of HTML. Because I believe there is no point of wrapping internals (composing) for this purpose, nokogiri nodes elements etc. are exposed (http://nokogiri.org/Nokogiri.html) through the wiki-api.
+Wiki API is a gem (Ruby on Rails) that interfaces with the MediaWiki API (https://www.mediawiki.org/wiki/API:Main_page). This gem is more than a interface, it has abstract classes for Page and Headline parsing. You're able to iterate through these headlines, and access data accordingly.
+NOTE: This gem has a nokogiri (http://nokogiri.org/Nokogiri.html) backend (for HTML parsing). Major components: `Page`, `Headline`, `Block`, `ListItem`, and `Link` are wrappers for easy data access, however it's still possible to retreive the raw HTML within these objects.
 Requests to the MediaWiki API use the following URI structure:
     http(s)://somemediawiki.org/w/api.php?action=parse&format=json&page="anypage"
+### Dependencies
-### Dependencies (production)
-* json
 * nokogiri
-### Roadmap
-* Version (0.0.2) (current)
-  Index important words per block, page, list item;
-  Parse objects for more elements within a Page.
-### Changelog
-* Version (0.0.1) -> (0.0.2)
-  Nested ListItems, Links (within Page)
-  Search on Page headline (ignore case, and underscore)
-### Known Issues
-None discovered thus far.
 ## Installation
 Add this line to your application's Gemfile (bundler):
@@ -52,32 +29,41 @@ Or install it yourself (RubyGems):
     $ gem install wiki-api
+Or try it from this repository (local) in a console:
+    $ bin/console
 ## Setup
 Define a configuration for your connection (initialize script), this example uses wiktionary.org.
-NOTE: it can connect to both HTTP and HTTPS MediaWikis.
-```ruby
-CONFIG = { uri: "http://en.wiktionary.org" }
-```
+NOTE: it can connect to both HTTP and HTTPS MediaWikis (however you'll get a 302 response from MediaWiki)
 Setup default configuration (initialize script)
 ```ruby
-Wiki::Api::Connect.config = CONFIG
+Wiki::Api::Connect.config = { uri: 'https://en.wiktionary.org' }
 ```
+## Running tests
+```bash
+$ rake test
+```
 ## Usage
-### Query a Page
+### Query a Page and Headline
 Requesting headlines from a given page.
 ```ruby
-page = Wiki::Api::Page.new name: "Wiktionary:Welcome,_newcomers"
-page.headlines.each do |headline|
+page = Wiki::Api::Page.new(name: 'Wiktionary:Welcome,_newcomers')
+# the root headline equals the pagename
+puts page.root_headline.name
+# iterate next level of headlines
+page.root_headline.headlines.each do |headline_name, headline|
   # printing headline name (PageHeadline)
   puts headline.name
 end
@@ -86,30 +72,30 @@ end
 Getting headlines for a given name.
 ```ruby
-page = Wiki::Api::Page.new name: "Wiktionary:Welcome,_newcomers"
-page.headline("Wiktionary:Welcome,_newcomers").each do |headline|
-  # printing headline name (PageHeadline)
-  puts headline.name
-end
+page = Wiki::Api::Page.new(name: 'Wiktionary:Welcome,_newcomers')
+# lookup headline by name (underscore and case are ignored)
+headline = page.root_headline.headline('editing wiktionary').first
+# printing headline name (PageHeadline)
+puts headline.name
+# get the type of nested headline (html h1,2,3,4 etc.)
+puts headline.type
 ```
 ### Basic Page structure
 ```ruby
-page = Wiki::Api::Page.new name: "Wiktionary:Welcome,_newcomers"
+page = Wiki::Api::Page.new(name: 'Wiktionary:Welcome,_newcomers')
 # iterate PageHeadline objects
-page.headlines.each do |headline|
+page.root_headline.headlines.each do |headline_name, headline|
   # exposing nokogiri internal elements
   elements = headline.elements.flatten
   elements.each do |element|
-    # access Nokogiri::XML::*
+    # print will result in: Nokogiri::XML::Text or Nokogiri::XML::Element
+    puts element.class
   end
   # string representation of all nested text
   block.to_texts
   # iterate PageListItem objects
   block.list_items.each do |list_item|
     # string representation of nested text
@@ -131,62 +117,105 @@ page.headlines.each do |headline|
     # string representation of nested text
     link.to_text
   end
 end
 ```
-### Example using Global config (https://en.wikipedia.org/wiki/Ruby_on_rails)
+### Example using Global config (https://en.wikipedia.org/wiki/Ruby_on_Rails)
 This is a example of querying wikipedia.org on the page: "Ruby_on_rails", and printing the References headline links for each list item.
 ```ruby
 # setting a target config
-CONFIG = { uri: "https://en.wikipedia.org" }
-Wiki::Api::Connect.config = CONFIG
+Wiki::Api::Connect.config = { uri: 'https://en.wikipedia.org' }
 # querying the page
-page = Wiki::Api::Page.new name: "Ruby_on_rails"
+page = Wiki::Api::Page.new(name: 'Ruby_on_Rails')
 # get headlines with name Reference (there can be multiple headlines with the same name!)
-headlines = page.headline "References"
+headlines = page.root_headline.headline('References')
 # iterate headlines
 headlines.each do |headline|
   # iterate list items on the given headline
   headline.block.list_items.each do |list_item|
     # print the uri of all links
-    puts list_item.links.map{ |l| l.uri }
+    puts list_item.links.map(&:uri)
   end
 end
 ```
-### Example passing URI (https://en.wikipedia.org/wiki/Ruby_on_rails)
+### Example passing URI (https://en.wikipedia.org/wiki/Ruby_on_Rails)
 This is the same example as the one above, except for setting a global config to direct the requests to a given URI.
 ```ruby
 # querying the page
-page = Wiki::Api::Page.new name: "Ruby_on_rails", uri: "https://en.wikipedia.org"
+page = Wiki::Api::Page.new(name: 'Ruby_on_Rails', uri: 'https://en.wikipedia.org')
 # get headlines with name Reference (there can be multiple headlines with the same name!)
-headlines = page.headline "References"
+headlines = page.root_headline.headline('References')
 # iterate headlines
 headlines.each do |headline|
   # iterate list items on the given headline
   headline.block.list_items.each do |list_item|
     # print the uri of all links
-    puts list_item.links.map{ |l| l.uri }
+    puts list_item.links.map(&:uri)
   end
 end
 ```
+### Example searching headlines
+This example shows how the headlines can be searched. For more info check: https://github.com/dblommesteijn/wiki-api/blob/master/lib/wiki/api/page.rb#L97
+```ruby
+# querying the page
+page = Wiki::Api::Page.new(name: 'Ruby_on_Rails', uri: 'https://en.wikipedia.org')
+# NOTE: the following are all valid headline names:
+# request headline (by literal name)
+headlines = page.root_headline.headline('Philosophy_and_design')
+puts headlines.map(&:name)
+# request headline (by downcase name)
+headlines = page.root_headline.headline('philosophy_and_design')
+puts headlines.map(&:name)
+# request headline (by human name)
+headlines = page.root_headline.headline('philosophy and design')
+puts headlines.map(&:name)
+# NOTE2: headlines are matched on headline.start_with?(requested_headline)
+# because of start_with? compare this should work as well!
+headlines = page.root_headline.headline('philosophy')
+puts headlines.map(&:name)
+```
+### Example searching headlines in depth
+Recursive search on all nested headlines, including in depth searches.
+```ruby
+# querying the page
+page = Wiki::Api::Page.new(name: 'Ruby_on_Rails', uri: 'https://en.wikipedia.org')
+# get root
+root_headline = page.root_headline
+# lookup 'ramework structure' on current level
+headline = root_headline.headline_in_depth('framework structure').first
+puts headline.name
+# NOTE: lookup of nested headlines does not work with the headline function (because 'Framework_structure' is nested within 'Technical_overview')
+headline = root_headline.headline('framework structure').first
+# depth can be limited adding the depth parameter
+# NOTE: the example below will return nil, 'Framework_structure' is nested beyond depth = 0!
+depth = 0
+headline = root_headline.headline_in_depth('framework structure', depth).first
+# increasing depth search will show the requested headline
+depth = 5
+headline = root_headline.headline_in_depth('framework structure', depth).first
+puts headline.name
+```
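
The README changes above describe two lookups: one that matches only the next level of headlines, and one that descends into nested headlines with an optional depth limit. The sketch below illustrates that behaviour on a stand-in tree so it can run offline. `Node`, `find` and `find_in_depth` are hypothetical names, not the gem's `PageHeadline` API, and the depth semantics are an assumption inferred from the README comments (a depth of 0 sees direct children only).

```ruby
# Stand-in for a headline node; NOT the gem's PageHeadline class.
Node = Struct.new(:name, :children) do
  # Case-insensitive prefix match, with spaces treated as underscores,
  # as the README describes for headline names.
  def matches?(query)
    name.downcase.start_with?(query.downcase.tr(' ', '_'))
  end

  # Direct children only.
  def find(query)
    children.select { |child| child.matches?(query) }
  end

  # Direct children first, then descend while the depth budget allows.
  def find_in_depth(query, depth = Float::INFINITY)
    found = find(query)
    return found if depth <= 0

    found + children.flat_map { |child| child.find_in_depth(query, depth - 1) }
  end
end

# Tree shape taken from the README comments: 'Framework_structure' is
# nested within 'Technical_overview'.
root = Node.new('Ruby_on_Rails', [
  Node.new('Technical_overview', [Node.new('Framework_structure', [])]),
  Node.new('Philosophy_and_design', [])
])

p root.find('framework structure').map(&:name)
p root.find_in_depth('framework structure', 0).first
p root.find_in_depth('framework structure', 5).first.name
```

With this tree, the direct lookup and the depth-0 search both miss the nested headline, while the deeper search reaches it.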

data/Rakefile CHANGED

@@ -1 +1,13 @@
-require "bundler/gem_tasks"
+# frozen_string_literal: true
+require 'bundler/gem_tasks'
+require 'rake/testtask'
+Rake::TestTask.new do |t|
+  t.libs << 'test'
+  tfs = FileList['test/unit/*.rb']
+  t.test_files = tfs
+  t.verbose = true
+end
+task default: %i[build install]

data/bin/console ADDED

@@ -0,0 +1,8 @@
+#!/usr/bin/env ruby
+# frozen_string_literal: true
+require 'bundler/setup'
+require 'wiki/api'
+require 'pry'
+Pry.start

data/lib/wiki/api/connect.rb CHANGED

@@ -1,71 +1,95 @@
+# frozen_string_literal: true
 require 'net/http'
 require 'json'
 require 'nokogiri'
 module Wiki
   module Api
     class Connect
+      attr_accessor :uri, :api_path, :api_options, :http, :request, :response, :html, :parsed, :file
-      attr_accessor :uri, :api_path, :api_options, :http, :request, :response, :html, :parsed
-      def initialize(options={})
-        @@config ||= nil
-        options.merge! @@config unless @@config.nil?
-        self.uri = options[:uri] if options.include? :uri
-        self.api_path = options[:api_path] if options.include? :api_path
-        self.api_options = options[:api_options] if options.include? :api_options
+      def initialize(options = {})
+        @@config ||= {}
+        self.uri = options[:uri] || @@config[:uri]
+        self.file = options[:file] || @@config[:file]
+        self.api_path = options[:api_path] || @@config[:api_path]
+        self.api_options = options[:api_options] || @@config[:api_options]
         # defaults
-        self.api_path ||= "/w/api.php"
-        self.api_options ||= {action: "parse", format: "json", page: ""}
+        self.api_path ||= '/w/api.php'
+        self.api_options ||= { action: 'parse', format: 'json', page: '' }
         # errors
-        raise "no uri given" if self.uri.nil?
+        raise('no uri given') if uri.nil?
       end
       def connect
         uri = URI("#{self.uri}#{self.api_path}")
-        uri.query = URI.encode_www_form self.api_options
+        uri.query = URI.encode_www_form(self.api_options)
         self.http = Net::HTTP.new(uri.host, uri.port)
-        if uri.scheme == "https"
-          self.http.use_ssl = true
-          #self.http.verify_mode = OpenSSL::SSL::VERIFY_NONE
+        if uri.scheme == 'https'
+          http.use_ssl = true
+          # self.http.verify_mode = OpenSSL::SSL::VERIFY_NONE
         end
         self.request = Net::HTTP::Get.new(uri.request_uri)
-        self.response = self.http.request(request)
+        self.response = http.request(request)
       end
-      def page page_name
+      def page(page_name)
         self.api_options[:page] = page_name
-        self.connect
+        # parse page by uri
+        if !uri.nil? && file.nil?
+          self.parsed = parse_from_uri(response)
+        # parse page by file
+        elsif !file.nil?
+          self.parsed = parse_from_file(file)
+        # invalid config, raise exception
+        else
+          raise('no :uri or :file config found!')
+        end
+        parsed
+      end
+      def parse_from_uri(response)
+        connect
+        # rubocop:disable Lint/ShadowedArgument
         response = self.response
-        json = JSON.parse response.body, {symbolize_names: true}
-        raise json[:error][:code] unless valid? json, response
+        # rubocop:enable Lint/ShadowedArgument
+        json = JSON.parse(response.body, { symbolize_names: true })
+        raise(json[:error][:code]) unless valid?(json, response)
         self.html = json[:parse][:text]
-        self.parsed = Nokogiri::HTML self.html[:*]
+        self.parsed = Nokogiri::HTML(html[:*])
+      end
+      def parse_from_file(file)
+        f = File.open(file)
+        ret = Nokogiri::HTML(f)
+        f.close
+        ret
       end
       class << self
         def config=(config = {})
           @@config = config
         end
         def config
           @@config ||= []
         end
       end
       protected
-      def valid? json, response
+      def valid?(json, response)
         b = []
         # valid http response
-        b << (response.is_a? Net::HTTPOK)
+        b << (response.is_a?(Net::HTTPOK))
         # not an invalid api response handle
-        b << (!json.include? :error)
+        b << (!json.include?(:error))
         !b.include?(false)
       end
     end
   end
-end
+end
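
To make the request flow in `connect.rb` easier to follow, here is a minimal offline sketch of its two data steps: building the request URI from `uri`, `api_path` and `api_options`, and unwrapping the HTML from the JSON reply. It performs no network call. The canned `body` is a simplified assumption about the response shape, limited to the keys that `parse_from_uri` reads.

```ruby
require 'uri'
require 'json'

# Same defaults as in connect.rb, with a page name filled in.
base_uri = 'https://en.wikipedia.org'
api_path = '/w/api.php'
api_options = { action: 'parse', format: 'json', page: 'Ruby_on_Rails' }

# Build the request URI the way Connect#connect does.
uri = URI("#{base_uri}#{api_path}")
uri.query = URI.encode_www_form(api_options)
puts uri
puts uri.request_uri

# Canned body standing in for a real MediaWiki reply (assumed shape).
body = '{"parse":{"text":{"*":"<p>Ruby on Rails</p>"}}}'
json = JSON.parse(body, symbolize_names: true)

# An :error key marks a failed API call; its code is raised as the message.
raise(json[:error][:code]) if json.key?(:error)

# The rendered HTML sits under the "*" key, which becomes the symbol :*.
html = json[:parse][:text][:*]
puts html
```

In the gem, the resulting HTML string is then handed to `Nokogiri::HTML`; that step is left out here to keep the sketch free of non-standard dependencies.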

data/lib/wiki/api/page.rb CHANGED

@@ -1,136 +1,102 @@
+# frozen_string_literal: true
 module Wiki
   module Api
+    # MediaWiki Page, collection of all html information plus it's page title
     class Page
+      attr_accessor :name, :parsed_page, :uri, :parent
-      attr_accessor :name, :parsed_page, :uri
-      def initialize(options={})
-        self.name = options[:name] if options.include? :name
-        uri = options[:uri] if options.include? :uri
-        @@config ||= nil
-        if @@config.nil? || !uri.nil?
-          # use the connection to collect HTML pages for parsing
-          @connect = Wiki::Api::Connect.new uri: uri
-        else
-          # using a local HTML file for parsing
-        end
+      def initialize(options = {})
+        self.name = options[:name] if options.include?(:name)
+        self.uri = options[:uri] if options.include?(:uri)
+        @connect = Wiki::Api::Connect.new(uri:)
       end
-      def headlines
-        headlines = []
-        self.parse_blocks.each do |headline_name, elements|
-          headline = PageHeadline.new name: headline_name
-          elements.each do |element|
-            # nokogiri element
-            headline.block << element
-          end
-          headlines << headline
-        end
-        headlines
-      end
+      attr_reader :connect
-      def headline headline_name
-        headlines = []
-        self.parse_blocks(headline_name).each do |headline_name, elements|
-          headline = PageHeadline.new name: headline_name
-          elements.each do |element|
-            # nokogiri element
-            headline.block << element
-          end
-          headlines << headline
-        end
-        headlines
+      # collect all headlines, keep original page formatting
+      def root_headline
+        parse_blocks
       end
+      # # collect headlines by given name, this will flatten the nested headlines
+      # def flat_headlines_by_name headline_name
+      #   raise "not yet implemented!"
+      #   # TODO: implement flattening of headlines within the root headline
+      #   # ALT:  breath search option in the root of the first headline
+      #   self.parse_blocks(headline_name)
+      # end
       def to_html
-        self.load_page!
-        self.parsed_page.to_xhtml indent: 3, indent_text: " "
+        load_page!
+        parsed_page.to_xhtml(indent: 3, indent_text: ' ')
       end
       def reset!
         self.parse_page = nil
       end
-      class << self
-        def config=(config = {})
-          @@config = config
-        end
-      end
-      protected
       def load_page!
-        if @@config.nil?
-          self.parsed_page ||= @connect.page self.name
-        elsif self.parsed_page.nil?
-          f = File.open(@@config[:file])
-          self.parsed_page = Nokogiri::HTML(f)
-          f.close
-        end
+        self.parsed_page ||= @connect.page(name)
       end
       # parse blocks
-      def parse_blocks headline_name = nil
-        self.load_page!
+      def parse_blocks(headline_name = nil)
+        load_page!
         result = {}
         # get headline nodes by span class
-        xs = self.parsed_page.xpath("//span[@class='mw-headline']")
+        headlines = self.parsed_page.xpath("//span[@class='mw-headline']")
         # filter single headline by name (ignore case)
-        xs = self.filter_headline xs, headline_name unless headline_name.nil?
+        headlines = filter_headline(headlines, headline_name) unless headline_name.nil?
         # NOTE: first_part has no id attribute and thus cannot be filtered or processed within xpath (xs)
-        if headline_name == self.name || headline_name.nil?
-          x = self.first_part
-          result[self.name] ||= []
-          result[self.name] << (self.collect_elements(x.parent))
+        if headline_name.nil? || headline_name.start_with?(name.downcase)
+          x = first_part
+          result[name] ||= []
+          result[name] << (collect_elements(x.parent))
         end
         # append all blocks
-        xs.each do |x|
-          headline = x.attributes["id"].value
-          elements = self.collect_elements x.parent.next
-          result[headline] ||= []
-          result[headline] << elements
+        headlines.each do |headline|
+          headline_value = headline.attributes['id'].value
+          elements = collect_elements(headline.parent.next)
+          result[headline_value] ||= []
+          result[headline_value] << elements
         end
-        result
+        # create root object
+        PageHeadline.new(parent: self, name: result.first[0], headlines: result, level: 0)
       end
       # harvest first part of the page (missing heading and class="mw-headline")
       def first_part
-        self.parsed_page ||= @connect.page self.name
-        self.parsed_page.search("p").first.children.first
+        self.parsed_page ||= @connect.page(name)
+        self.parsed_page.search('p').first.children.first
       end
       # collect elements within headlines (not nested properties, but next elements)
-      def collect_elements element
+      def collect_elements(element)
         # capture first element name
         elements = []
         # iterate text until next headline
-        while true do
+        loop do
           elements << element
           element = element.next
-          break if element.nil? || element.to_html.include?("class=\"mw-headline\"")
+          break if element.nil? || element.to_html.include?('class="mw-headline"')
         end
         elements
       end
-      def filter_headline xs, headline_name
+      def filter_headline(xs, headline_name)
         # transform name to a wiki_id (downcase and space replace with underscore)
-        headline_name = headline_name.downcase.gsub(" ", "_")
+        headline_name = headline_name.downcase.gsub(' ', '_')
         # reject not matching id's
-        xs.reject do |t|
-          !t.attributes["id"].value.downcase.start_with?(headline_name)
+        xs.select do |t|
+          t.attributes['id'].value.downcase.start_with?(headline_name)
         end
       end
     end
   end
-end
+end
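
The two helpers at the end of `page.rb` carry most of the parsing logic: `filter_headline` normalises the requested name and keeps ids that start with it, and `collect_elements` walks sibling nodes until the next headline. The sketch below mirrors both on plain Ruby data so it runs without Nokogiri. `Element`, `filter_ids` and `collect_until_next_headline` are hypothetical stand-ins, and the sample ids and texts are illustrative rather than taken from a real page.

```ruby
# Stand-in for a parsed node; the gem works on Nokogiri nodes instead.
Element = Struct.new(:kind, :text)

# Mirrors filter_headline: downcase, spaces to underscores, prefix match.
def filter_ids(ids, headline_name)
  wanted = headline_name.downcase.gsub(' ', '_')
  ids.select { |id| id.downcase.start_with?(wanted) }
end

# Mirrors collect_elements: take the starting element, then keep going
# until the list ends or the next headline is reached.
def collect_until_next_headline(elements, start)
  collected = []
  index = start
  loop do
    collected << elements[index]
    index += 1
    break if elements[index].nil? || elements[index].kind == :headline
  end
  collected
end

ids = %w[Technical_overview Framework_structure Philosophy_and_design References]
p filter_ids(ids, 'philosophy and design')
p filter_ids(ids, 'PHILOSOPHY')

elements = [
  Element.new(:headline, 'References'),
  Element.new(:paragraph, 'first note'),
  Element.new(:list, 'second note'),
  Element.new(:headline, 'External_links'),
  Element.new(:paragraph, 'unrelated')
]
# Start at the element right after the 'References' headline.
p collect_until_next_headline(elements, 1).map(&:text)
```

Because the match is a prefix test, a query such as `'design'` does not match `Philosophy_and_design`, which is consistent with the `start_with?` note in the README.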