wiki-api 0.0.2 → 0.1.2

checksums.yaml CHANGED
@@ -1,15 +1,7 @@
1
1
  ---
2
- SHA1:
3
- metadata.gz: !binary |-
4
- NjgxOGUxZjQ2MWQ2MjNhMDA2ZGUwMTRhOGI4MWFlOGQ3MzI4MWFjOA==
5
- data.tar.gz: !binary |-
6
- ZmZkNDFhMzc0ZTNmZDBlYTFmMTIwMmU5ZDgzYTQ2YjM0ZTk1ZmQzYg==
2
+ SHA256:
3
+ metadata.gz: cd978cd4dad89ddc8098d6abafcd6325ec6c0c4a4a5e5b8e93855bc118314b27
4
+ data.tar.gz: c5ead46deb2d10310823d4b639046058cf087a29cb6a0413a5e3addc64037b92
7
5
  SHA512:
8
- metadata.gz: !binary |-
9
- NGM4YTU2MjQ3Njk1MzJkMDhlYjcxODYxNDFkNzRlODI5MjMwNmU5ZGEzZmJj
10
- MjhjZjYxYzcxMmYzYjA0YzA3NzdlYTJhMjM0ZTllNzgyMDk0MGJiNjBiZWRl
11
- N2Y5YzMwZWZjZmY3NWQ0YmJiMjdiOTkwOTU1ZmE4MDg5Njk4M2Y=
12
- data.tar.gz: !binary |-
13
- MGZlMTYzZTgzZWE3YmYzZmIyMjc0OTZhMGY0NDEwYzJmNmFiMTZkNDM3OGM2
14
- Mjc1MDdjMzQ3MjM1NmVlODM3Mzg5ZTViMGRmOGI2NzE1NDZjODJhZTA2MjI5
15
- NWE3YmI4MDYxY2I4NGM3MGUwNzAzNjQ3YjMwODU5NDBlMWYxZDM=
6
+ metadata.gz: fcb6e3991c12a415a79b4c109091a41dbe45bff7ee3040a1a4283ddc2625522cfca767c65cba45e0f29bb13d410f082b78337de25d0bfd2bd9e0bd1591a36c24
7
+ data.tar.gz: 3a78fa474766c4cc10c44eb3e8a90ed95c1ddac1f306afa878da2ccf7b75e4fd179fc7933499f261c408cdd2f396d3613a6d74361bdad160cb3c13727aaa135c
data/.rubocop.yml ADDED
@@ -0,0 +1,24 @@
1
+ AllCops:
2
+ SuggestExtensions: false
3
+ Style/ClassVars:
4
+ Enabled: false
5
+ Style/Documentation:
6
+ Enabled: false
7
+ Style/MethodCallWithArgsParentheses:
8
+ Enabled: true
9
+ Metrics/AbcSize:
10
+ Enabled: false
11
+ Metrics/ClassLength:
12
+ Enabled: false
13
+ Metrics/CyclomaticComplexity:
14
+ Enabled: false
15
+ Metrics/PerceivedComplexity:
16
+ Enabled: false
17
+ Metrics/MethodLength:
18
+ Enabled: false
19
+ Naming/MethodParameterName:
20
+ Enabled: false
21
+ Naming/PredicateName:
22
+ Enabled: false
23
+ Lint/RescueException:
24
+ Enabled: false
data/.travis.yml ADDED
@@ -0,0 +1,12 @@
1
+ language: ruby
2
+ rvm:
3
+ - 1.9.3
4
+ - 2.1.0
5
+ - jruby-19mode
6
+ - ruby-head
7
+ - jruby-head
8
+ jdk:
9
+ - oraclejdk7
10
+ before_install:
11
+ - gem update --system
12
+ - gem --version
data/Gemfile CHANGED
@@ -1,3 +1,5 @@
1
+ # frozen_string_literal: true
2
+
1
3
  source 'https://rubygems.org'
2
4
 
3
5
  # Specify your gem's dependencies in wiki-api.gemspec
data/README.md CHANGED
@@ -1,43 +1,20 @@
1
1
  # Wiki::Api
2
2
 
3
- Wiki API is a gem (Ruby on Rails) that interfaces with the MediaWiki API (https://www.mediawiki.org/wiki/API:Main_page). This gem is more than a interface, it has abstract classes like: Page on which you can request page parameters (like headlines, and text blocks within headlines).
3
+ [![Build Status](https://travis-ci.org/dblommesteijn/wiki-api.svg?branch=master)](https://travis-ci.org/dblommesteijn/wiki-api) [![Code Climate](https://codeclimate.com/github/dblommesteijn/wiki-api.png)](https://codeclimate.com/github/dblommesteijn/wiki-api)
4
4
 
5
- NOTE: nokogiri is used for background parsing of HTML. Because I believe there is no point of wrapping internals (composing) for this purpose, nokogiri nodes elements etc. are exposed (http://nokogiri.org/Nokogiri.html) through the wiki-api.
5
+ Wiki API is a gem (Ruby on Rails) that interfaces with the MediaWiki API (https://www.mediawiki.org/wiki/API:Main_page). This gem is more than an interface: it has abstract classes for Page and Headline parsing. You can iterate through these headlines and access their data.
6
+
7
+ NOTE: This gem has a nokogiri (http://nokogiri.org/Nokogiri.html) backend (for HTML parsing). Major components: `Page`, `Headline`, `Block`, `ListItem`, and `Link` are wrappers for easy data access; however, it's still possible to retrieve the raw HTML within these objects.
6
8
 
7
9
  Requests to the MediaWiki API use the following URI structure:
8
10
 
9
11
  http(s)://somemediawiki.org/w/api.php?action=parse&format=json&page="anypage"
10
12
 
13
+ ### Dependencies
11
14
 
12
- ### Dependencies (production)
13
-
14
- * json
15
15
  * nokogiri
16
16
 
17
17
 
18
- ### Roadmap
19
-
20
- * Version (0.0.2) (current)
21
-
22
- Index important words per block, page, list item;
23
-
24
- Parse objects for more elements within a Page.
25
-
26
-
27
- ### Changelog
28
-
29
- * Version (0.0.1) -> (0.0.2)
30
-
31
- Nested ListItems, Links (within Page)
32
-
33
- Search on Page headline (ignore case, and underscore)
34
-
35
-
36
- ### Known Issues
37
-
38
- None discovered thus far.
39
-
40
-
41
18
  ## Installation
42
19
 
43
20
  Add this line to your application's Gemfile (bundler):
@@ -52,32 +29,41 @@ Or install it yourself (RubyGems):
52
29
 
53
30
  $ gem install wiki-api
54
31
 
32
+ Or try it from this repository (local) in a console:
33
+
34
+ $ bin/console
35
+
55
36
 
56
37
  ## Setup
57
38
 
58
39
  Define a configuration for your connection (initialize script), this example uses wiktionary.org.
59
- NOTE: it can connect to both HTTP and HTTPS MediaWikis.
60
-
61
- ```ruby
62
- CONFIG = { uri: "http://en.wiktionary.org" }
63
- ```
40
+ NOTE: it can connect to both HTTP and HTTPS MediaWikis (however you'll get a 302 response from MediaWiki)
64
41
 
65
42
  Setup default configuration (initialize script)
66
43
 
67
44
  ```ruby
68
- Wiki::Api::Connect.config = CONFIG
45
+ Wiki::Api::Connect.config = { uri: 'https://en.wiktionary.org' }
69
46
  ```
70
47
 
71
48
 
49
+ ## Running tests
50
+
51
+ ```bash
52
+ $ rake test
53
+ ```
54
+
72
55
  ## Usage
73
56
 
74
- ### Query a Page
57
+ ### Query a Page and Headline
75
58
 
76
59
  Requesting headlines from a given page.
77
60
 
78
61
  ```ruby
79
- page = Wiki::Api::Page.new name: "Wiktionary:Welcome,_newcomers"
80
- page.headlines.each do |headline|
62
+ page = Wiki::Api::Page.new(name: 'Wiktionary:Welcome,_newcomers')
63
+ # the root headline equals the pagename
64
+ puts page.root_headline.name
65
+ # iterate next level of headlines
66
+ page.root_headline.headlines.each do |headline_name, headline|
81
67
  # printing headline name (PageHeadline)
82
68
  puts headline.name
83
69
  end
@@ -86,30 +72,30 @@ end
86
72
  Getting headlines for a given name.
87
73
 
88
74
  ```ruby
89
- page = Wiki::Api::Page.new name: "Wiktionary:Welcome,_newcomers"
90
- page.headline("Wiktionary:Welcome,_newcomers").each do |headline|
91
- # printing headline name (PageHeadline)
92
- puts headline.name
93
- end
75
+ page = Wiki::Api::Page.new(name: 'Wiktionary:Welcome,_newcomers')
76
+ # lookup headline by name (underscore and case are ignored)
77
+ headline = page.root_headline.headline('editing wiktionary').first
78
+ # printing headline name (PageHeadline)
79
+ puts headline.name
80
+ # get the type of nested headline (html h1,2,3,4 etc.)
81
+ puts headline.type
94
82
  ```
95
83
 
96
84
  ### Basic Page structure
97
85
 
98
86
  ```ruby
99
- page = Wiki::Api::Page.new name: "Wiktionary:Welcome,_newcomers"
100
-
87
+ page = Wiki::Api::Page.new(name: 'Wiktionary:Welcome,_newcomers')
101
88
  # iterate PageHeadline objects
102
- page.headlines.each do |headline|
103
-
89
+ page.root_headline.headlines.each do |headline_name, headline|
104
90
  # exposing nokogiri internal elements
105
91
  elements = headline.elements.flatten
106
92
  elements.each do |element|
107
- # access Nokogiri::XML::*
93
+ # print will result in: Nokogiri::XML::Text or Nokogiri::XML::Element
94
+ puts element.class
108
95
  end
109
96
 
110
97
  # string representation of all nested text
111
98
  block.to_texts
112
-
113
99
  # iterate PageListItem objects
114
100
  block.list_items.each do |list_item|
115
101
  # string representation of nested text
@@ -131,62 +117,105 @@ page.headlines.each do |headline|
131
117
  # string representation of nested text
132
118
  link.to_text
133
119
  end
134
-
135
120
  end
136
121
  ```
137
122
 
138
123
 
139
- ### Example using Global config (https://en.wikipedia.org/wiki/Ruby_on_rails)
124
+ ### Example using Global config (https://en.wikipedia.org/wiki/Ruby_on_Rails)
140
125
 
141
126
  This is an example of querying wikipedia.org on the page: "Ruby_on_Rails", and printing the References headline links for each list item.
142
127
 
143
128
  ```ruby
144
129
  # setting a target config
145
- CONFIG = { uri: "https://en.wikipedia.org" }
146
- Wiki::Api::Connect.config = CONFIG
130
+ Wiki::Api::Connect.config = { uri: 'https://en.wikipedia.org' }
147
131
 
148
132
  # querying the page
149
- page = Wiki::Api::Page.new name: "Ruby_on_rails"
133
+ page = Wiki::Api::Page.new(name: 'Ruby_on_Rails')
150
134
 
151
135
  # get headlines with name Reference (there can be multiple headlines with the same name!)
152
- headlines = page.headline "References"
136
+ headlines = page.root_headline.headline('References')
153
137
 
154
138
  # iterate headlines
155
139
  headlines.each do |headline|
156
140
  # iterate list items on the given headline
157
141
  headline.block.list_items.each do |list_item|
158
-
159
142
  # print the uri of all links
160
- puts list_item.links.map{ |l| l.uri }
161
-
143
+ puts list_item.links.map(&:uri)
162
144
  end
163
145
  end
164
146
  ```
165
147
 
166
148
 
167
-
168
- ### Example passing URI (https://en.wikipedia.org/wiki/Ruby_on_rails)
149
+ ### Example passing URI (https://en.wikipedia.org/wiki/Ruby_on_Rails)
169
150
 
170
151
  This is the same example as the one above, except for setting a global config to direct the requests to a given URI.
171
152
 
172
153
  ```ruby
173
154
  # querying the page
174
- page = Wiki::Api::Page.new name: "Ruby_on_rails", uri: "https://en.wikipedia.org"
155
+ page = Wiki::Api::Page.new(name: 'Ruby_on_Rails', uri: 'https://en.wikipedia.org')
175
156
 
176
157
  # get headlines with name Reference (there can be multiple headlines with the same name!)
177
- headlines = page.headline "References"
158
+ headlines = page.root_headline.headline('References')
178
159
 
179
160
  # iterate headlines
180
161
  headlines.each do |headline|
181
162
  # iterate list items on the given headline
182
163
  headline.block.list_items.each do |list_item|
183
-
184
164
  # print the uri of all links
185
- puts list_item.links.map{ |l| l.uri }
186
-
165
+ puts list_item.links.map(&:uri)
187
166
  end
188
167
  end
189
168
  ```
190
169
 
191
170
 
171
+ ### Example searching headlines
172
+
173
+ This example shows how the headlines can be searched. For more info check: https://github.com/dblommesteijn/wiki-api/blob/master/lib/wiki/api/page.rb#L97
174
+
175
+
176
+ ```ruby
177
+ # querying the page
178
+ page = Wiki::Api::Page.new(name: 'Ruby_on_Rails', uri: 'https://en.wikipedia.org')
179
+
180
+ # NOTE: the following are all valid headline names:
181
+ # request headline (by literal name)
182
+ headlines = page.root_headline.headline('Philosophy_and_design')
183
+ puts headlines.map(&:name)
184
+ # request headline (by downcase name)
185
+ headlines = page.root_headline.headline('philosophy_and_design')
186
+ puts headlines.map(&:name)
187
+ # request headline (by human name)
188
+ headlines = page.root_headline.headline('philosophy and design')
189
+ puts headlines.map(&:name)
190
+
191
+ # NOTE2: headlines are matched on headline.start_with?(requested_headline)
192
+ # because of start_with? compare this should work as well!
193
+ headlines = page.root_headline.headline('philosophy')
194
+ puts headlines.map(&:name)
195
+ ```
196
+
197
+
198
+ ### Example searching headlines in depth
199
+
200
+ Recursive search on all nested headlines, including in depth searches.
201
+
202
+ ```ruby
203
+ # querying the page
204
+ page = Wiki::Api::Page.new(name: 'Ruby_on_Rails', uri: 'https://en.wikipedia.org')
205
+ # get root
206
+ root_headline = page.root_headline
207
+ # lookup 'ramework structure' on current level
208
+ headline = root_headline.headline_in_depth('framework structure').first
209
+ puts headline.name
210
+ # NOTE: lookup of nested headlines does not work with the headline function (because 'Framework_structure' is nested within 'Technical_overview')
211
+ headline = root_headline.headline('framework structure').first
212
+ # depth can be limited adding the depth parameter
213
+ # NOTE: the example below will return nil, 'Framework_structure' is nested beyond depth = 0!
214
+ depth = 0
215
+ headline = root_headline.headline_in_depth('framework structure', depth).first
216
+ # increasing depth search will show the requested headline
217
+ depth = 5
218
+ headline = root_headline.headline_in_depth('framework structure', depth).first
219
+ puts headline.name
220
+ ```
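The depth-limited lookup described in the README example above can be sketched without the gem. This is a minimal illustration, not the gem's implementation: `Node`, `normalize`, and `search_in_depth` are hypothetical names, and the assumption that `depth = 0` inspects only direct children is inferred from the README comments.

```ruby
# Hypothetical stand-in for a headline node; not the gem's PageHeadline class.
Node = Struct.new(:name, :children)

# Normalize a requested name as the README describes:
# case is ignored and spaces are treated as underscores.
def normalize(name)
  name.downcase.tr(' ', '_')
end

# Return every nested node whose normalized name starts with the request.
# depth = 0 inspects direct children only (assumption); each further level
# of nesting consumes one unit of depth.
def search_in_depth(node, requested, depth)
  wanted = normalize(requested)
  node.children.flat_map do |child|
    hit = normalize(child.name).start_with?(wanted) ? [child] : []
    nested = depth.positive? ? search_in_depth(child, requested, depth - 1) : []
    hit + nested
  end
end

root = Node.new('Ruby_on_Rails', [
  Node.new('Technical_overview', [Node.new('Framework_structure', [])]),
  Node.new('Philosophy_and_design', [])
])

# Nested headline is out of reach at depth 0, reachable at depth 5.
p search_in_depth(root, 'framework structure', 0).map(&:name)
p search_in_depth(root, 'framework structure', 5).map(&:name)
```

In this sketch a miss yields an empty array, so calling `.first` on the result gives `nil`, matching the README's note about the `depth = 0` case.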
192
221
 
data/Rakefile CHANGED
@@ -1 +1,13 @@
1
- require "bundler/gem_tasks"
1
+ # frozen_string_literal: true
2
+
3
+ require 'bundler/gem_tasks'
4
+ require 'rake/testtask'
5
+
6
+ Rake::TestTask.new do |t|
7
+ t.libs << 'test'
8
+ tfs = FileList['test/unit/*.rb']
9
+ t.test_files = tfs
10
+ t.verbose = true
11
+ end
12
+
13
+ task default: %i[build install]
data/bin/console ADDED
@@ -0,0 +1,8 @@
1
+ #!/usr/bin/env ruby
2
+ # frozen_string_literal: true
3
+
4
+ require 'bundler/setup'
5
+ require 'wiki/api'
6
+ require 'pry'
7
+
8
+ Pry.start
@@ -1,71 +1,95 @@
1
+ # frozen_string_literal: true
2
+
1
3
  require 'net/http'
2
4
  require 'json'
3
5
  require 'nokogiri'
4
6
 
5
7
  module Wiki
6
8
  module Api
7
-
8
9
  class Connect
10
+ attr_accessor :uri, :api_path, :api_options, :http, :request, :response, :html, :parsed, :file
9
11
 
10
- attr_accessor :uri, :api_path, :api_options, :http, :request, :response, :html, :parsed
11
-
12
- def initialize(options={})
13
- @@config ||= nil
14
- options.merge! @@config unless @@config.nil?
15
- self.uri = options[:uri] if options.include? :uri
16
- self.api_path = options[:api_path] if options.include? :api_path
17
- self.api_options = options[:api_options] if options.include? :api_options
12
+ def initialize(options = {})
13
+ @@config ||= {}
14
+ self.uri = options[:uri] || @@config[:uri]
15
+ self.file = options[:file] || @@config[:file]
16
+ self.api_path = options[:api_path] || @@config[:api_path]
17
+ self.api_options = options[:api_options] || @@config[:api_options]
18
18
 
19
19
  # defaults
20
- self.api_path ||= "/w/api.php"
21
- self.api_options ||= {action: "parse", format: "json", page: ""}
20
+ self.api_path ||= '/w/api.php'
21
+ self.api_options ||= { action: 'parse', format: 'json', page: '' }
22
22
 
23
23
  # errors
24
- raise "no uri given" if self.uri.nil?
24
+ raise('no uri given') if uri.nil?
25
25
  end
26
26
 
27
27
  def connect
28
28
  uri = URI("#{self.uri}#{self.api_path}")
29
- uri.query = URI.encode_www_form self.api_options
29
+ uri.query = URI.encode_www_form(self.api_options)
30
30
  self.http = Net::HTTP.new(uri.host, uri.port)
31
- if uri.scheme == "https"
32
- self.http.use_ssl = true
33
- #self.http.verify_mode = OpenSSL::SSL::VERIFY_NONE
31
+ if uri.scheme == 'https'
32
+ http.use_ssl = true
33
+ # self.http.verify_mode = OpenSSL::SSL::VERIFY_NONE
34
34
  end
35
35
  self.request = Net::HTTP::Get.new(uri.request_uri)
36
- self.response = self.http.request(request)
36
+ self.response = http.request(request)
37
37
  end
38
38
 
39
- def page page_name
39
+ def page(page_name)
40
40
  self.api_options[:page] = page_name
41
- self.connect
41
+ # parse page by uri
42
+ if !uri.nil? && file.nil?
43
+ self.parsed = parse_from_uri(response)
44
+ # parse page by file
45
+ elsif !file.nil?
46
+ self.parsed = parse_from_file(file)
47
+ # invalid config, raise exception
48
+ else
49
+ raise('no :uri or :file config found!')
50
+ end
51
+ parsed
52
+ end
53
+
54
+ def parse_from_uri(response)
55
+ connect
56
+ # rubocop:disable Lint/ShadowedArgument
42
57
  response = self.response
43
- json = JSON.parse response.body, {symbolize_names: true}
44
- raise json[:error][:code] unless valid? json, response
58
+ # rubocop:enable Lint/ShadowedArgument
59
+ json = JSON.parse(response.body, { symbolize_names: true })
60
+ raise(json[:error][:code]) unless valid?(json, response)
61
+
45
62
  self.html = json[:parse][:text]
46
- self.parsed = Nokogiri::HTML self.html[:*]
63
+ self.parsed = Nokogiri::HTML(html[:*])
64
+ end
65
+
66
+ def parse_from_file(file)
67
+ f = File.open(file)
68
+ ret = Nokogiri::HTML(f)
69
+ f.close
70
+ ret
47
71
  end
48
72
 
49
73
  class << self
50
74
  def config=(config = {})
51
75
  @@config = config
52
76
  end
77
+
53
78
  def config
54
79
  @@config ||= {}
55
80
  end
56
81
  end
57
82
 
58
83
  protected
59
- def valid? json, response
84
+
85
+ def valid?(json, response)
60
86
  b = []
61
87
  # valid http response
62
- b << (response.is_a? Net::HTTPOK)
88
+ b << (response.is_a?(Net::HTTPOK))
63
89
  # not an invalid api response handle
64
- b << (!json.include? :error)
90
+ b << (!json.include?(:error))
65
91
  !b.include?(false)
66
92
  end
67
-
68
93
  end
69
-
70
94
  end
71
- end
95
+ end
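The request URI that `connect` assembles can be reproduced with the standard library alone. The sketch below mirrors the `URI(...)` and `URI.encode_www_form` calls from the diff; the helper name `build_api_uri` is hypothetical and no network request is made.

```ruby
require 'uri'

# Build the MediaWiki parse-API URI for a page (hypothetical helper).
# Defaults mirror the ones in Connect#initialize: '/w/api.php' and
# { action: 'parse', format: 'json', page: ... }.
def build_api_uri(base, page_name, api_path = '/w/api.php')
  uri = URI("#{base}#{api_path}")
  uri.query = URI.encode_www_form({ action: 'parse', format: 'json', page: page_name })
  uri
end

uri = build_api_uri('https://en.wikipedia.org', 'Ruby_on_Rails')
puts uri             # full URI including the query string
puts uri.request_uri # path plus query, as passed to Net::HTTP::Get.new
puts uri.scheme      # 'https' is what triggers use_ssl in connect
```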
data/lib/wiki/api/page.rb CHANGED
@@ -1,136 +1,102 @@
1
+ # frozen_string_literal: true
2
+
1
3
  module Wiki
2
4
  module Api
3
-
5
+ # MediaWiki Page, collection of all html information plus its page title
4
6
  class Page
7
+ attr_accessor :name, :parsed_page, :uri, :parent
5
8
 
6
- attr_accessor :name, :parsed_page, :uri
7
-
8
- def initialize(options={})
9
- self.name = options[:name] if options.include? :name
10
- uri = options[:uri] if options.include? :uri
11
-
12
- @@config ||= nil
13
- if @@config.nil? || !uri.nil?
14
- # use the connection to collect HTML pages for parsing
15
- @connect = Wiki::Api::Connect.new uri: uri
16
- else
17
- # using a local HTML file for parsing
18
- end
9
+ def initialize(options = {})
10
+ self.name = options[:name] if options.include?(:name)
11
+ self.uri = options[:uri] if options.include?(:uri)
12
+ @connect = Wiki::Api::Connect.new(uri:)
19
13
  end
20
14
 
21
- def headlines
22
- headlines = []
23
- self.parse_blocks.each do |headline_name, elements|
24
- headline = PageHeadline.new name: headline_name
25
- elements.each do |element|
26
- # nokogiri element
27
- headline.block << element
28
- end
29
- headlines << headline
30
- end
31
- headlines
32
- end
15
+ attr_reader :connect
33
16
 
34
- def headline headline_name
35
- headlines = []
36
- self.parse_blocks(headline_name).each do |headline_name, elements|
37
- headline = PageHeadline.new name: headline_name
38
- elements.each do |element|
39
- # nokogiri element
40
- headline.block << element
41
- end
42
- headlines << headline
43
- end
44
- headlines
17
+ # collect all headlines, keep original page formatting
18
+ def root_headline
19
+ parse_blocks
45
20
  end
46
21
 
47
-
22
+ # # collect headlines by given name, this will flatten the nested headlines
23
+ # def flat_headlines_by_name headline_name
24
+ # raise "not yet implemented!"
25
+ # # TODO: implement flattening of headlines within the root headline
26
+ # # ALT: breadth search option in the root of the first headline
27
+ # self.parse_blocks(headline_name)
28
+ # end
48
29
 
49
30
  def to_html
50
- self.load_page!
51
- self.parsed_page.to_xhtml indent: 3, indent_text: " "
31
+ load_page!
32
+ parsed_page.to_xhtml(indent: 3, indent_text: ' ')
52
33
  end
53
34
 
54
35
  def reset!
55
36
  self.parsed_page = nil
56
37
  end
57
38
 
58
- class << self
59
- def config=(config = {})
60
- @@config = config
61
- end
62
- end
63
-
64
- protected
65
-
66
39
  def load_page!
67
- if @@config.nil?
68
- self.parsed_page ||= @connect.page self.name
69
- elsif self.parsed_page.nil?
70
- f = File.open(@@config[:file])
71
- self.parsed_page = Nokogiri::HTML(f)
72
- f.close
73
- end
40
+ self.parsed_page ||= @connect.page(name)
74
41
  end
75
42
 
76
-
77
43
  # parse blocks
78
- def parse_blocks headline_name = nil
79
- self.load_page!
44
+ def parse_blocks(headline_name = nil)
45
+ load_page!
80
46
  result = {}
81
47
 
82
48
  # get headline nodes by span class
83
- xs = self.parsed_page.xpath("//span[@class='mw-headline']")
49
+ headlines = self.parsed_page.xpath("//span[@class='mw-headline']")
50
+
84
51
  # filter single headline by name (ignore case)
85
- xs = self.filter_headline xs, headline_name unless headline_name.nil?
52
+ headlines = filter_headline(headlines, headline_name) unless headline_name.nil?
86
53
 
87
54
  # NOTE: first_part has no id attribute and thus cannot be filtered or processed within xpath (xs)
88
- if headline_name == self.name || headline_name.nil?
89
- x = self.first_part
90
- result[self.name] ||= []
91
- result[self.name] << (self.collect_elements(x.parent))
55
+ if headline_name.nil? || headline_name.start_with?(name.downcase)
56
+ x = first_part
57
+ result[name] ||= []
58
+ result[name] << (collect_elements(x.parent))
92
59
  end
93
60
 
94
61
  # append all blocks
95
- xs.each do |x|
96
- headline = x.attributes["id"].value
97
- elements = self.collect_elements x.parent.next
98
- result[headline] ||= []
99
- result[headline] << elements
62
+ headlines.each do |headline|
63
+ headline_value = headline.attributes['id'].value
64
+ elements = collect_elements(headline.parent.next)
65
+ result[headline_value] ||= []
66
+ result[headline_value] << elements
100
67
  end
101
68
 
102
- result
69
+ # create root object
70
+ PageHeadline.new(parent: self, name: result.first[0], headlines: result, level: 0)
103
71
  end
104
72
 
105
73
  # harvest first part of the page (missing heading and class="mw-headline")
106
74
  def first_part
107
- self.parsed_page ||= @connect.page self.name
108
- self.parsed_page.search("p").first.children.first
75
+ self.parsed_page ||= @connect.page(name)
76
+ self.parsed_page.search('p').first.children.first
109
77
  end
110
78
 
111
79
  # collect elements within headlines (not nested properties, but next elements)
112
- def collect_elements element
80
+ def collect_elements(element)
113
81
  # capture first element name
114
82
  elements = []
115
83
  # iterate text until next headline
116
- while true do
84
+ loop do
117
85
  elements << element
118
86
  element = element.next
119
- break if element.nil? || element.to_html.include?("class=\"mw-headline\"")
87
+ break if element.nil? || element.to_html.include?('class="mw-headline"')
120
88
  end
121
89
  elements
122
90
  end
123
91
 
124
- def filter_headline xs, headline_name
92
+ def filter_headline(xs, headline_name)
125
93
  # transform name to a wiki_id (downcase and space replace with underscore)
126
- headline_name = headline_name.downcase.gsub(" ", "_")
94
+ headline_name = headline_name.downcase.gsub(' ', '_')
127
95
  # reject not matching id's
128
- xs.reject do |t|
129
- !t.attributes["id"].value.downcase.start_with?(headline_name)
96
+ xs.select do |t|
97
+ t.attributes['id'].value.downcase.start_with?(headline_name)
130
98
  end
131
99
  end
132
-
133
100
  end
134
-
135
101
  end
136
- end
102
+ end
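The sibling walk in `collect_elements` can be illustrated without Nokogiri. This is a rough sketch under stated assumptions: `Elem` is a hypothetical stand-in for a parsed node, with `html` playing the role of `to_html` and `next_sibling` the role of the `next` call used in the diff.

```ruby
# Hypothetical stand-in for a parsed node (not a Nokogiri class).
Elem = Struct.new(:html, :next_sibling)

# Collect the starting element and its following siblings, stopping before
# the next sibling whose markup contains a MediaWiki headline span.
def collect_until_headline(element)
  elements = []
  loop do
    elements << element
    element = element.next_sibling
    break if element.nil? || element.html.include?('class="mw-headline"')
  end
  elements
end

# Build a small sibling chain: intro paragraph, list, headline, paragraph.
tail = Elem.new('<p>other</p>', nil)
head = Elem.new('<h2><span class="mw-headline" id="History">History</span></h2>', tail)
list = Elem.new('<ul><li>item</li></ul>', head)
intro = Elem.new('<p>intro</p>', list)

p collect_until_headline(intro).map(&:html)
```

Note that the starting element is always collected, even if it is itself a headline; the headline check applies only to the siblings that follow it.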