wiki-api 0.0.2 → 0.1.2

checksums.yaml CHANGED
@@ -1,15 +1,7 @@
1
1
  ---
2
- SHA1:
3
- metadata.gz: !binary |-
4
- NjgxOGUxZjQ2MWQ2MjNhMDA2ZGUwMTRhOGI4MWFlOGQ3MzI4MWFjOA==
5
- data.tar.gz: !binary |-
6
- ZmZkNDFhMzc0ZTNmZDBlYTFmMTIwMmU5ZDgzYTQ2YjM0ZTk1ZmQzYg==
2
+ SHA256:
3
+ metadata.gz: cd978cd4dad89ddc8098d6abafcd6325ec6c0c4a4a5e5b8e93855bc118314b27
4
+ data.tar.gz: c5ead46deb2d10310823d4b639046058cf087a29cb6a0413a5e3addc64037b92
7
5
  SHA512:
8
- metadata.gz: !binary |-
9
- NGM4YTU2MjQ3Njk1MzJkMDhlYjcxODYxNDFkNzRlODI5MjMwNmU5ZGEzZmJj
10
- MjhjZjYxYzcxMmYzYjA0YzA3NzdlYTJhMjM0ZTllNzgyMDk0MGJiNjBiZWRl
11
- N2Y5YzMwZWZjZmY3NWQ0YmJiMjdiOTkwOTU1ZmE4MDg5Njk4M2Y=
12
- data.tar.gz: !binary |-
13
- MGZlMTYzZTgzZWE3YmYzZmIyMjc0OTZhMGY0NDEwYzJmNmFiMTZkNDM3OGM2
14
- Mjc1MDdjMzQ3MjM1NmVlODM3Mzg5ZTViMGRmOGI2NzE1NDZjODJhZTA2MjI5
15
- NWE3YmI4MDYxY2I4NGM3MGUwNzAzNjQ3YjMwODU5NDBlMWYxZDM=
6
+ metadata.gz: fcb6e3991c12a415a79b4c109091a41dbe45bff7ee3040a1a4283ddc2625522cfca767c65cba45e0f29bb13d410f082b78337de25d0bfd2bd9e0bd1591a36c24
7
+ data.tar.gz: 3a78fa474766c4cc10c44eb3e8a90ed95c1ddac1f306afa878da2ccf7b75e4fd179fc7933499f261c408cdd2f396d3613a6d74361bdad160cb3c13727aaa135c
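The removed entries were stored as `!binary` scalars (Base64-wrapped text), while the added ones are plain hex strings. A minimal sketch, using only core Ruby and the standard `digest` library, that decodes the removed SHA1 `metadata.gz` value to show it is an ordinary hex digest. The `sha256_hex` helper is a hypothetical name introduced for illustration; it is not part of the gem or of RubyGems.

```ruby
require 'digest'

# Base64 payload copied from the removed SHA1 metadata.gz entry above.
old_sha1_b64 = 'NjgxOGUxZjQ2MWQ2MjNhMDA2ZGUwMTRhOGI4MWFlOGQ3MzI4MWFjOA=='

# String#unpack1('m') decodes Base64 without requiring extra libraries.
old_sha1_hex = old_sha1_b64.unpack1('m')
puts old_sha1_hex

# Hypothetical helper: hex-encoded SHA256 of a string, the same
# representation the new checksums.yaml entries use.
def sha256_hex(data)
  Digest::SHA256.hexdigest(data)
end

puts sha256_hex('example payload')
```

To compare against a real archive, `Digest::SHA256.file(path).hexdigest` produces the same hex form for a file on disk.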
data/.rubocop.yml ADDED
@@ -0,0 +1,24 @@
1
+ AllCops:
2
+ SuggestExtensions: false
3
+ Style/ClassVars:
4
+ Enabled: false
5
+ Style/Documentation:
6
+ Enabled: false
7
+ Style/MethodCallWithArgsParentheses:
8
+ Enabled: true
9
+ Metrics/AbcSize:
10
+ Enabled: false
11
+ Metrics/ClassLength:
12
+ Enabled: false
13
+ Metrics/CyclomaticComplexity:
14
+ Enabled: false
15
+ Metrics/PerceivedComplexity:
16
+ Enabled: false
17
+ Metrics/MethodLength:
18
+ Enabled: false
19
+ Naming/MethodParameterName:
20
+ Enabled: false
21
+ Naming/PredicateName:
22
+ Enabled: false
23
+ Lint/RescueException:
24
+ Enabled: false
data/.travis.yml ADDED
@@ -0,0 +1,12 @@
1
+ language: ruby
2
+ rvm:
3
+ - 1.9.3
4
+ - 2.1.0
5
+ - jruby-19mode
6
+ - ruby-head
7
+ - jruby-head
8
+ jdk:
9
+ - oraclejdk7
10
+ before_install:
11
+ - gem update --system
12
+ - gem --version
data/Gemfile CHANGED
@@ -1,3 +1,5 @@
1
+ # frozen_string_literal: true
2
+
1
3
  source 'https://rubygems.org'
2
4
 
3
5
  # Specify your gem's dependencies in wiki-api.gemspec
data/README.md CHANGED
@@ -1,43 +1,20 @@
1
1
  # Wiki::Api
2
2
 
3
- Wiki API is a gem (Ruby on Rails) that interfaces with the MediaWiki API (https://www.mediawiki.org/wiki/API:Main_page). This gem is more than a interface, it has abstract classes like: Page on which you can request page parameters (like headlines, and text blocks within headlines).
3
+ [![Build Status](https://travis-ci.org/dblommesteijn/wiki-api.svg?branch=master)](https://travis-ci.org/dblommesteijn/wiki-api) [![Code Climate](https://codeclimate.com/github/dblommesteijn/wiki-api.png)](https://codeclimate.com/github/dblommesteijn/wiki-api)
4
4
 
5
- NOTE: nokogiri is used for background parsing of HTML. Because I believe there is no point of wrapping internals (composing) for this purpose, nokogiri nodes elements etc. are exposed (http://nokogiri.org/Nokogiri.html) through the wiki-api.
5
+ Wiki API is a gem (Ruby on Rails) that interfaces with the MediaWiki API (https://www.mediawiki.org/wiki/API:Main_page). This gem is more than an interface: it has abstract classes for Page and Headline parsing. You can iterate through these headlines and access data accordingly.
6
+
7
+ NOTE: This gem has a nokogiri (http://nokogiri.org/Nokogiri.html) backend (for HTML parsing). Major components: `Page`, `Headline`, `Block`, `ListItem`, and `Link` are wrappers for easy data access; however, it's still possible to retrieve the raw HTML within these objects.
6
8
 
7
9
  Requests to the MediaWiki API use the following URI structure:
8
10
 
9
11
  http(s)://somemediawiki.org/w/api.php?action=parse&format=json&page="anypage"
10
12
 
13
+ ### Dependencies
11
14
 
12
- ### Dependencies (production)
13
-
14
- * json
15
15
  * nokogiri
16
16
 
17
17
 
18
- ### Roadmap
19
-
20
- * Version (0.0.2) (current)
21
-
22
- Index important words per block, page, list item;
23
-
24
- Parse objects for more elements within a Page.
25
-
26
-
27
- ### Changelog
28
-
29
- * Version (0.0.1) -> (0.0.2)
30
-
31
- Nested ListItems, Links (within Page)
32
-
33
- Search on Page headline (ignore case, and underscore)
34
-
35
-
36
- ### Known Issues
37
-
38
- None discovered thus far.
39
-
40
-
41
18
  ## Installation
42
19
 
43
20
  Add this line to your application's Gemfile (bundler):
@@ -52,32 +29,41 @@ Or install it yourself (RubyGems):
52
29
 
53
30
  $ gem install wiki-api
54
31
 
32
+ Or try it from this repository (local) in a console:
33
+
34
+ $ bin/console
35
+
55
36
 
56
37
  ## Setup
57
38
 
58
39
  Define a configuration for your connection (initialize script), this example uses wiktionary.org.
59
- NOTE: it can connect to both HTTP and HTTPS MediaWikis.
60
-
61
- ```ruby
62
- CONFIG = { uri: "http://en.wiktionary.org" }
63
- ```
40
+ NOTE: it can connect to both HTTP and HTTPS MediaWikis (however you'll get a 302 response from MediaWiki)
64
41
 
65
42
  Setup default configuration (initialize script)
66
43
 
67
44
  ```ruby
68
- Wiki::Api::Connect.config = CONFIG
45
+ Wiki::Api::Connect.config = { uri: 'https://en.wiktionary.org' }
69
46
  ```
70
47
 
71
48
 
49
+ ## Running tests
50
+
51
+ ```bash
52
+ $ rake test
53
+ ```
54
+
72
55
  ## Usage
73
56
 
74
- ### Query a Page
57
+ ### Query a Page and Headline
75
58
 
76
59
  Requesting headlines from a given page.
77
60
 
78
61
  ```ruby
79
- page = Wiki::Api::Page.new name: "Wiktionary:Welcome,_newcomers"
80
- page.headlines.each do |headline|
62
+ page = Wiki::Api::Page.new(name: 'Wiktionary:Welcome,_newcomers')
63
+ # the root headline equals the pagename
64
+ puts page.root_headline.name
65
+ # iterate next level of headlines
66
+ page.root_headline.headlines.each do |headline_name, headline|
81
67
  # printing headline name (PageHeadline)
82
68
  puts headline.name
83
69
  end
@@ -86,30 +72,30 @@ end
86
72
  Getting headlines for a given name.
87
73
 
88
74
  ```ruby
89
- page = Wiki::Api::Page.new name: "Wiktionary:Welcome,_newcomers"
90
- page.headline("Wiktionary:Welcome,_newcomers").each do |headline|
91
- # printing headline name (PageHeadline)
92
- puts headline.name
93
- end
75
+ page = Wiki::Api::Page.new(name: 'Wiktionary:Welcome,_newcomers')
76
+ # lookup headline by name (underscore and case are ignored)
77
+ headline = page.root_headline.headline('editing wiktionary').first
78
+ # printing headline name (PageHeadline)
79
+ puts headline.name
80
+ # get the type of nested headline (html h1,2,3,4 etc.)
81
+ puts headline.type
94
82
  ```
95
83
 
96
84
  ### Basic Page structure
97
85
 
98
86
  ```ruby
99
- page = Wiki::Api::Page.new name: "Wiktionary:Welcome,_newcomers"
100
-
87
+ page = Wiki::Api::Page.new(name: 'Wiktionary:Welcome,_newcomers')
101
88
  # iterate PageHeadline objects
102
- page.headlines.each do |headline|
103
-
89
+ page.root_headline.headlines.each do |headline_name, headline|
104
90
  # exposing nokogiri internal elements
105
91
  elements = headline.elements.flatten
106
92
  elements.each do |element|
107
- # access Nokogiri::XML::*
93
+ # print will result in: Nokogiri::XML::Text or Nokogiri::XML::Element
94
+ puts element.class
108
95
  end
109
96
 
110
97
  # string representation of all nested text
111
98
  block.to_texts
112
-
113
99
  # iterate PageListItem objects
114
100
  block.list_items.each do |list_item|
115
101
  # string representation of nested text
@@ -131,62 +117,105 @@ page.headlines.each do |headline|
131
117
  # string representation of nested text
132
118
  link.to_text
133
119
  end
134
-
135
120
  end
136
121
  ```
137
122
 
138
123
 
139
- ### Example using Global config (https://en.wikipedia.org/wiki/Ruby_on_rails)
124
+ ### Example using Global config (https://en.wikipedia.org/wiki/Ruby_on_Rails)
140
125
 
141
126
  This is an example of querying wikipedia.org on the page: "Ruby_on_Rails", and printing the References headline links for each list item.
142
127
 
143
128
  ```ruby
144
129
  # setting a target config
145
- CONFIG = { uri: "https://en.wikipedia.org" }
146
- Wiki::Api::Connect.config = CONFIG
130
+ Wiki::Api::Connect.config = { uri: 'https://en.wikipedia.org' }
147
131
 
148
132
  # querying the page
149
- page = Wiki::Api::Page.new name: "Ruby_on_rails"
133
+ page = Wiki::Api::Page.new(name: 'Ruby_on_Rails')
150
134
 
151
135
  # get headlines with name References (there can be multiple headlines with the same name!)
152
- headlines = page.headline "References"
136
+ headlines = page.root_headline.headline('References')
153
137
 
154
138
  # iterate headlines
155
139
  headlines.each do |headline|
156
140
  # iterate list items on the given headline
157
141
  headline.block.list_items.each do |list_item|
158
-
159
142
  # print the uri of all links
160
- puts list_item.links.map{ |l| l.uri }
161
-
143
+ puts list_item.links.map(&:uri)
162
144
  end
163
145
  end
164
146
  ```
165
147
 
166
148
 
167
-
168
- ### Example passing URI (https://en.wikipedia.org/wiki/Ruby_on_rails)
149
+ ### Example passing URI (https://en.wikipedia.org/wiki/Ruby_on_Rails)
169
150
 
170
151
  This is the same example as the one above, except for setting a global config to direct the requests to a given URI.
171
152
 
172
153
  ```ruby
173
154
  # querying the page
174
- page = Wiki::Api::Page.new name: "Ruby_on_rails", uri: "https://en.wikipedia.org"
155
+ page = Wiki::Api::Page.new(name: 'Ruby_on_Rails', uri: 'https://en.wikipedia.org')
175
156
 
176
157
  # get headlines with name References (there can be multiple headlines with the same name!)
177
- headlines = page.headline "References"
158
+ headlines = page.root_headline.headline('References')
178
159
 
179
160
  # iterate headlines
180
161
  headlines.each do |headline|
181
162
  # iterate list items on the given headline
182
163
  headline.block.list_items.each do |list_item|
183
-
184
164
  # print the uri of all links
185
- puts list_item.links.map{ |l| l.uri }
186
-
165
+ puts list_item.links.map(&:uri)
187
166
  end
188
167
  end
189
168
  ```
190
169
 
191
170
 
171
+ ### Example searching headlines
172
+
173
+ This example shows how the headlines can be searched. For more info check: https://github.com/dblommesteijn/wiki-api/blob/master/lib/wiki/api/page.rb#L97
174
+
175
+
176
+ ```ruby
177
+ # querying the page
178
+ page = Wiki::Api::Page.new(name: 'Ruby_on_Rails', uri: 'https://en.wikipedia.org')
179
+
180
+ # NOTE: the following are all valid headline names:
181
+ # request headline (by literal name)
182
+ headlines = page.root_headline.headline('Philosophy_and_design')
183
+ puts headlines.map(&:name)
184
+ # request headline (by downcase name)
185
+ headlines = page.root_headline.headline('philosophy_and_design')
186
+ puts headlines.map(&:name)
187
+ # request headline (by human name)
188
+ headlines = page.root_headline.headline('philosophy and design')
189
+ puts headlines.map(&:name)
190
+
191
+ # NOTE2: headlines are matched on headline.start_with?(requested_headline)
192
+ # because of start_with? compare this should work as well!
193
+ headlines = page.root_headline.headline('philosophy')
194
+ puts headlines.map(&:name)
195
+ ```
196
+
197
+
198
+ ### Example searching headlines in depth
199
+
200
+ Recursive search on all nested headlines, including in depth searches.
201
+
202
+ ```ruby
203
+ # querying the page
204
+ page = Wiki::Api::Page.new(name: 'Ruby_on_Rails', uri: 'https://en.wikipedia.org')
205
+ # get root
206
+ root_headline = page.root_headline
207
+ # lookup 'framework structure' in depth (nested headlines are searched too)
208
+ headline = root_headline.headline_in_depth('framework structure').first
209
+ puts headline.name
210
+ # NOTE: lookup of nested headlines does not work with the headline function (because 'Framework_structure' is nested within 'Technical_overview')
211
+ headline = root_headline.headline('framework structure').first
212
+ # depth can be limited adding the depth parameter
213
+ # NOTE: the example below will return nil, 'Framework_structure' is nested beyond depth = 0!
214
+ depth = 0
215
+ headline = root_headline.headline_in_depth('framework structure', depth).first
216
+ # increasing depth search will show the requested headline
217
+ depth = 5
218
+ headline = root_headline.headline_in_depth('framework structure', depth).first
219
+ puts headline.name
220
+ ```
192
221
 
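The depth-limited lookup described in the README can be sketched without the gem. This is an illustration only: `Node` and `find_in_depth` are hypothetical stand-ins, not the gem's `PageHeadline` or its `headline_in_depth` implementation, and the default depth of 5 is an assumption. The sketch treats depth 0 as "direct children only", which matches the README's statement that a nested headline is not found at `depth = 0`.

```ruby
# Hypothetical stand-in for a headline tree (not the gem's PageHeadline).
Node = Struct.new(:name, :children)

# Collect nodes whose name starts with the requested headline, descending
# at most `depth` levels below the direct children of `node`.
def find_in_depth(node, requested, depth = 5)
  # Same normalisation the README describes: ignore case, spaces become underscores.
  wiki_id = requested.downcase.gsub(' ', '_')
  found = node.children.select { |child| child.name.downcase.start_with?(wiki_id) }
  return found if depth <= 0

  found + node.children.flat_map { |child| find_in_depth(child, requested, depth - 1) }
end

tree = Node.new('Ruby_on_Rails', [
  Node.new('History', []),
  Node.new('Technical_overview', [Node.new('Framework_structure', [])])
])

shallow = find_in_depth(tree, 'framework structure', 0)
deep = find_in_depth(tree, 'framework structure', 1)
puts shallow.first.inspect
puts deep.first.name
```

Returning an array (rather than a single node) keeps the call shape used in the README, where `.first` is applied to the result and yields `nil` when nothing matches.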
data/Rakefile CHANGED
@@ -1 +1,13 @@
1
- require "bundler/gem_tasks"
1
+ # frozen_string_literal: true
2
+
3
+ require 'bundler/gem_tasks'
4
+ require 'rake/testtask'
5
+
6
+ Rake::TestTask.new do |t|
7
+ t.libs << 'test'
8
+ tfs = FileList['test/unit/*.rb']
9
+ t.test_files = tfs
10
+ t.verbose = true
11
+ end
12
+
13
+ task default: %i[build install]
data/bin/console ADDED
@@ -0,0 +1,8 @@
1
+ #!/usr/bin/env ruby
2
+ # frozen_string_literal: true
3
+
4
+ require 'bundler/setup'
5
+ require 'wiki/api'
6
+ require 'pry'
7
+
8
+ Pry.start
@@ -1,71 +1,95 @@
1
+ # frozen_string_literal: true
2
+
1
3
  require 'net/http'
2
4
  require 'json'
3
5
  require 'nokogiri'
4
6
 
5
7
  module Wiki
6
8
  module Api
7
-
8
9
  class Connect
10
+ attr_accessor :uri, :api_path, :api_options, :http, :request, :response, :html, :parsed, :file
9
11
 
10
- attr_accessor :uri, :api_path, :api_options, :http, :request, :response, :html, :parsed
11
-
12
- def initialize(options={})
13
- @@config ||= nil
14
- options.merge! @@config unless @@config.nil?
15
- self.uri = options[:uri] if options.include? :uri
16
- self.api_path = options[:api_path] if options.include? :api_path
17
- self.api_options = options[:api_options] if options.include? :api_options
12
+ def initialize(options = {})
13
+ @@config ||= {}
14
+ self.uri = options[:uri] || @@config[:uri]
15
+ self.file = options[:file] || @@config[:file]
16
+ self.api_path = options[:api_path] || @@config[:api_path]
17
+ self.api_options = options[:api_options] || @@config[:api_options]
18
18
 
19
19
  # defaults
20
- self.api_path ||= "/w/api.php"
21
- self.api_options ||= {action: "parse", format: "json", page: ""}
20
+ self.api_path ||= '/w/api.php'
21
+ self.api_options ||= { action: 'parse', format: 'json', page: '' }
22
22
 
23
23
  # errors
24
- raise "no uri given" if self.uri.nil?
24
+ raise('no uri given') if uri.nil?
25
25
  end
26
26
 
27
27
  def connect
28
28
  uri = URI("#{self.uri}#{self.api_path}")
29
- uri.query = URI.encode_www_form self.api_options
29
+ uri.query = URI.encode_www_form(self.api_options)
30
30
  self.http = Net::HTTP.new(uri.host, uri.port)
31
- if uri.scheme == "https"
32
- self.http.use_ssl = true
33
- #self.http.verify_mode = OpenSSL::SSL::VERIFY_NONE
31
+ if uri.scheme == 'https'
32
+ http.use_ssl = true
33
+ # self.http.verify_mode = OpenSSL::SSL::VERIFY_NONE
34
34
  end
35
35
  self.request = Net::HTTP::Get.new(uri.request_uri)
36
- self.response = self.http.request(request)
36
+ self.response = http.request(request)
37
37
  end
38
38
 
39
- def page page_name
39
+ def page(page_name)
40
40
  self.api_options[:page] = page_name
41
- self.connect
41
+ # parse page by uri
42
+ if !uri.nil? && file.nil?
43
+ self.parsed = parse_from_uri(response)
44
+ # parse page by file
45
+ elsif !file.nil?
46
+ self.parsed = parse_from_file(file)
47
+ # invalid config, raise exception
48
+ else
49
+ raise('no :uri or :file config found!')
50
+ end
51
+ parsed
52
+ end
53
+
54
+ def parse_from_uri(response)
55
+ connect
56
+ # rubocop:disable Lint/ShadowedArgument
42
57
  response = self.response
43
- json = JSON.parse response.body, {symbolize_names: true}
44
- raise json[:error][:code] unless valid? json, response
58
+ # rubocop:enable Lint/ShadowedArgument
59
+ json = JSON.parse(response.body, { symbolize_names: true })
60
+ raise(json[:error][:code]) unless valid?(json, response)
61
+
45
62
  self.html = json[:parse][:text]
46
- self.parsed = Nokogiri::HTML self.html[:*]
63
+ self.parsed = Nokogiri::HTML(html[:*])
64
+ end
65
+
66
+ def parse_from_file(file)
67
+ f = File.open(file)
68
+ ret = Nokogiri::HTML(f)
69
+ f.close
70
+ ret
47
71
  end
48
72
 
49
73
  class << self
50
74
  def config=(config = {})
51
75
  @@config = config
52
76
  end
77
+
53
78
  def config
54
79
  @@config ||= {}
55
80
  end
56
81
  end
57
82
 
58
83
  protected
59
- def valid? json, response
84
+
85
+ def valid?(json, response)
60
86
  b = []
61
87
  # valid http response
62
- b << (response.is_a? Net::HTTPOK)
88
+ b << (response.is_a?(Net::HTTPOK))
63
89
  # not an invalid api response handle
64
- b << (!json.include? :error)
90
+ b << (!json.include?(:error))
65
91
  !b.include?(false)
66
92
  end
67
-
68
93
  end
69
-
70
94
  end
71
- end
95
+ end
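The request URI assembled in `connect` can be reproduced with the standard library alone. A minimal sketch, assuming the default `api_path` and `api_options` shown in the diff above; `build_api_uri` is a hypothetical helper name used only for illustration, and no network request is made.

```ruby
require 'uri'

# Hypothetical helper mirroring the first two lines of Connect#connect:
# join base URI and API path, then encode the options as the query string.
def build_api_uri(base, api_path: '/w/api.php', api_options: {})
  uri = URI("#{base}#{api_path}")
  uri.query = URI.encode_www_form(api_options)
  uri
end

api_uri = build_api_uri(
  'https://en.wiktionary.org',
  api_options: { action: 'parse', format: 'json', page: 'Wiktionary:Welcome,_newcomers' }
)

puts api_uri
# request_uri (path plus query) is what Connect passes to Net::HTTP::Get.new
puts api_uri.request_uri
```

`URI.encode_www_form` percent-encodes reserved characters in the page name, so the query can be decoded back with `URI.decode_www_form` when debugging.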
data/lib/wiki/api/page.rb CHANGED
@@ -1,136 +1,102 @@
1
+ # frozen_string_literal: true
2
+
1
3
  module Wiki
2
4
  module Api
3
-
5
+ # MediaWiki Page, collection of all html information plus its page title
4
6
  class Page
7
+ attr_accessor :name, :parsed_page, :uri, :parent
5
8
 
6
- attr_accessor :name, :parsed_page, :uri
7
-
8
- def initialize(options={})
9
- self.name = options[:name] if options.include? :name
10
- uri = options[:uri] if options.include? :uri
11
-
12
- @@config ||= nil
13
- if @@config.nil? || !uri.nil?
14
- # use the connection to collect HTML pages for parsing
15
- @connect = Wiki::Api::Connect.new uri: uri
16
- else
17
- # using a local HTML file for parsing
18
- end
9
+ def initialize(options = {})
10
+ self.name = options[:name] if options.include?(:name)
11
+ self.uri = options[:uri] if options.include?(:uri)
12
+ @connect = Wiki::Api::Connect.new(uri:)
19
13
  end
20
14
 
21
- def headlines
22
- headlines = []
23
- self.parse_blocks.each do |headline_name, elements|
24
- headline = PageHeadline.new name: headline_name
25
- elements.each do |element|
26
- # nokogiri element
27
- headline.block << element
28
- end
29
- headlines << headline
30
- end
31
- headlines
32
- end
15
+ attr_reader :connect
33
16
 
34
- def headline headline_name
35
- headlines = []
36
- self.parse_blocks(headline_name).each do |headline_name, elements|
37
- headline = PageHeadline.new name: headline_name
38
- elements.each do |element|
39
- # nokogiri element
40
- headline.block << element
41
- end
42
- headlines << headline
43
- end
44
- headlines
17
+ # collect all headlines, keep original page formatting
18
+ def root_headline
19
+ parse_blocks
45
20
  end
46
21
 
47
-
22
+ # # collect headlines by given name, this will flatten the nested headlines
23
+ # def flat_headlines_by_name headline_name
24
+ # raise "not yet implemented!"
25
+ # # TODO: implement flattening of headlines within the root headline
26
+ # # ALT: breath search option in the root of the first headline
27
+ # self.parse_blocks(headline_name)
28
+ # end
48
29
 
49
30
  def to_html
50
- self.load_page!
51
- self.parsed_page.to_xhtml indent: 3, indent_text: " "
31
+ load_page!
32
+ parsed_page.to_xhtml(indent: 3, indent_text: ' ')
52
33
  end
53
34
 
54
35
  def reset!
55
36
  self.parsed_page = nil
56
37
  end
57
38
 
58
- class << self
59
- def config=(config = {})
60
- @@config = config
61
- end
62
- end
63
-
64
- protected
65
-
66
39
  def load_page!
67
- if @@config.nil?
68
- self.parsed_page ||= @connect.page self.name
69
- elsif self.parsed_page.nil?
70
- f = File.open(@@config[:file])
71
- self.parsed_page = Nokogiri::HTML(f)
72
- f.close
73
- end
40
+ self.parsed_page ||= @connect.page(name)
74
41
  end
75
42
 
76
-
77
43
  # parse blocks
78
- def parse_blocks headline_name = nil
79
- self.load_page!
44
+ def parse_blocks(headline_name = nil)
45
+ load_page!
80
46
  result = {}
81
47
 
82
48
  # get headline nodes by span class
83
- xs = self.parsed_page.xpath("//span[@class='mw-headline']")
49
+ headlines = self.parsed_page.xpath("//span[@class='mw-headline']")
50
+
84
51
  # filter single headline by name (ignore case)
85
- xs = self.filter_headline xs, headline_name unless headline_name.nil?
52
+ headlines = filter_headline(headlines, headline_name) unless headline_name.nil?
86
53
 
87
54
  # NOTE: first_part has no id attribute and thus cannot be filtered or processed within xpath (xs)
88
- if headline_name == self.name || headline_name.nil?
89
- x = self.first_part
90
- result[self.name] ||= []
91
- result[self.name] << (self.collect_elements(x.parent))
55
+ if headline_name.nil? || headline_name.start_with?(name.downcase)
56
+ x = first_part
57
+ result[name] ||= []
58
+ result[name] << (collect_elements(x.parent))
92
59
  end
93
60
 
94
61
  # append all blocks
95
- xs.each do |x|
96
- headline = x.attributes["id"].value
97
- elements = self.collect_elements x.parent.next
98
- result[headline] ||= []
99
- result[headline] << elements
62
+ headlines.each do |headline|
63
+ headline_value = headline.attributes['id'].value
64
+ elements = collect_elements(headline.parent.next)
65
+ result[headline_value] ||= []
66
+ result[headline_value] << elements
100
67
  end
101
68
 
102
- result
69
+ # create root object
70
+ PageHeadline.new(parent: self, name: result.first[0], headlines: result, level: 0)
103
71
  end
104
72
 
105
73
  # harvest first part of the page (missing heading and class="mw-headline")
106
74
  def first_part
107
- self.parsed_page ||= @connect.page self.name
108
- self.parsed_page.search("p").first.children.first
75
+ self.parsed_page ||= @connect.page(name)
76
+ self.parsed_page.search('p').first.children.first
109
77
  end
110
78
 
111
79
  # collect elements within headlines (not nested properties, but next elements)
112
- def collect_elements element
80
+ def collect_elements(element)
113
81
  # capture first element name
114
82
  elements = []
115
83
  # iterate text until next headline
116
- while true do
84
+ loop do
117
85
  elements << element
118
86
  element = element.next
119
- break if element.nil? || element.to_html.include?("class=\"mw-headline\"")
87
+ break if element.nil? || element.to_html.include?('class="mw-headline"')
120
88
  end
121
89
  elements
122
90
  end
123
91
 
124
- def filter_headline xs, headline_name
92
+ def filter_headline(xs, headline_name)
125
93
  # transform name to a wiki_id (downcase and space replace with underscore)
126
- headline_name = headline_name.downcase.gsub(" ", "_")
94
+ headline_name = headline_name.downcase.gsub(' ', '_')
127
95
  # reject not matching id's
128
- xs.reject do |t|
129
- !t.attributes["id"].value.downcase.start_with?(headline_name)
96
+ xs.select do |t|
97
+ t.attributes['id'].value.downcase.start_with?(headline_name)
130
98
  end
131
99
  end
132
-
133
100
  end
134
-
135
101
  end
136
- end
102
+ end
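The matching rule in `filter_headline` (downcase, replace spaces with underscores, then compare with `start_with?`) can be tried in isolation. A minimal sketch on plain strings instead of Nokogiri nodes; `headline_match?` and the `ids` list are hypothetical and exist only for this illustration.

```ruby
# Hypothetical helper applying the same normalisation as Page#filter_headline,
# but on plain id strings rather than Nokogiri nodes.
def headline_match?(headline_id, requested)
  # transform the requested name to a wiki id (downcase, space -> underscore)
  wiki_id = requested.downcase.gsub(' ', '_')
  headline_id.downcase.start_with?(wiki_id)
end

# Example ids as they might appear in the id attribute of mw-headline spans.
ids = %w[History Philosophy_and_design Technical_overview]

by_human_name = ids.select { |id| headline_match?(id, 'philosophy and design') }
by_prefix = ids.select { |id| headline_match?(id, 'philosophy') }
puts by_human_name
puts by_prefix
```

Because the comparison is a prefix match, a short request such as `'t'` would also select `Technical_overview`, so callers that need an exact headline should pass the full name.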