infoboxer 0.1.2.1 → 0.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz: 5a8db6382e2ffef87c2685aa1a6ef9ad37f3b57b
-  data.tar.gz: c4a622a22d275e1098f6957fad0ec0de13a88001
+  metadata.gz: d3081274989109208504796d1357e7ab78dd8981
+  data.tar.gz: 255f2ffa01c283fd11cbe1a1b308223d276c3b22
 SHA512:
-  metadata.gz: 15945396bc46ea107235d2e1f2c9b496a81b6cc28fe6d6ecfc2c9424ded26cb8cf4a3ec06cc8c63a52b4e0839ee5e7deac941f59784dcb2c1f3769dea468d3d0
-  data.tar.gz: 0dbce96e10c402c1676a4f5b694862e9fec9cff7d25606b5217ce3e31e64776bfa36ac31ee0c8e7964b32010ad93e9ad4453e605cbd4abcd9eff02350b58f35e
+  metadata.gz: 47ff1c7ac1f6e34ba4e5491cd7f5a6e180f18c02c4bf6061d08c6589ca3b66cd8ac1c600cc6e03dda244c3dd37a1986356e47d86f79602416e0eba021182fe00
+  data.tar.gz: c40d2bb3f4b2d336830d56e8b8cc2b126807022f409fe41fd63f0d229f139030b4ef9c18116f802016c3e33af6f4aba1f1766c03619d3e185cdce2b949d63bf6
data/CHANGELOG.md CHANGED
@@ -1,5 +1,17 @@
 # Infoboxer's change log
 
+## 0.2.0 (2015-12-21)
+
+* MediaWiki backend changed to (our own handcrafted)
+  [mediawiktory](https://github.com/molybdenum-99/mediawiktory);
+* Added page-list fetching like `MediaWiki#category(categoryname)`,
+  `MediaWiki#search(search_phrase)`;
+* `MediaWiki#get` can now fetch any number of pages at once (it was only
+  50 in previous versions);
+* `bin/infoboxer` console added for quick experimenting;
+* `Template#to_h` added for quick information extraction;
+* many small bugfixes and enhancements.
+
 ## 0.1.2.1 (2015-12-04)
 
 * Small bug with newlines in templates fixed.
@@ -22,6 +34,6 @@ Basically, preparing for wider release!
 
 ## 0.1.0 (2015-08-07)
 
-Initial (ok, I know it's typically called 0.1.1, but here's work of
+Initial (ok, I know it's typically called 0.0.1, but here's the work of
 three months, numerous docs and examples and so on... so, let
 it be 0.1.0).
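For orientation, the 0.2.0 additions listed above look roughly like this in use (a hedged sketch; page titles, category names and infobox field availability are illustrative, not guaranteed):

```ruby
require 'infoboxer'

wiki = Infoboxer.wiki('https://en.wikipedia.org/w/api.php')

wiki.category('Countries in South America')  # new: fetch page lists by category
wiki.search('Argentina economy')             # new: fetch page lists by search query

# no longer capped at 50 titles per call:
pages = wiki.get('Argentina', 'Bolivia', 'Chile', 'Uruguay', 'Paraguay')

wiki.get('Argentina').infobox.to_h           # new: Template#to_h for quick extraction
```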
data/README.md CHANGED
@@ -4,6 +4,7 @@
 [![Build Status](https://travis-ci.org/molybdenum-99/infoboxer.svg?branch=master)](https://travis-ci.org/molybdenum-99/infoboxer)
 [![Coverage Status](https://coveralls.io/repos/molybdenum-99/infoboxer/badge.svg?branch=master&service=github)](https://coveralls.io/github/molybdenum-99/infoboxer?branch=master)
 [![Code Climate](https://codeclimate.com/github/molybdenum-99/infoboxer/badges/gpa.svg)](https://codeclimate.com/github/molybdenum-99/infoboxer)
+[![Molybdenum-99 Gitter](https://badges.gitter.im/molybdenum-99.png)](https://gitter.im/molybdenum-99)
 
 **Infoboxer** is a pure-Ruby Wikipedia (and generic MediaWiki) client and
 parser, targeting information extraction (hence the name).
@@ -97,6 +98,25 @@ See [Navigation shortcuts](https://github.com/molybdenum-99/infoboxer/wiki/Navig
 
 To put it all in one piece, also take a look at [Data extraction tips and tricks](https://github.com/molybdenum-99/infoboxer/wiki/Tips-and-tricks).
 
+### infoboxer executable
+
+Just try the `infoboxer` command.
+
+Without any options, it starts an IRB session with infoboxer required and
+included into the main namespace.
+
+With the `-w` option, it provides a shortcut to the MediaWiki instance you
+want. Like this:
+
+```
+$ infoboxer -w https://en.wikipedia.org/w/api.php
+> get('Argentina')
+=> #<Page(title: "Argentina", url: "https://en.wikipedia.org/wiki/Argentina"): ....
+```
+
+You can also use shortcuts like `infoboxer -w wikipedia` for common
+wikis (and, just for fun, `infoboxer -wikipedia` also works).
+
 ## Advanced topics
 
 * [Reasons](https://github.com/molybdenum-99/infoboxer/wiki/Reasons) for
@@ -114,9 +134,10 @@ To put it all in one piece, also take a look at [Data extraction tips and tricks
 
 ## Compatibility
 
-As of now, Infoboxer reported to be compatible with any MRI Ruby since 1.9.3.
-In Travis-CI tests, JRuby is failing due to bug in old Java 7/Java 8 SSL
-certificate support ([see here](https://github.com/jruby/jruby/issues/2599)),
+As of now, Infoboxer is reported to be compatible with any MRI Ruby since 2.0.0
+(1.9.3 previously; support dropped in Infoboxer 0.2.0). In Travis-CI tests,
+JRuby is failing due to a bug in old Java 7/Java 8 SSL certificate support
+([see here](https://github.com/jruby/jruby/issues/2599)),
 and Rubinius is failing 3 specs of 500 for reasons yet uninvestigated.
 
 Therefore, those Ruby versions are excluded from Travis config, though,
@@ -129,10 +150,10 @@ they may still work for you.
 * **NB**: ↑ this is the "current version" link, but RubyDoc.info unfortunately
   sometimes fails to update it to the really _current_ one; in case you feel
   something seriously underdocumented, please-please look at
-  [0.1.2 docs](http://www.rubydoc.info/gems/infoboxer/0.1.2).
+  [0.2.0 docs](http://www.rubydoc.info/gems/infoboxer/0.2.0).
 * [Contributing](https://github.com/molybdenum-99/infoboxer/wiki/Contributing)
 * [Roadmap](https://github.com/molybdenum-99/infoboxer/wiki/Roadmap)
 
 ## License
 
-MIT.
+[MIT](https://github.com/molybdenum-99/infoboxer/blob/master/LICENSE.txt).
data/bin/infoboxer ADDED
@@ -0,0 +1,45 @@
+#!/usr/bin/env ruby
+require 'rubygems'
+require 'bundler/setup'
+require 'infoboxer'
+
+include Infoboxer
+
+require 'optparse'
+
+wiki_url = nil
+
+OptionParser.new do |opts|
+  opts.banner = "Usage: bin/infoboxer [-w wiki_api_url]"
+
+  opts.on("-w", "--wiki WIKI_API_URL",
+    "Make wiki by WIKI_API_URL a default wiki, and use it with just get('Pagename')") do |w|
+    wiki_url = w
+  end
+end.parse!
+
+if wiki_url
+  if wiki_url =~ /^[a-z]+$/
+    wiki_url = case
+    when domain = Infoboxer::WIKIMEDIA_PROJECTS[wiki_url.to_sym]
+      "https://en.#{domain}/w/api.php"
+    when domain = Infoboxer::WIKIMEDIA_PROJECTS[('w' + wiki_url).to_sym]
+      "https://en.#{domain}/w/api.php"
+    else
+      fail("Unidentified wiki: #{wiki_url}")
+    end
+  end
+
+  DEFAULT_WIKI = Infoboxer.wiki(wiki_url)
+  puts "Default Wiki selected: #{wiki_url}.\nNow you can use `get('Pagename')`, `category('Categoryname')` and so on.\n\n"
+  [:raw, :get, :category, :search, :prefixsearch].each do |m|
+    define_method(m){|*arg|
+      DEFAULT_WIKI.send(m, *arg)
+    }
+  end
+end
+
+require 'irb'
+ARGV.shift until ARGV.empty?
+IRB.start
+
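A note on the shortcut resolution in the script above: `WIKIMEDIA_PROJECTS` maps project names to domains (the hash below is an illustrative stand-in, not the gem's actual constant), and the playful `-wikipedia` form works because OptionParser reads it as `-w` with the attached value `ikipedia`:

```ruby
# Illustrative stand-in for Infoboxer::WIKIMEDIA_PROJECTS (assumed shape):
WIKIMEDIA_PROJECTS = {
  wikipedia:  'wikipedia.org',
  wiktionary: 'wiktionary.org'
}

# infoboxer -w wikipedia -> :wikipedia found        -> https://en.wikipedia.org/w/api.php
# infoboxer -wikipedia   -> wiki_url == "ikipedia"  -> ('w' + 'ikipedia').to_sym == :wikipedia
```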
data/infoboxer.gemspec CHANGED
@@ -29,7 +29,7 @@ Gem::Specification.new do |s|
 
   s.add_dependency 'htmlentities'
   s.add_dependency 'procme'
-  s.add_dependency 'rest-client'
+  s.add_dependency 'mediawiktory', '>= 0.0.2'
   s.add_dependency 'addressable'
   s.add_dependency 'terminal-table'
   s.add_dependency 'backports'
@@ -24,7 +24,6 @@ module Infoboxer
     '!((' => '[[',
     '!-' => '|-',
     '!:' => ':',
-    '&' => '&',
     "'" => " '",
     "''" => '″',
     "'s" => "'‍s",
@@ -7,15 +7,19 @@ module Infoboxer
   # Alongside with document tree structure, knows document's title as
   # represented by MediaWiki and human (non-API) URL.
   class Page < Tree::Document
-    def initialize(client, children, raw)
-      @client = client
-      super(children, raw)
+    def initialize(client, children, source)
+      @client, @source = client, source
+      super(children, title: source.title, url: source.fullurl)
     end
 
     # Instance of {MediaWiki} which this page was received from
     # @return {MediaWiki}
     attr_reader :client
 
+    # Instance of MediaWiktory::Page class with source data
+    # @return {MediaWiktory::Page}
+    attr_reader :source
+
     # @!attribute [r] title
     #   Page title.
     #   @return [String]
@@ -24,11 +28,15 @@ module Infoboxer
     #   Page friendly URL.
     #   @return [String]
 
-    def_readers :title, :url, :traits
+    def_readers :title, :url
+
+    def traits
+      client.traits
+    end
 
     private
 
-    PARAMS_TO_INSPECT = [:url, :title, :domain]
+    PARAMS_TO_INSPECT = [:url, :title] #, :domain]
 
     def show_params
       super(params.select{|k, v| PARAMS_TO_INSPECT.include?(k)})
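A quick sketch of what the reworked `Page` exposes (method names as in the hunks above; return values illustrative):

```ruby
wiki = Infoboxer.wiki('https://en.wikipedia.org/w/api.php')
page = wiki.get('Argentina')

page.title   # => "Argentina" -- taken from source.title at construction
page.url     # => "https://en.wikipedia.org/wiki/Argentina" -- from source.fullurl
page.source  # the raw MediaWiktory::Page object the page was built from
page.traits  # now delegated to page.client.traits
```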
@@ -68,14 +68,14 @@ module Infoboxer
 
     def initialize(options = {})
       @options = options
-      @file_prefix = [DEFAULTS[:file_prefix], options.delete(:file_prefix)].
+      @file_namespace = [DEFAULTS[:file_namespace], namespace_aliases(options, 'File')].
         flatten.compact.uniq
-      @category_prefix = [DEFAULTS[:category_prefix], options.delete(:category_prefix)].
+      @category_namespace = [DEFAULTS[:category_namespace], namespace_aliases(options, 'Category')].
         flatten.compact.uniq
     end
 
     # @private
-    attr_reader :file_prefix, :category_prefix
+    attr_reader :file_namespace, :category_namespace
 
     # @private
     def templates
@@ -84,9 +84,15 @@ module Infoboxer
 
     private
 
+    def namespace_aliases(options, canonical)
+      namespace = (options[:namespaces] || []).detect{|v| v.canonical == canonical}
+      return nil unless namespace
+      [namespace['*'], *namespace.aliases]
+    end
+
     DEFAULTS = {
-      file_prefix: 'File',
-      category_prefix: 'Category'
+      file_namespace: 'File',
+      category_namespace: 'Category'
     }
 
   end
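For reference, `namespace_aliases` expects the entries of `options[:namespaces]` to look like MediaWiki's siteinfo namespace records, where the local name sits under the `'*'` key. A minimal sketch, assuming Hashie-style objects (the real response class comes from MediaWiktory; `Hashie::Mash` is a stand-in):

```ruby
require 'hashie'

# One siteinfo namespace record, as for French Wikipedia (illustrative values):
ns = Hashie::Mash.new('id' => 14, 'canonical' => 'Category',
                      '*' => 'Catégorie', 'aliases' => [])

[ns['*'], *ns.aliases] # => ["Catégorie"] -- merged with the 'Category' default above
```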
@@ -1,6 +1,7 @@
 # encoding: utf-8
-require 'rest-client'
-require 'json'
+#require 'rest-client'
+#require 'json'
+require 'mediawiktory'
 require 'addressable/uri'
 
 require_relative 'media_wiki/traits'
@@ -36,7 +37,7 @@ module Infoboxer
       attr_accessor :user_agent
     end
 
-    attr_reader :api_base_url
+    attr_reader :api_base_url, :traits
 
     # Creating new MediaWiki client. {Infoboxer.wiki} provides shortcut
     # for it, as well as shortcuts for some well-known wikis, like
@@ -49,7 +50,8 @@
     # * `:user_agent` (also aliased as `:ua`) -- custom User-Agent header.
     def initialize(api_base_url, options = {})
       @api_base_url = Addressable::URI.parse(api_base_url)
-      @resource = RestClient::Resource.new(api_base_url, headers: headers(options))
+      @client = MediaWiktory::Client.new(api_base_url, user_agent: user_agent(options))
+      @traits = Traits.get(@api_base_url.host, namespaces: extract_namespaces)
     end
 
     # Receive "raw" data from Wikipedia (without parsing or wrapping in
@@ -57,18 +59,22 @@
     #
     # @return [Array<Hash>]
     def raw(*titles)
-      postprocess @resource.get(
-        params: DEFAULT_PARAMS.merge(titles: titles.join('|'))
-      )
+      titles.each_slice(50).map{|part|
+        @client.query.
+          titles(*part).
+          prop(revisions: {prop: :content}, info: {prop: :url}).
+          redirects(true). # FIXME: should be done transparently by MediaWiktory?
+          perform.pages
+      }.inject(:concat) # somehow flatten(1) fails!
     end
 
-    # Receive list of parsed wikipedia pages for list of titles provided.
+    # Receive list of parsed MediaWiki pages for list of titles provided.
     # All pages are received with single query to MediaWiki API.
     #
-    # **NB**: currently, if you are requesting more than 50 titles at
-    # once (MediaWiki limitation for single request), Infoboxer will
-    # **not** try to get other pages with subsequent queries. This will
-    # be fixed in future.
+    # **NB**: if you are requesting more than 50 titles at once
+    # (MediaWiki limitation for single request), Infoboxer will do as
+    # many queries as necessary to extract them all (it will be like
+    # `(titles.count / 50.0).ceil` requests)
     #
     # @return [Tree::Nodes<Page>] array of parsed pages. Notes:
     #   * if you call `get` with only one title, one page will be
@@ -87,76 +93,118 @@ module Infoboxer
     #   NotFound.
     #
     def get(*titles)
-      pages = raw(*titles).reject{|raw| raw[:content].nil?}.
+      pages = raw(*titles).
+        tap{|pages| pages.detect(&:invalid?).tap{|i| i && fail(i.raw.invalidreason)}}.
+        select(&:exists?).
         map{|raw|
-          traits = Traits.get(@api_base_url.host, extract_traits(raw))
-
           Page.new(self,
-            Parser.paragraphs(raw[:content], traits),
-            raw.merge(traits: traits))
+            Parser.paragraphs(raw.content, traits),
+            raw)
         }
       titles.count == 1 ? pages.first : Tree::Nodes[*pages]
     end
 
-    private
+    # Receive list of parsed MediaWiki pages from specified category.
+    #
+    # **NB**: currently, this API **always** fetches all pages from
+    # the category, there is no option to "take first 20 pages". Pages are
+    # fetched in 50-page batches, then parsed. So, for a large category
+    # it can really take a while to fetch all pages.
+    #
+    # @param title Category title. You can use namespaceless title (like
+    #   `"Countries in South America"`), title with namespace (like
+    #   `"Category:Countries in South America"`) or title with local
+    #   namespace (like `"Catégorie:Argentine"` for French Wikipedia)
+    #
+    # @return [Tree::Nodes<Page>] array of parsed pages.
+    #
+    def category(title)
+      title = normalize_category_title(title)
+
+      list(categorymembers: {title: title, limit: 50})
+    end
 
-    # @private
-    PROP = [
-      'revisions',  # to extract content of the page
-      'info',       # to extract page canonical url
-      'categories', # to extract default category prefix
-      'images'      # to extract default media prefix
-    ].join('|')
-
-    # @private
-    DEFAULT_PARAMS = {
-      action: :query,
-      format: :json,
-      redirects: true,
-
-      prop: PROP,
-      rvprop: :content,
-      inprop: :url,
-    }
-
-    def headers(options)
-      {'User-Agent' => options[:user_agent] || options[:ua] || self.class.user_agent || UA}
+    # Receive list of parsed MediaWiki pages for provided search query.
+    # See [MediaWiki API docs](https://www.mediawiki.org/w/api.php?action=help&modules=query%2Bsearch)
+    # for details.
+    #
+    # **NB**: currently, this API **always** fetches all pages matching
+    # the query, there is no option to "take first 20 pages". Pages are
+    # fetched in 50-page batches, then parsed. So, for a large result set
+    # it can really take a while to fetch all pages.
+    #
+    # @param query Search query. For old installations, look at
+    #   https://www.mediawiki.org/wiki/Help:Searching
+    #   for search syntax. For new ones (including Wikipedia), see
+    #   https://www.mediawiki.org/wiki/Help:CirrusSearch.
+    #
+    # @return [Tree::Nodes<Page>] array of parsed pages.
+    #
+    def search(query)
+      list(search: {search: query, limit: 50})
+    end
+
+    # Receive list of parsed MediaWiki pages with titles starting with prefix.
+    # See [MediaWiki API docs](https://www.mediawiki.org/w/api.php?action=help&modules=query%2Bprefixsearch)
+    # for details.
+    #
+    # **NB**: currently, this API **always** fetches all pages with
+    # the prefix, there is no option to "take first 20 pages". Pages are
+    # fetched in 50-page batches, then parsed. So, for a large result set
+    # it can really take a while to fetch all pages.
+    #
+    # @param prefix page title prefix.
+    #
+    # @return [Tree::Nodes<Page>] array of parsed pages.
+    #
+    def prefixsearch(prefix)
+      list(prefixsearch: {search: prefix, limit: 100})
     end
 
-    def extract_traits(raw)
-      raw.select{|k, v| [:file_prefix, :category_prefix].include?(k)}
+    def inspect
+      "#<#{self.class}(#{@api_base_url.host})>"
     end
 
-    def guess_traits(pages)
-      categories = pages.map{|p| p['categories']}.compact.flatten
-      images = pages.map{|p| p['images']}.compact.flatten
-      {
-        file_prefix: images.map{|i| i['title'].scan(/^([^:]+):/)}.flatten.uniq,
-        category_prefix: categories.map{|i| i['title'].scan(/^([^:]+):/)}.flatten.uniq,
-      }
+    private
+
+    def list(query)
+      response = @client.query.
+        generator(query).
+        prop(revisions: {prop: :content}, info: {prop: :url}).
+        redirects(true). # FIXME: should be done transparently by MediaWiktory?
+        perform
+
+      response.continue! while response.continue?
+
+      pages = response.pages.select(&:exists?).
+        map{|raw|
+          Page.new(self,
+            Parser.paragraphs(raw.content, traits),
+            raw)
+        }
+
+      Tree::Nodes[*pages]
     end
 
-    def postprocess(response)
-      pages = JSON.parse(response)['query']['pages']
-      traits = guess_traits(pages.values)
+    def normalize_category_title(title)
+      # FIXME: shouldn't it go to MediaWiktory?..
+      namespace, titl = title.include?(':') ? title.split(':', 2) : [nil, title]
+      namespace, titl = nil, title unless traits.category_namespace.include?(namespace)
 
-      pages.map{|id, data|
-        if id.to_i < 0
-          {
-            title: data['title'],
-            content: nil,
-            not_found: true
-          }
-        else
-          {
-            title: data['title'],
-            content: data['revisions'].first['*'],
-            url: data['fullurl'],
-          }.merge(traits)
-        end
+      namespace ||= traits.category_namespace.first
+      [namespace, titl].join(':')
+    end
+
+    def user_agent(options)
+      options[:user_agent] || options[:ua] || self.class.user_agent || UA
+    end
+
+    def extract_namespaces
+      siteinfo = @client.query.meta(siteinfo: {prop: [:namespaces, :namespacealiases]}).perform
+      siteinfo.raw.query.namespaces.map{|_, namespace|
+        aliases = siteinfo.raw.query.namespacealiases.select{|a| a.id == namespace.id}.map{|a| a['*']}
+        namespace.merge(aliases: aliases)
       }
-    rescue JSON::ParserError
-      fail RuntimeError, "Not a JSON response, seems there's not a MediaWiki API: #{@api_base_url}"
     end
   end
 end
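Put together, the rewritten client supports usage like this (a sketch; titles and queries are illustrative):

```ruby
require 'infoboxer'

wiki = Infoboxer.wiki('https://en.wikipedia.org/w/api.php')
wiki.inspect # => "#<Infoboxer::MediaWiki(en.wikipedia.org)>"

# raw/get now batch transparently: 120 titles => 3 API requests of <= 50 each
pages = wiki.get('Argentina', 'Bolivia', 'Chile')

# The list-based fetchers run a generator and follow API continuations
# (response.continue! above) until the whole result set is in:
wiki.category('Countries in South America')
wiki.search('cities on the Paraná river')
wiki.prefixsearch('Buenos')
```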
@@ -118,7 +118,7 @@ module Infoboxer
   #
   # @return {Tree::Nodes}
   def categories
-    lookup(Tree::Wikilink, namespace: /^#{ensure_traits.category_prefix.join('|')}$/)
+    lookup(Tree::Wikilink, namespace: /^#{ensure_traits.category_namespace.join('|')}$/)
   end
 
   # As users accustomed to have only one infobox on a page
@@ -1,4 +1,6 @@
 # encoding: utf-8
+require 'strscan'
+
 module Infoboxer
   class Parser
     class Context
@@ -86,11 +88,23 @@ module Infoboxer
         res
       end
 
+      def push_eol_sign(re)
+        @inline_eol_sign = re
+      end
+
+      def pop_eol_sign
+        @inline_eol_sign = nil
+      end
+
+      attr_reader :inline_eol_sign
+
       def inline_eol?(exclude = nil)
         # not using StringScanner#check, as it will change #matched value
         eol? ||
-          (current =~ %r[^(</ref>|}})] &&
-          (!exclude || $1 !~ exclude)) # FIXME: ugly, but no idea of prettier solution
+          (
+            (current =~ %r[^(</ref>|}})] || @inline_eol_sign && current =~ @inline_eol_sign) &&
+            (!exclude || $1 !~ exclude)
+          ) # FIXME: ugly, but no idea of prettier solution
       end
 
       def scan_continued_until(re, leave_pattern = false)
@@ -5,7 +5,7 @@ module Infoboxer
     include Tree
 
     def image
-      @context.skip(re.file_prefix) or
+      @context.skip(re.file_namespace) or
         @context.fail!("Something went wrong: it's not image?")
 
       path = @context.scan_until(/\||\]\]/)
@@ -32,7 +32,12 @@ module Infoboxer
     def short_inline(until_pattern = nil)
       nodes = Nodes[]
       guarded_loop do
-        chunk = @context.scan_until(re.short_inline_until_cache[until_pattern])
+        # FIXME: quick and UGLY IS HELL JUST TRYING TO MAKE THE SHIT WORK
+        if @context.inline_eol_sign
+          chunk = @context.scan_until(re.short_inline_until_cache_brackets[until_pattern])
+        else
+          chunk = @context.scan_until(re.short_inline_until_cache[until_pattern])
+        end
         nodes << chunk
 
         break if @context.matched_inline?(until_pattern)
@@ -82,7 +87,7 @@ module Infoboxer
       when "''"
         Italic.new(short_inline(/''/))
       when '[['
-        if @context.check(re.file_prefix)
+        if @context.check(re.file_namespace)
           image
         else
           wikilink
@@ -118,7 +123,11 @@ module Infoboxer
     # [http://www.example.org link name]
     def external_link(protocol)
       link = @context.scan_continued_until(/\s+|\]/)
-      caption = inline(/\]/) if @context.matched =~ /\s+/
+      if @context.matched =~ /\s+/
+        @context.push_eol_sign(/^\]/)
+        caption = short_inline(/\]/)
+        @context.pop_eol_sign
+      end
       ExternalLink.new(protocol + link, caption)
     end
 
@@ -4,8 +4,8 @@ module Infoboxer
   module Template
     include Tree
 
-    # NB: here we are not distingish templates like {{Infobox|variable}}
-    # and "magic words" like {{formatnum:123}}
+    # NB: here we do not distinguish templates like `{{Infobox|variable}}`
+    # and "magic words" like `{{formatnum:123}}`
     # Just calling all of them "templates". This behaviour will change
     # in future, I presume
     # More about magic words: https://www.mediawiki.org/wiki/Help:Magic_words
@@ -29,6 +29,7 @@ module Infoboxer
         @context.skip(/\s*=\s*/)
       else
         name = num
+        num += 1
       end
 
       value = long_inline(/\||}}/)
@@ -38,8 +39,6 @@ module Infoboxer
 
       break if @context.eat_matched?('}}')
       @context.eof? and @context.fail!("Unexpected break of template variables: #{res}")
-
-      num += 1
     end
     res
   end
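The relocated `num += 1` (the two hunks above) means only unnamed template variables consume the positional counter, which is how MediaWiki numbers them. A standalone sketch of the rule (not the parser code itself):

```ruby
# {{convert|100|km|mi|abbr=on}} -- positional params get 1, 2, 3;
# the named param does not shift the counter.
def number_vars(raw_vars)
  num = 1
  raw_vars.map { |var|
    if var.include?('=')
      var.split('=', 2)               # named: keep its own name
    else
      [num, var].tap { num += 1 }     # unnamed: take next position number
    end
  }.to_h
end

number_vars(%w[100 km mi abbr=on])
# => {1=>"100", 2=>"km", 3=>"mi", "abbr"=>"on"}
```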
@@ -16,20 +16,31 @@ module Infoboxer
 
     INLINE_EOL = %r[(?=       # if we have ahead... (not scanned, just checked
       </ref> |                # <ref> closed
-      }}                      # or template closed
+      }}
+    )]x
+
+    INLINE_EOL_BR = %r[(?=    # if we have ahead... (not scanned, just checked
+      </ref> |                # <ref> closed
+      }} |                    # or template closed
+      (?<!\])\](?!\])         # or ext.link closed,
+                              # the madness with look-ahead/behind means "match single bracket but not double"
     )]x
 
 
     def make_regexps
       {
-        file_prefix: /(#{@context.traits.file_prefix.join('|')}):/,
+        file_namespace: /(#{@context.traits.file_namespace.join('|')}):/,
         formatting: FORMATTING,
         inline_until_cache: Hash.new{|h, r|
          h[r] = Regexp.union(*[r, FORMATTING, /$/].compact.uniq)
        },
        short_inline_until_cache: Hash.new{|h, r|
          h[r] = Regexp.union(*[r, INLINE_EOL, FORMATTING, /$/].compact.uniq)
+        },
+        short_inline_until_cache_brackets: Hash.new{|h, r|
+          h[r] = Regexp.union(*[r, INLINE_EOL_BR, FORMATTING, /$/].compact.uniq)
        }
+
      }
    end
 
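The look-behind/look-ahead pair in `INLINE_EOL_BR` is easy to check in isolation:

```ruby
re = /(?<!\])\](?!\])/

'[http://example.org cap]' =~ re  # => 24  -- a lone closing bracket matches
'[[Argentina]]'            =~ re  # => nil -- both brackets of "]]" are rejected
```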
@@ -46,7 +57,7 @@ module Infoboxer
     scan.skip(/=\s*/)
     q = scan.scan(/['"]/)
     if q
-      value = scan.scan_until(/#{q}/).sub(q, '')
+      value = scan.scan_until(/#{q}|$/).sub(q, '')
     else
       value = scan.scan_until(/\s|$/)
     end
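The added `|$` alternative makes an unterminated quote non-fatal: the scanner falls back to end-of-input instead of returning nil. A quick StringScanner sketch (illustrative input):

```ruby
require 'strscan'

scan = StringScanner.new('width="300')   # closing quote missing
scan.skip(/width/)
scan.skip(/=\s*/)
q = scan.scan(/['"]/)                    # => '"'
scan.scan_until(/#{q}|$/).sub(q, '')     # => "300"; /#{q}/ alone would return nil
```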
@@ -43,7 +43,7 @@ module Infoboxer
     super(level) +
       if caption && !caption.empty?
         indent(level+1) + "caption:\n" +
-          caption.map(&call(to_tree: level+2)).join
+          caption.children.map(&call(to_tree: level+2)).join
       else
         ''
       end