infoboxer 0.1.2.1 → 0.2.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz: 5a8db6382e2ffef87c2685aa1a6ef9ad37f3b57b
-  data.tar.gz: c4a622a22d275e1098f6957fad0ec0de13a88001
+  metadata.gz: d3081274989109208504796d1357e7ab78dd8981
+  data.tar.gz: 255f2ffa01c283fd11cbe1a1b308223d276c3b22
 SHA512:
-  metadata.gz: 15945396bc46ea107235d2e1f2c9b496a81b6cc28fe6d6ecfc2c9424ded26cb8cf4a3ec06cc8c63a52b4e0839ee5e7deac941f59784dcb2c1f3769dea468d3d0
-  data.tar.gz: 0dbce96e10c402c1676a4f5b694862e9fec9cff7d25606b5217ce3e31e64776bfa36ac31ee0c8e7964b32010ad93e9ad4453e605cbd4abcd9eff02350b58f35e
+  metadata.gz: 47ff1c7ac1f6e34ba4e5491cd7f5a6e180f18c02c4bf6061d08c6589ca3b66cd8ac1c600cc6e03dda244c3dd37a1986356e47d86f79602416e0eba021182fe00
+  data.tar.gz: c40d2bb3f4b2d336830d56e8b8cc2b126807022f409fe41fd63f0d229f139030b4ef9c18116f802016c3e33af6f4aba1f1766c03619d3e185cdce2b949d63bf6
data/CHANGELOG.md CHANGED
@@ -1,5 +1,17 @@
 # Infoboxer's change log
 
+## 0.2.0 (2015-12-21)
+
+* MediaWiki backend changed to (our own handcrafted)
+  [mediawiktory](https://github.com/molybdenum-99/mediawiktory);
+* Added page lists fetching like `MediaWiki#category(categoryname)`,
+  `MediaWiki#search(search_phrase)`;
+* `MediaWiki#get` now can fetch any number of pages at once (it was only
+  50 in previous versions);
+* `bin/infoboxer` console added for quick experimenting;
+* `Template#to_h` added for quick information extraction;
+* many small bugfixes and enhancements.
+
 ## 0.1.2.1 (2015-12-04)
 
 * Small bug with newlines in templates fixed.
@@ -22,6 +34,6 @@ Basically, preparing for wider release!
 
 ## 0.1.0 (2015-08-07)
 
-Initial (ok, I know it's typically called 0.1.1, but here's work of
+Initial (ok, I know it's typically called 0.0.1, but here's work of
 three monthes, numerous documentations and examples and so on... so, let
 it be 0.1.0).
data/README.md CHANGED
@@ -4,6 +4,7 @@
 [![Build Status](https://travis-ci.org/molybdenum-99/infoboxer.svg?branch=master)](https://travis-ci.org/molybdenum-99/infoboxer)
 [![Coverage Status](https://coveralls.io/repos/molybdenum-99/infoboxer/badge.svg?branch=master&service=github)](https://coveralls.io/github/molybdenum-99/infoboxer?branch=master)
 [![Code Climate](https://codeclimate.com/github/molybdenum-99/infoboxer/badges/gpa.svg)](https://codeclimate.com/github/molybdenum-99/infoboxer)
+[![Molybdenum-99 Gitter](https://badges.gitter.im/molybdenum-99.png)](https://gitter.im/molybdenum-99)
 
 **Infoboxer** is pure-Ruby Wikipedia (and generic MediaWiki) client and
 parser, targeting information extraction (hence the name).
@@ -97,6 +98,25 @@ See [Navigation shortcuts](https://github.com/molybdenum-99/infoboxer/wiki/Navig
 
 To put it all in one piece, also take a look at [Data extraction tips and tricks](https://github.com/molybdenum-99/infoboxer/wiki/Tips-and-tricks).
 
+### infoboxer executable
+
+Just try the `infoboxer` command.
+
+Without any options, it starts an IRB session with infoboxer required and
+included into the main namespace.
+
+With the `-w` option, it provides a shortcut to the MediaWiki instance you want.
+Like this:
+
+```
+$ infoboxer -w https://en.wikipedia.org/w/api.php
+> get('Argentina')
+=> #<Page(title: "Argentina", url: "https://en.wikipedia.org/wiki/Argentina"): ....
+```
+
+You can also use shortcuts like `infoboxer -w wikipedia` for common
+wikis (and, just for fun, `infoboxer -wikipedia` also).
+
 ## Advanced topics
 
 * [Reasons](https://github.com/molybdenum-99/infoboxer/wiki/Reasons) for
@@ -114,9 +134,10 @@ To put it all in one piece, also take a look at [Data extraction tips and tricks
 
 ## Compatibility
 
-As of now, Infoboxer reported to be compatible with any MRI Ruby since 1.9.3.
-In Travis-CI tests, JRuby is failing due to bug in old Java 7/Java 8 SSL
-certificate support ([see here](https://github.com/jruby/jruby/issues/2599)),
+As of now, Infoboxer is reported to be compatible with any MRI Ruby since 2.0.0
+(1.9.3 previously, dropped since Infoboxer 0.2.0). In Travis-CI tests,
+JRuby is failing due to a bug in old Java 7/Java 8 SSL certificate support
+([see here](https://github.com/jruby/jruby/issues/2599)),
 and Rubinius failing 3 specs of 500 by mystery, which is uninvestigated yet.
 
 Therefore, those Ruby versions are excluded from Travis config, though,
@@ -129,10 +150,10 @@ they may still work for you.
 * **NB**: ↑ this is "current version" link, but RubyDoc.info unfortunately
   sometimes fails to update it to really _current_; in case you feel
   something seriously underdocumented, please-please look at
-  [0.1.2 docs](http://www.rubydoc.info/gems/infoboxer/0.1.2).
+  [0.2.0 docs](http://www.rubydoc.info/gems/infoboxer/0.2.0).
 * [Contributing](https://github.com/molybdenum-99/infoboxer/wiki/Contributing)
 * [Roadmap](https://github.com/molybdenum-99/infoboxer/wiki/Roadmap)
 
 ## License
 
-MIT.
+[MIT](https://github.com/molybdenum-99/infoboxer/blob/master/LICENSE.txt).
data/bin/infoboxer ADDED
@@ -0,0 +1,45 @@
+#!/usr/bin/env ruby
+require 'rubygems'
+require 'bundler/setup'
+require 'infoboxer'
+
+include Infoboxer
+
+require 'optparse'
+
+wiki_url = nil
+
+OptionParser.new do |opts|
+  opts.banner = "Usage: bin/infoboxer [-w wiki_api_url]"
+
+  opts.on("-w", "--wiki WIKI_API_URL",
+    "Make wiki by WIKI_API_URL a default wiki, and use it with just get('Pagename')") do |w|
+    wiki_url = w
+  end
+end.parse!
+
+if wiki_url
+  if wiki_url =~ /^[a-z]+$/
+    wiki_url = case
+      when domain = Infoboxer::WIKIMEDIA_PROJECTS[wiki_url.to_sym]
+        "https://en.#{domain}/w/api.php"
+      when domain = Infoboxer::WIKIMEDIA_PROJECTS[('w' + wiki_url).to_sym]
+        "https://en.#{domain}/w/api.php"
+      else
+        fail("Unidentified wiki: #{wiki_url}")
+      end
+  end
+
+  DEFAULT_WIKI = Infoboxer.wiki(wiki_url)
+  puts "Default Wiki selected: #{wiki_url}.\nNow you can use `get('Pagename')`, `category('Categoryname')` and so on.\n\n"
+  [:raw, :get, :category, :search, :prefixsearch].each do |m|
+    define_method(m){|*arg|
+      DEFAULT_WIKI.send(m, *arg)
+    }
+  end
+end
+
+require 'irb'
+ARGV.shift until ARGV.empty?
+IRB.start
+
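A note on the shortcut resolution above: the second `WIKIMEDIA_PROJECTS` lookup prepends `'w'` to the given name, which is exactly what makes the "just for fun" `-wikipedia` spelling work — OptionParser parses it as `-w` with argument `ikipedia`, and `'w' + 'ikipedia'` is `wikipedia` again. A minimal standalone sketch of that logic (here `PROJECTS` is a hypothetical stand-in for the real `Infoboxer::WIKIMEDIA_PROJECTS` table):

```ruby
# Hypothetical, simplified stand-in for Infoboxer::WIKIMEDIA_PROJECTS.
PROJECTS = {wikipedia: 'wikipedia.org', wikiquote: 'wikiquote.org'}

# Resolve a shortcut the way bin/infoboxer does: try the name as given,
# then with a 'w' prepended (so the `-w ikipedia` parse of `-wikipedia`
# still resolves to wikipedia).
def resolve_wiki(name)
  domain = PROJECTS[name.to_sym] || PROJECTS[('w' + name).to_sym] or
    raise "Unidentified wiki: #{name}"
  "https://en.#{domain}/w/api.php"
end
```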
data/infoboxer.gemspec CHANGED
@@ -29,7 +29,7 @@ Gem::Specification.new do |s|
 
   s.add_dependency 'htmlentities'
   s.add_dependency 'procme'
-  s.add_dependency 'rest-client'
+  s.add_dependency 'mediawiktory', '>= 0.0.2'
   s.add_dependency 'addressable'
  s.add_dependency 'terminal-table'
   s.add_dependency 'backports'
@@ -24,7 +24,6 @@ module Infoboxer
   '!((' => '[[',
   '!-' => '|-',
   '!:' => ':',
-  '&' => '&',
   "'" => " '",
   "''" => '″',
   "'s" => "'‍s",
@@ -7,15 +7,19 @@ module Infoboxer
   # Alongside with document tree structure, knows document's title as
   # represented by MediaWiki and human (non-API) URL.
   class Page < Tree::Document
-    def initialize(client, children, raw)
-      @client = client
-      super(children, raw)
+    def initialize(client, children, source)
+      @client, @source = client, source
+      super(children, title: source.title, url: source.fullurl)
     end
 
     # Instance of {MediaWiki} which this page was received from
     # @return {MediaWiki}
     attr_reader :client
 
+    # Instance of MediaWiktory::Page class with source data
+    # @return {MediaWiktory::Page}
+    attr_reader :source
+
     # @!attribute [r] title
     #   Page title.
     #   @return [String]
@@ -24,11 +28,15 @@ module Infoboxer
     # Page friendly URL.
     # @return [String]
 
-    def_readers :title, :url, :traits
+    def_readers :title, :url
+
+    def traits
+      client.traits
+    end
 
     private
 
-    PARAMS_TO_INSPECT = [:url, :title, :domain]
+    PARAMS_TO_INSPECT = [:url, :title] #, :domain]
 
     def show_params
       super(params.select{|k, v| PARAMS_TO_INSPECT.include?(k)})
@@ -68,14 +68,14 @@ module Infoboxer
 
   def initialize(options = {})
     @options = options
-    @file_prefix = [DEFAULTS[:file_prefix], options.delete(:file_prefix)].
+    @file_namespace = [DEFAULTS[:file_namespace], namespace_aliases(options, 'File')].
       flatten.compact.uniq
-    @category_prefix = [DEFAULTS[:category_prefix], options.delete(:category_prefix)].
+    @category_namespace = [DEFAULTS[:category_namespace], namespace_aliases(options, 'Category')].
       flatten.compact.uniq
   end
 
   # @private
-  attr_reader :file_prefix, :category_prefix
+  attr_reader :file_namespace, :category_namespace
 
   # @private
   def templates
@@ -84,9 +84,15 @@ module Infoboxer
 
   private
 
+  def namespace_aliases(options, canonical)
+    namespace = (options[:namespaces] || []).detect{|v| v.canonical == canonical}
+    return nil unless namespace
+    [namespace['*'], *namespace.aliases]
+  end
+
   DEFAULTS = {
-    file_prefix: 'File',
-    category_prefix: 'Category'
+    file_namespace: 'File',
+    category_namespace: 'Category'
   }
 
 end
@@ -1,6 +1,7 @@
 # encoding: utf-8
-require 'rest-client'
-require 'json'
+#require 'rest-client'
+#require 'json'
+require 'mediawiktory'
 require 'addressable/uri'
 
 require_relative 'media_wiki/traits'
@@ -36,7 +37,7 @@ module Infoboxer
     attr_accessor :user_agent
   end
 
-  attr_reader :api_base_url
+  attr_reader :api_base_url, :traits
 
   # Creating new MediaWiki client. {Infoboxer.wiki} provides shortcut
   # for it, as well as shortcuts for some well-known wikis, like
@@ -49,7 +50,8 @@ module Infoboxer
   # * `:user_agent` (also aliased as `:ua`) -- custom User-Agent header.
   def initialize(api_base_url, options = {})
     @api_base_url = Addressable::URI.parse(api_base_url)
-    @resource = RestClient::Resource.new(api_base_url, headers: headers(options))
+    @client = MediaWiktory::Client.new(api_base_url, user_agent: user_agent(options))
+    @traits = Traits.get(@api_base_url.host, namespaces: extract_namespaces)
   end
 
   # Receive "raw" data from Wikipedia (without parsing or wrapping in
@@ -57,18 +59,22 @@ module Infoboxer
   #
   # @return [Array<Hash>]
   def raw(*titles)
-    postprocess @resource.get(
-      params: DEFAULT_PARAMS.merge(titles: titles.join('|'))
-    )
+    titles.each_slice(50).map{|part|
+      @client.query.
+        titles(*part).
+        prop(revisions: {prop: :content}, info: {prop: :url}).
+        redirects(true). # FIXME: should be done transparently by MediaWiktory?
+        perform.pages
+    }.inject(:concat) # somehow flatten(1) fails!
   end
 
-  # Receive list of parsed wikipedia pages for list of titles provided.
+  # Receive list of parsed MediaWiki pages for list of titles provided.
   # All pages are received with single query to MediaWiki API.
   #
-  # **NB**: currently, if you are requesting more than 50 titles at
-  # once (MediaWiki limitation for single request), Infoboxer will
-  # **not** try to get other pages with subsequent queries. This will
-  # be fixed in future.
+  # **NB**: if you are requesting more than 50 titles at once
+  # (MediaWiki limitation for single request), Infoboxer will do as
+  # many queries as necessary to extract them all (it will be like
+  # `(titles.count / 50.0).ceil` requests)
   #
   # @return [Tree::Nodes<Page>] array of parsed pages. Notes:
   #   * if you call `get` with only one title, one page will be
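The batching strategy behind the new `raw` can be shown in isolation: titles are split into 50-element slices (the per-request API limit), each slice becomes one query, and the per-slice results are concatenated. This is a sketch, with `fetch_batch` as a hypothetical stand-in for the real MediaWiktory round-trip:

```ruby
# Stand-in for one MediaWiktory API request returning page data
# for up to 50 titles.
def fetch_batch(titles)
  titles.map { |t| {title: t} }
end

# Split titles into 50-item slices, fetch each slice separately,
# and concatenate the results -- so any number of titles works.
def raw(*titles)
  titles.each_slice(50).map { |part| fetch_batch(part) }.inject(:concat)
end
```

Note `inject(:concat)` flattens exactly one level (array of batches into one array of pages), which is why the source comments that `flatten(1)` would be the obvious alternative.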
@@ -87,76 +93,118 @@ module Infoboxer
   #     NotFound.
   #
   def get(*titles)
-    pages = raw(*titles).reject{|raw| raw[:content].nil?}.
+    pages = raw(*titles).
+      tap{|pages| pages.detect(&:invalid?).tap{|i| i && fail(i.raw.invalidreason)}}.
+      select(&:exists?).
       map{|raw|
-        traits = Traits.get(@api_base_url.host, extract_traits(raw))
-
         Page.new(self,
-          Parser.paragraphs(raw[:content], traits),
-          raw.merge(traits: traits))
+          Parser.paragraphs(raw.content, traits),
+          raw)
       }
     titles.count == 1 ? pages.first : Tree::Nodes[*pages]
   end
 
-  private
+  # Receive list of parsed MediaWiki pages from specified category.
+  #
+  # **NB**: currently, this API **always** fetches all pages from
+  # category, there is no option to "take first 20 pages". Pages are
+  # fetched in 50-page batches, then parsed. So, for large category
+  # it can really take a while to fetch all pages.
+  #
+  # @param title Category title. You can use namespaceless title (like
+  #   `"Countries in South America"`), title with namespace (like
+  #   `"Category:Countries in South America"`) or title with local
+  #   namespace (like `"Catégorie:Argentine"` for French Wikipedia)
+  #
+  # @return [Tree::Nodes<Page>] array of parsed pages.
+  #
+  def category(title)
+    title = normalize_category_title(title)
+
+    list(categorymembers: {title: title, limit: 50})
+  end
 
-  # @private
-  PROP = [
-    'revisions', # to extract content of the page
-    'info', # to extract page canonical url
-    'categories', # to extract default category prefix
-    'images' # to extract default media prefix
-  ].join('|')
-
-  # @private
-  DEFAULT_PARAMS = {
-    action: :query,
-    format: :json,
-    redirects: true,
-
-    prop: PROP,
-    rvprop: :content,
-    inprop: :url,
-  }
-
-  def headers(options)
-    {'User-Agent' => options[:user_agent] || options[:ua] || self.class.user_agent || UA}
+  # Receive list of parsed MediaWiki pages for provided search query.
+  # See [MediaWiki API docs](https://www.mediawiki.org/w/api.php?action=help&modules=query%2Bsearch)
+  # for details.
+  #
+  # **NB**: currently, this API **always** fetches all pages for the
+  # query, there is no option to "take first 20 pages". Pages are
+  # fetched in 50-page batches, then parsed. So, for a large result set
+  # it can really take a while to fetch all pages.
+  #
+  # @param query Search query. For old installations, look at
+  #   https://www.mediawiki.org/wiki/Help:Searching
+  #   for search syntax. For new ones (including Wikipedia), see at
+  #   https://www.mediawiki.org/wiki/Help:CirrusSearch.
+  #
+  # @return [Tree::Nodes<Page>] array of parsed pages.
+  #
+  def search(query)
+    list(search: {search: query, limit: 50})
+  end
+
+  # Receive list of parsed MediaWiki pages with titles starting from prefix.
+  # See [MediaWiki API docs](https://www.mediawiki.org/w/api.php?action=help&modules=query%2Bprefixsearch)
+  # for details.
+  #
+  # **NB**: currently, this API **always** fetches all pages for the
+  # prefix, there is no option to "take first 20 pages". Pages are
+  # fetched in batches, then parsed. So, for a large result set
+  # it can really take a while to fetch all pages.
+  #
+  # @param prefix page title prefix.
+  #
+  # @return [Tree::Nodes<Page>] array of parsed pages.
+  #
+  def prefixsearch(prefix)
+    list(prefixsearch: {search: prefix, limit: 100})
   end
 
-  def extract_traits(raw)
-    raw.select{|k, v| [:file_prefix, :category_prefix].include?(k)}
+  def inspect
+    "#<#{self.class}(#{@api_base_url.host})>"
   end
 
-  def guess_traits(pages)
-    categories = pages.map{|p| p['categories']}.compact.flatten
-    images = pages.map{|p| p['images']}.compact.flatten
-    {
-      file_prefix: images.map{|i| i['title'].scan(/^([^:]+):/)}.flatten.uniq,
-      category_prefix: categories.map{|i| i['title'].scan(/^([^:]+):/)}.flatten.uniq,
-    }
+  private
+
+  def list(query)
+    response = @client.query.
+      generator(query).
+      prop(revisions: {prop: :content}, info: {prop: :url}).
+      redirects(true). # FIXME: should be done transparently by MediaWiktory?
+      perform
+
+    response.continue! while response.continue?
+
+    pages = response.pages.select(&:exists?).
+      map{|raw|
+        Page.new(self,
+          Parser.paragraphs(raw.content, traits),
+          raw)
+      }
+
+    Tree::Nodes[*pages]
   end
 
-  def postprocess(response)
-    pages = JSON.parse(response)['query']['pages']
-    traits = guess_traits(pages.values)
+  def normalize_category_title(title)
+    # FIXME: shouldn't it go to MediaWiktory?..
+    namespace, titl = title.include?(':') ? title.split(':', 2) : [nil, title]
+    namespace, titl = nil, title unless traits.category_namespace.include?(namespace)
 
-    pages.map{|id, data|
-      if id.to_i < 0
-        {
-          title: data['title'],
-          content: nil,
-          not_found: true
-        }
-      else
-        {
-          title: data['title'],
-          content: data['revisions'].first['*'],
-          url: data['fullurl'],
-        }.merge(traits)
-      end
+    namespace ||= traits.category_namespace.first
+    [namespace, titl].join(':')
+  end
+
+  def user_agent(options)
+    options[:user_agent] || options[:ua] || self.class.user_agent || UA
+  end
+
+  def extract_namespaces
+    siteinfo = @client.query.meta(siteinfo: {prop: [:namespaces, :namespacealiases]}).perform
+    siteinfo.raw.query.namespaces.map{|_, namespace|
+      aliases = siteinfo.raw.query.namespacealiases.select{|a| a.id == namespace.id}.map{|a| a['*']}
+      namespace.merge(aliases: aliases)
     }
-  rescue JSON::ParserError
-    fail RuntimeError, "Not a JSON response, seems there's not a MediaWiki API: #{@api_base_url}"
   end
 end
 end
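The `normalize_category_title` helper above is self-contained enough to sketch on its own: a title keeps its namespace only if that namespace is a known category namespace (canonical or local alias); otherwise the default one is prepended. Here `NAMESPACES` is a hypothetical stand-in for `traits.category_namespace`:

```ruby
# Stand-in for traits.category_namespace: canonical name first,
# then local aliases (e.g. French Wikipedia).
NAMESPACES = ['Category', 'Catégorie']

def normalize_category_title(title)
  namespace, rest = title.include?(':') ? title.split(':', 2) : [nil, title]
  # A colon that doesn't introduce a known namespace is part of the title.
  namespace, rest = nil, title unless NAMESPACES.include?(namespace)
  namespace ||= NAMESPACES.first
  [namespace, rest].join(':')
end
```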
@@ -118,7 +118,7 @@ module Infoboxer
   #
   # @return {Tree::Nodes}
   def categories
-    lookup(Tree::Wikilink, namespace: /^#{ensure_traits.category_prefix.join('|')}$/)
+    lookup(Tree::Wikilink, namespace: /^#{ensure_traits.category_namespace.join('|')}$/)
   end
 
   # As users accustomed to have only one infobox on a page
@@ -1,4 +1,6 @@
 # encoding: utf-8
+require 'strscan'
+
 module Infoboxer
   class Parser
     class Context
@@ -86,11 +88,23 @@ module Infoboxer
       res
     end
 
+    def push_eol_sign(re)
+      @inline_eol_sign = re
+    end
+
+    def pop_eol_sign
+      @inline_eol_sign = nil
+    end
+
+    attr_reader :inline_eol_sign
+
     def inline_eol?(exclude = nil)
       # not using StringScanner#check, as it will change #matched value
       eol? ||
-        (current =~ %r[^(</ref>|}})] &&
-        (!exclude || $1 !~ exclude)) # FIXME: ugly, but no idea of prettier solution
+        (
+          (current =~ %r[^(</ref>|}})] || @inline_eol_sign && current =~ @inline_eol_sign) &&
+          (!exclude || $1 !~ exclude)
+        ) # FIXME: ugly, but no idea of prettier solution
     end
 
     def scan_continued_until(re, leave_pattern = false)
@@ -5,7 +5,7 @@ module Infoboxer
   include Tree
 
   def image
-    @context.skip(re.file_prefix) or
+    @context.skip(re.file_namespace) or
       @context.fail!("Something went wrong: it's not image?")
 
     path = @context.scan_until(/\||\]\]/)
@@ -32,7 +32,12 @@ module Infoboxer
   def short_inline(until_pattern = nil)
     nodes = Nodes[]
     guarded_loop do
-      chunk = @context.scan_until(re.short_inline_until_cache[until_pattern])
+      # FIXME: quick and UGLY IS HELL JUST TRYING TO MAKE THE SHIT WORK
+      if @context.inline_eol_sign
+        chunk = @context.scan_until(re.short_inline_until_cache_brackets[until_pattern])
+      else
+        chunk = @context.scan_until(re.short_inline_until_cache[until_pattern])
+      end
       nodes << chunk
 
       break if @context.matched_inline?(until_pattern)
@@ -82,7 +87,7 @@ module Infoboxer
     when "''"
       Italic.new(short_inline(/''/))
     when '[['
-      if @context.check(re.file_prefix)
+      if @context.check(re.file_namespace)
        image
      else
        wikilink
@@ -118,7 +123,11 @@ module Infoboxer
   # [http://www.example.org link name]
   def external_link(protocol)
     link = @context.scan_continued_until(/\s+|\]/)
-    caption = inline(/\]/) if @context.matched =~ /\s+/
+    if @context.matched =~ /\s+/
+      @context.push_eol_sign(/^\]/)
+      caption = short_inline(/\]/)
+      @context.pop_eol_sign
+    end
     ExternalLink.new(protocol + link, caption)
   end
 
@@ -4,8 +4,8 @@ module Infoboxer
   module Template
     include Tree
 
-  # NB: here we are not distingish templates like {{Infobox|variable}}
-  # and "magic words" like {{formatnum:123}}
+  # NB: here we are not distinguishing templates like `{{Infobox|variable}}`
+  # and "magic words" like `{{formatnum:123}}`
   # Just calling all of them "templates". This behaviour will change
   # in future, I presume
   # More about magic words: https://www.mediawiki.org/wiki/Help:Magic_words
@@ -29,6 +29,7 @@ module Infoboxer
       @context.skip(/\s*=\s*/)
     else
       name = num
+      num += 1
     end
 
     value = long_inline(/\||}}/)
@@ -38,8 +39,6 @@ module Infoboxer
 
     break if @context.eat_matched?('}}')
     @context.eof? and @context.fail!("Unexpected break of template variables: #{res}")
-
-    num += 1
   end
   res
 end
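The `num += 1` move in the hunk above changes which parameters consume a positional number: previously the counter advanced on every parameter, so a named parameter in the middle shifted the numbering of later unnamed ones; now only unnamed parameters advance it, matching MediaWiki's convention that `{{t|a|x=1|b}}` gives `1=a`, `x=1`, `2=b`. A simplified standalone sketch of the fixed numbering (the real parser works on a character stream, not a pre-split array):

```ruby
# Assign names to template parameters: "name=value" pairs keep their
# name; bare values get consecutive numbers 1, 2, ... -- and named
# parameters no longer consume a number.
def number_params(params)
  num = 1
  params.each_with_object({}) do |param, res|
    if param.include?('=')
      name, value = param.split('=', 2)
      res[name] = value
    else
      res[num] = param
      num += 1  # only positional params advance the counter
    end
  end
end
```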
@@ -16,20 +16,31 @@ module Infoboxer
 
   INLINE_EOL = %r[(?= # if we have ahead... (not scanned, just checked
     </ref> | # <ref> closed
-    }} # or template closed
+    }}
+  )]x
+
+  INLINE_EOL_BR = %r[(?= # if we have ahead... (not scanned, just checked
+    </ref> | # <ref> closed
+    }} | # or template closed
+    (?<!\])\](?!\]) # or ext.link closed,
+    # the madness with look-ahead/behind means "match single bracket but not double"
   )]x
 
 
   def make_regexps
     {
-      file_prefix: /(#{@context.traits.file_prefix.join('|')}):/,
+      file_namespace: /(#{@context.traits.file_namespace.join('|')}):/,
       formatting: FORMATTING,
       inline_until_cache: Hash.new{|h, r|
        h[r] = Regexp.union(*[r, FORMATTING, /$/].compact.uniq)
      },
      short_inline_until_cache: Hash.new{|h, r|
        h[r] = Regexp.union(*[r, INLINE_EOL, FORMATTING, /$/].compact.uniq)
+      },
+      short_inline_until_cache_brackets: Hash.new{|h, r|
+        h[r] = Regexp.union(*[r, INLINE_EOL_BR, FORMATTING, /$/].compact.uniq)
       }
+
     }
   end
 
@@ -46,7 +57,7 @@ module Infoboxer
   scan.skip(/=\s*/)
   q = scan.scan(/['"]/)
   if q
-    value = scan.scan_until(/#{q}/).sub(q, '')
+    value = scan.scan_until(/#{q}|$/).sub(q, '')
   else
     value = scan.scan_until(/\s|$/)
   end
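The change above guards against unterminated quotes: `StringScanner#scan_until` returns `nil` when the pattern never matches, so adding `|$` makes the scan stop at end of input instead of crashing on a missing closing quote. A self-contained sketch of that attribute-value scan (simplified; the real code sits inside a larger attribute parser):

```ruby
require 'strscan'

# Scan a (possibly quoted) attribute value. Scanning until the closing
# quote OR end of string means an unterminated quote no longer makes
# scan_until return nil.
def attr_value(str)
  scan = StringScanner.new(str)
  q = scan.scan(/['"]/)
  if q
    scan.scan_until(/#{q}|$/).sub(q, '')
  else
    scan.scan_until(/\s|$/)
  end
end
```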
@@ -43,7 +43,7 @@ module Infoboxer
   super(level) +
     if caption && !caption.empty?
       indent(level+1) + "caption:\n" +
-        caption.map(&call(to_tree: level+2)).join
+        caption.children.map(&call(to_tree: level+2)).join
     else
       ''
     end