infoboxer 0.1.2.1 → 0.2.0
- checksums.yaml +4 -4
- data/CHANGELOG.md +13 -1
- data/README.md +26 -5
- data/bin/infoboxer +45 -0
- data/infoboxer.gemspec +1 -1
- data/lib/infoboxer/definitions/en.wikipedia.org.rb +0 -1
- data/lib/infoboxer/media_wiki/page.rb +13 -5
- data/lib/infoboxer/media_wiki/traits.rb +11 -5
- data/lib/infoboxer/media_wiki.rb +115 -67
- data/lib/infoboxer/navigation/shortcuts.rb +1 -1
- data/lib/infoboxer/parser/context.rb +16 -2
- data/lib/infoboxer/parser/image.rb +1 -1
- data/lib/infoboxer/parser/inline.rb +12 -3
- data/lib/infoboxer/parser/template.rb +3 -4
- data/lib/infoboxer/parser/util.rb +14 -3
- data/lib/infoboxer/tree/image.rb +1 -1
- data/lib/infoboxer/tree/nodes.rb +2 -2
- data/lib/infoboxer/tree/paragraphs.rb +1 -0
- data/lib/infoboxer/tree/table.rb +1 -1
- data/lib/infoboxer/tree/template.rb +9 -0
- data/lib/infoboxer/version.rb +4 -1
- data/lib/infoboxer.rb +87 -35
- data/regression/pages/list_of_countries.wiki +1493 -0
- data/regression/pages/ukrainian_galician_army.wiki +76 -0
- metadata +8 -5
checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz: …
-  data.tar.gz: …
+  metadata.gz: d3081274989109208504796d1357e7ab78dd8981
+  data.tar.gz: 255f2ffa01c283fd11cbe1a1b308223d276c3b22
 SHA512:
-  metadata.gz: …
-  data.tar.gz: …
+  metadata.gz: 47ff1c7ac1f6e34ba4e5491cd7f5a6e180f18c02c4bf6061d08c6589ca3b66cd8ac1c600cc6e03dda244c3dd37a1986356e47d86f79602416e0eba021182fe00
+  data.tar.gz: c40d2bb3f4b2d336830d56e8b8cc2b126807022f409fe41fd63f0d229f139030b4ef9c18116f802016c3e33af6f4aba1f1766c03619d3e185cdce2b949d63bf6
data/CHANGELOG.md CHANGED

@@ -1,5 +1,17 @@
 # Infoboxer's change log
 
+## 0.2.0 (2015-12-21)
+
+* MediaWiki backend changed to (our own handcrafted)
+  [mediawiktory](https://github.com/molybdenum-99/mediawiktory);
+* Added page lists fetching like `MediaWiki#category(categoryname)`,
+  `MediaWiki#search(search_phrase)`;
+* `MediaWiki#get` now can fetch any number of pages at once (it was only
+  50 in previous versions);
+* `bin/infoboxer` console added for quick experimenting;
+* `Template#to_h` added for quick information extraction;
+* many small bugfixes and echancements.
+
 ## 0.1.2.1 (2015-12-04)
 
 * Small bug with newlines in templates fixed.
@@ -22,6 +34,6 @@ Basically, preparing for wider release!
 
 ## 0.1.0 (2015-08-07)
 
-Initial (ok, I know it's typically called 0.…
+Initial (ok, I know it's typically called 0.0.1, but here's work of
 three monthes, numerous documentations and examples and so on... so, let
 it be 0.1.0).
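The changelog's claim that `MediaWiki#get` can now fetch any number of pages follows directly from slicing titles into 50-title batches (the MediaWiki per-request limit). A standalone sketch in plain Ruby, assuming nothing from Infoboxer itself; `fetch_batch` is a hypothetical stand-in for one API request:

```ruby
# Hypothetical stand-in for a single MediaWiki API request.
def fetch_batch(titles)
  titles.map { |t| { title: t } }  # pretend each title resolved to a page
end

# Split any number of titles into 50-title slices, one request per slice.
def batched_get(titles, batch_size = 50)
  titles.each_slice(batch_size).map { |part| fetch_batch(part) }.inject([], :concat)
end

# The request-count formula the docs quote: (titles.count / 50.0).ceil
def request_count(titles, batch_size = 50)
  (titles.count / batch_size.to_f).ceil
end
```

So fetching 120 titles costs three requests, not one per title.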
data/README.md CHANGED

@@ -4,6 +4,7 @@
 [![Build Status](https://travis-ci.org/molybdenum-99/infoboxer.svg?branch=master)](https://travis-ci.org/molybdenum-99/infoboxer)
 [![Coverage Status](https://coveralls.io/repos/molybdenum-99/infoboxer/badge.svg?branch=master&service=github)](https://coveralls.io/github/molybdenum-99/infoboxer?branch=master)
 [![Code Climate](https://codeclimate.com/github/molybdenum-99/infoboxer/badges/gpa.svg)](https://codeclimate.com/github/molybdenum-99/infoboxer)
+[![Molybdenum-99 Gitter](https://badges.gitter.im/molybdenum-99.png)](https://gitter.im/molybdenum-99)
 
 **Infoboxer** is pure-Ruby Wikipedia (and generic MediaWiki) client and
 parser, targeting information extraction (hence the name).
@@ -97,6 +98,25 @@ See [Navigation shortcuts](https://github.com/molybdenum-99/infoboxer/wiki/Navig
 
 To put it all in one piece, also take a look at [Data extraction tips and tricks](https://github.com/molybdenum-99/infoboxer/wiki/Tips-and-tricks).
 
+### infoboxer executable
+
+Just try `infoboxer` command.
+
+Without any options, it starts IRB session with infoboxer required and
+included into main namespace.
+
+With `-w` option, it provides a shortcut to MediaWiki instance you want.
+Like this:
+
+```
+$ infoboxer -w https://en.wikipedia.org/w/api.php
+> get('Argentina')
+=> #<Page(title: "Argentina", url: "https://en.wikipedia.org/wiki/Argentina"): ....
+```
+
+You can also use shortcuts like `infoboxer -w wikipedia` for common
+wikies (and, just for fun, `infoboxer -wikipedia` also).
+
 ## Advanced topics
 
 * [Reasons](https://github.com/molybdenum-99/infoboxer/wiki/Reasons) for
@@ -114,9 +134,10 @@ To put it all in one piece, also take a look at [Data extraction tips and tricks
 
 ## Compatibility
 
-As of now, Infoboxer reported to be compatible with any MRI Ruby since …
-…
-…
+As of now, Infoboxer reported to be compatible with any MRI Ruby since 2.0.0
+(1.9.3 previously, dropped since Infoboxer 0.2.0). In Travis-CI tests,
+JRuby is failing due to bug in old Java 7/Java 8 SSL certificate support
+([see here](https://github.com/jruby/jruby/issues/2599)),
 and Rubinius failing 3 specs of 500 by mystery, which is uninvestigated yet.
 
 Therefore, those Ruby versions are excluded from Travis config, though,
@@ -129,10 +150,10 @@ they may still work for you.
 * **NB**: ↑ this is "current version" link, but RubyDoc.info unfortunately
   sometimes fails to update it to really _current_; in case you feel
   something seriously underdocumented, please-please look at
-  [0.…
+  [0.2.0 docs](http://www.rubydoc.info/gems/infoboxer/0.2.0).
 * [Contributing](https://github.com/molybdenum-99/infoboxer/wiki/Contributing)
 * [Roadmap](https://github.com/molybdenum-99/infoboxer/wiki/Roadmap)
 
 ## License
 
-MIT.
+[MIT](https://github.com/molybdenum-99/infoboxer/blob/master/LICENSE.txt).
data/bin/infoboxer ADDED

@@ -0,0 +1,45 @@
+#!/usr/bin/env ruby
+require 'rubygems'
+require 'bundler/setup'
+require 'infoboxer'
+
+include Infoboxer
+
+require 'optparse'
+
+wiki_url = nil
+
+OptionParser.new do |opts|
+  opts.banner = "Usage: bin/infoboxer [-w wiki_api_url]"
+
+  opts.on("-w", "--wiki WIKI_API_URL",
+    "Make wiki by WIKI_API_URL a default wiki, and use it with just get('Pagename')") do |w|
+    wiki_url = w
+  end
+end.parse!
+
+if wiki_url
+  if wiki_url =~ /^[a-z]+$/
+    wiki_url = case
+      when domain = Infoboxer::WIKIMEDIA_PROJECTS[wiki_url.to_sym]
+        "https://en.#{domain}/w/api.php"
+      when domain = Infoboxer::WIKIMEDIA_PROJECTS[('w' + wiki_url).to_sym]
+        "https://en.#{domain}/w/api.php"
+      else
+        fail("Unidentified wiki: #{wiki_url}")
+      end
+  end
+
+  DEFAULT_WIKI = Infoboxer.wiki(wiki_url)
+  puts "Default Wiki selected: #{wiki_url}.\nNow you can use `get('Pagename')`, `category('Categoryname')` and so on.\n\n"
+  [:raw, :get, :category, :search, :prefixsearch].each do |m|
+    define_method(m){|*arg|
+      DEFAULT_WIKI.send(m, *arg)
+    }
+  end
+end
+
+require 'irb'
+ARGV.shift until ARGV.empty?
+IRB.start
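The shortcut resolution in `bin/infoboxer` above maps a bare word like `wikipedia` through `WIKIMEDIA_PROJECTS`, retrying with a prepended `w` so that `infoboxer -wikipedia` (where optparse consumes the leading `w` as the flag) still resolves. A standalone sketch; the hash below is a hypothetical two-entry subset of the real `Infoboxer::WIKIMEDIA_PROJECTS`:

```ruby
# Hypothetical subset of Infoboxer::WIKIMEDIA_PROJECTS.
WIKIMEDIA_PROJECTS = { wikipedia: 'wikipedia.org', wikibooks: 'wikibooks.org' }

def resolve_wiki(word)
  case
  when domain = WIKIMEDIA_PROJECTS[word.to_sym]
    "https://en.#{domain}/w/api.php"
  when domain = WIKIMEDIA_PROJECTS[('w' + word).to_sym]
    # "-wikipedia" arrives here as "ikipedia": optparse ate the leading "w"
    "https://en.#{domain}/w/api.php"
  else
    fail("Unidentified wiki: #{word}")
  end
end
```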
data/infoboxer.gemspec CHANGED

@@ -29,7 +29,7 @@ Gem::Specification.new do |s|
 
   s.add_dependency 'htmlentities'
   s.add_dependency 'procme'
-  s.add_dependency '…
+  s.add_dependency 'mediawiktory', '>= 0.0.2'
   s.add_dependency 'addressable'
   s.add_dependency 'terminal-table'
   s.add_dependency 'backports'
data/lib/infoboxer/media_wiki/page.rb CHANGED

@@ -7,15 +7,19 @@ module Infoboxer
   # Alongside with document tree structure, knows document's title as
   # represented by MediaWiki and human (non-API) URL.
   class Page < Tree::Document
-    def initialize(client, children, …)
-      @client = client
-      super(children, …)
+    def initialize(client, children, source)
+      @client, @source = client, source
+      super(children, title: source.title, url: source.fullurl)
     end
 
     # Instance of {MediaWiki} which this page was received from
     # @return {MediaWiki}
     attr_reader :client
 
+    # Instance of MediaWiktory::Page class with source data
+    # @return {MediaWiktory::Page}
+    attr_reader :source
+
     # @!attribute [r] title
     #   Page title.
     #   @return [String]
@@ -24,11 +28,15 @@ module Infoboxer
   #   Page friendly URL.
   #   @return [String]
 
-    def_readers :title, :url
+    def_readers :title, :url
+
+    def traits
+      client.traits
+    end
 
     private
 
-    PARAMS_TO_INSPECT = [:url, :title…
+    PARAMS_TO_INSPECT = [:url, :title] #, :domain]
 
     def show_params
       super(params.select{|k, v| PARAMS_TO_INSPECT.include?(k)})
data/lib/infoboxer/media_wiki/traits.rb CHANGED

@@ -68,14 +68,14 @@ module Infoboxer
 
     def initialize(options = {})
       @options = options
-      @…
+      @file_namespace = [DEFAULTS[:file_namespace], namespace_aliases(options, 'File')].
         flatten.compact.uniq
-      @…
+      @category_namespace = [DEFAULTS[:category_namespace], namespace_aliases(options, 'Category')].
         flatten.compact.uniq
     end
 
     # @private
-    attr_reader :…
+    attr_reader :file_namespace, :category_namespace
 
     # @private
     def templates
@@ -84,9 +84,15 @@ module Infoboxer
 
     private
 
+    def namespace_aliases(options, canonical)
+      namespace = (options[:namespaces] || []).detect{|v| v.canonical == canonical}
+      return nil unless namespace
+      [namespace['*'], *namespace.aliases]
+    end
+
     DEFAULTS = {
-      …
-      …
+      file_namespace: 'File',
+      category_namespace: 'Category'
     }
 
   end
data/lib/infoboxer/media_wiki.rb CHANGED

@@ -1,6 +1,7 @@
 # encoding: utf-8
-require 'rest-client'
-require 'json'
+#require 'rest-client'
+#require 'json'
+require 'mediawiktory'
 require 'addressable/uri'
 
 require_relative 'media_wiki/traits'
@@ -36,7 +37,7 @@ module Infoboxer
     attr_accessor :user_agent
   end
 
-  attr_reader :api_base_url
+  attr_reader :api_base_url, :traits
 
   # Creating new MediaWiki client. {Infoboxer.wiki} provides shortcut
   # for it, as well as shortcuts for some well-known wikis, like
@@ -49,7 +50,8 @@ module Infoboxer
   # * `:user_agent` (also aliased as `:ua`) -- custom User-Agent header.
   def initialize(api_base_url, options = {})
     @api_base_url = Addressable::URI.parse(api_base_url)
-    @…
+    @client = MediaWiktory::Client.new(api_base_url, user_agent: user_agent(options))
+    @traits = Traits.get(@api_base_url.host, namespaces: extract_namespaces)
   end
 
   # Receive "raw" data from Wikipedia (without parsing or wrapping in
@@ -57,18 +59,22 @@ module Infoboxer
   #
   # @return [Array<Hash>]
   def raw(*titles)
-    …
-    …
-    …
+    titles.each_slice(50).map{|part|
+      @client.query.
+        titles(*part).
+        prop(revisions: {prop: :content}, info: {prop: :url}).
+        redirects(true). # FIXME: should be done transparently by MediaWiktory?
+        perform.pages
+    }.inject(:concat) # somehow flatten(1) fails!
   end
 
-  # Receive list of parsed …
+  # Receive list of parsed MediaWiki pages for list of titles provided.
   # All pages are received with single query to MediaWiki API.
   #
-  # **NB**: …
-  # …
-  # …
-  # …
+  # **NB**: if you are requesting more than 50 titles at once
+  # (MediaWiki limitation for single request), Infoboxer will do as
+  # many queries as necessary to extract them all (it will be like
+  # `(titles.count / 50.0).ceil` requests)
   #
   # @return [Tree::Nodes<Page>] array of parsed pages. Notes:
   # * if you call `get` with only one title, one page will be
@@ -87,76 +93,118 @@ module Infoboxer
   #   NotFound.
   #
   def get(*titles)
-    pages = raw(*titles).
+    pages = raw(*titles).
+      tap{|pages| pages.detect(&:invalid?).tap{|i| i && fail(i.raw.invalidreason)}}.
+      select(&:exists?).
       map{|raw|
-        traits = Traits.get(@api_base_url.host, extract_traits(raw))
-
         Page.new(self,
-          Parser.paragraphs(raw…),
-          raw…)
+          Parser.paragraphs(raw.content, traits),
+          raw)
       }
     titles.count == 1 ? pages.first : Tree::Nodes[*pages]
   end
 
-  # … (old implementation removed; its lines are truncated in the source)
+  # Receive list of parsed MediaWiki pages from specified category.
+  #
+  # **NB**: currently, this API **always** fetches all pages from
+  # category, there is no option to "take first 20 pages". Pages are
+  # fetched in 50-page batches, then parsed. So, for large category
+  # it can really take a while to fetch all pages.
+  #
+  # @param title Category title. You can use namespaceless title (like
+  #   `"Countries in South America"`), title with namespace (like
+  #   `"Category:Countries in South America"`) or title with local
+  #   namespace (like `"Catégorie:Argentine"` for French Wikipedia)
+  #
+  # @return [Tree::Nodes<Page>] array of parsed pages.
+  #
+  def category(title)
+    title = normalize_category_title(title)
+
+    list(categorymembers: {title: title, limit: 50})
+  end
 
+  # Receive list of parsed MediaWiki pages for provided search query.
+  # See [MediaWiki API docs](https://www.mediawiki.org/w/api.php?action=help&modules=query%2Bsearch)
+  # for details.
+  #
+  # **NB**: currently, this API **always** fetches all pages from
+  # category, there is no option to "take first 20 pages". Pages are
+  # fetched in 50-page batches, then parsed. So, for large category
+  # it can really take a while to fetch all pages.
+  #
+  # @param query Search query. For old installations, look at
+  #   https://www.mediawiki.org/wiki/Help:Searching
+  #   for search syntax. For new ones (including Wikipedia), see at
+  #   https://www.mediawiki.org/wiki/Help:CirrusSearch.
+  #
+  # @return [Tree::Nodes<Page>] array of parsed pages.
+  #
+  def search(query)
+    list(search: {search: query, limit: 50})
+  end
+
+  # Receive list of parsed MediaWiki pages with titles startin from prefix.
+  # See [MediaWiki API docs](https://www.mediawiki.org/w/api.php?action=help&modules=query%2Bprefixsearch)
+  # for details.
+  #
+  # **NB**: currently, this API **always** fetches all pages from
+  # category, there is no option to "take first 20 pages". Pages are
+  # fetched in 50-page batches, then parsed. So, for large category
+  # it can really take a while to fetch all pages.
+  #
+  # @param prefix page title prefix.
+  #
+  # @return [Tree::Nodes<Page>] array of parsed pages.
+  #
+  def prefixsearch(prefix)
+    list(prefixsearch: {search: prefix, limit: 100})
   end
 
-  def …
-    …
+  def inspect
+    "#<#{self.class}(#{@api_base_url.host})>"
   end
 
-  # … (old private helpers removed; their lines are truncated in the source)
+  private
+
+  def list(query)
+    response = @client.query.
+      generator(query).
+      prop(revisions: {prop: :content}, info: {prop: :url}).
+      redirects(true). # FIXME: should be done transparently by MediaWiktory?
+      perform
+
+    response.continue! while response.continue?
+
+    pages = response.pages.select(&:exists?).
+      map{|raw|
+        Page.new(self,
+          Parser.paragraphs(raw.content, traits),
+          raw)
+      }
+
+    Tree::Nodes[*pages]
   end
 
-  def …
-    …
-    … (more removed lines truncated in source)
-  end
+  def normalize_category_title(title)
+    # FIXME: shouldn't it go to MediaWiktory?..
+    namespace, titl = title.include?(':') ? title.split(':', 2) : [nil, title]
+    namespace, titl = nil, title unless traits.category_namespace.include?(namespace)
 
+    namespace ||= traits.category_namespace.first
+    [namespace, titl].join(':')
+  end
+
+  def user_agent(options)
+    options[:user_agent] || options[:ua] || self.class.user_agent || UA
+  end
+
+  def extract_namespaces
+    siteinfo = @client.query.meta(siteinfo: {prop: [:namespaces, :namespacealiases]}).perform
+    siteinfo.raw.query.namespaces.map{|_, namespace|
+      aliases = siteinfo.raw.query.namespacealiases.select{|a| a.id == namespace.id}.map{|a| a['*']}
+      namespace.merge(aliases: aliases)
    }
-  rescue JSON::ParserError
-    fail RuntimeError, "Not a JSON response, seems there's not a MediaWiki API: #{@api_base_url}"
   end
 end
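The `normalize_category_title` helper introduced in this diff accepts a bare title, a canonical `Category:` prefix, or a localized one, and always emits a namespaced title. A self-contained port of that logic (the `category_namespace` argument stands in for `Traits#category_namespace`, the wiki's list of accepted namespace names):

```ruby
# Accepts "X", "Category:X", or a localized prefix; returns a
# namespace-prefixed title. An unrecognized prefix is treated as part
# of the title itself, mirroring the diff's behaviour.
def normalize_category_title(title, category_namespace = ['Category'])
  namespace, titl = title.include?(':') ? title.split(':', 2) : [nil, title]
  namespace, titl = nil, title unless category_namespace.include?(namespace)
  namespace ||= category_namespace.first
  [namespace, titl].join(':')
end
```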
data/lib/infoboxer/navigation/shortcuts.rb CHANGED

@@ -118,7 +118,7 @@ module Infoboxer
   #
   # @return {Tree::Nodes}
   def categories
-    lookup(Tree::Wikilink, namespace: /^#{ensure_traits.…
+    lookup(Tree::Wikilink, namespace: /^#{ensure_traits.category_namespace.join('|')}$/)
   end
 
   # As users accustomed to have only one infobox on a page
data/lib/infoboxer/parser/context.rb CHANGED

@@ -1,4 +1,6 @@
 # encoding: utf-8
+require 'strscan'
+
 module Infoboxer
   class Parser
     class Context
@@ -86,11 +88,23 @@ module Infoboxer
         res
       end
 
+      def push_eol_sign(re)
+        @inline_eol_sign = re
+      end
+
+      def pop_eol_sign
+        @inline_eol_sign = nil
+      end
+
+      attr_reader :inline_eol_sign
+
       def inline_eol?(exclude = nil)
         # not using StringScanner#check, as it will change #matched value
         eol? ||
-          (…
-          (…
+          (
+            (current =~ %r[^(</ref>|}})] || @inline_eol_sign && current =~ @inline_eol_sign) &&
+            (!exclude || $1 !~ exclude)
+          ) # FIXME: ugly, but no idea of prettier solution
       end
 
       def scan_continued_until(re, leave_pattern = false)
data/lib/infoboxer/parser/inline.rb CHANGED

@@ -32,7 +32,12 @@ module Infoboxer
     def short_inline(until_pattern = nil)
       nodes = Nodes[]
       guarded_loop do
-        …
+        # FIXME: quick and UGLY IS HELL JUST TRYING TO MAKE THE SHIT WORK
+        if @context.inline_eol_sign
+          chunk = @context.scan_until(re.short_inline_until_cache_brackets[until_pattern])
+        else
+          chunk = @context.scan_until(re.short_inline_until_cache[until_pattern])
+        end
         nodes << chunk
 
         break if @context.matched_inline?(until_pattern)
@@ -82,7 +87,7 @@ module Infoboxer
       when "''"
         Italic.new(short_inline(/''/))
       when '[['
-        if @context.check(re.…
+        if @context.check(re.file_namespace)
           image
         else
           wikilink
@@ -118,7 +123,11 @@ module Infoboxer
     # [http://www.example.org link name]
     def external_link(protocol)
       link = @context.scan_continued_until(/\s+|\]/)
-      …
+      if @context.matched =~ /\s+/
+        @context.push_eol_sign(/^\]/)
+        caption = short_inline(/\]/)
+        @context.pop_eol_sign
+      end
       ExternalLink.new(protocol + link, caption)
     end
data/lib/infoboxer/parser/template.rb CHANGED

@@ -4,8 +4,8 @@ module Infoboxer
   module Template
     include Tree
 
-    # NB: here we are not distingish templates like {{Infobox|variable}}
-    # and "magic words" like {{formatnum:123}}
+    # NB: here we are not distingish templates like `{{Infobox|variable}}`
+    # and "magic words" like `{{formatnum:123}}`
     # Just calling all of them "templates". This behaviour will change
     # in future, I presume
     # More about magic words: https://www.mediawiki.org/wiki/Help:Magic_words
@@ -29,6 +29,7 @@ module Infoboxer
         @context.skip(/\s*=\s*/)
       else
         name = num
+        num += 1
       end
 
       value = long_inline(/\||}}/)
@@ -38,8 +39,6 @@ module Infoboxer
 
       break if @context.eat_matched?('}}')
       @context.eof? and @context.fail!("Unexpected break of template variables: #{res}")
-
-      num += 1
     end
     res
   end
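The `num += 1` move in the hunk above matters for templates mixing named and positional parameters: the implicit counter must advance only when an unnamed value is consumed, not after every variable, or a named parameter in the middle shifts later positional numbers. A simplified standalone model of that numbering (`template_params` is a hypothetical helper, not Infoboxer's actual parser):

```ruby
# Number template variables the way the fixed parser does: named params
# keep their name, positional params get consecutive numbers, and the
# counter advances only for positional ones.
def template_params(vars)
  num = 1
  vars.each_with_object({}) do |var, res|
    if var.include?('=')
      name, value = var.split('=', 2)
    else
      name = num
      num += 1  # advance only here -- the fix this hunk makes
      value = var
    end
    res[name] = value
  end
end
```

With the counter bumped unconditionally (the old code), `{{tpl|a|x=1|b}}` would have numbered `b` as 3 instead of 2.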
data/lib/infoboxer/parser/util.rb CHANGED

@@ -16,20 +16,31 @@ module Infoboxer
 
     INLINE_EOL = %r[(?= # if we have ahead... (not scanned, just checked
       </ref> | # <ref> closed
-      }}…
+      }}
     )]x
 
+    INLINE_EOL_BR = %r[(?= # if we have ahead... (not scanned, just checked
+      </ref> | # <ref> closed
+      }} | # or template closed
+      (?<!\])\](?!\]) # or ext.link closed,
+      # the madness with look-ahead/behind means "match single bracket but not double"
+    )]x
+
     def make_regexps
       {
-        …
+        file_namespace: /(#{@context.traits.file_namespace.join('|')}):/,
         formatting: FORMATTING,
         inline_until_cache: Hash.new{|h, r|
          h[r] = Regexp.union(*[r, FORMATTING, /$/].compact.uniq)
         },
         short_inline_until_cache: Hash.new{|h, r|
           h[r] = Regexp.union(*[r, INLINE_EOL, FORMATTING, /$/].compact.uniq)
+        },
+        short_inline_until_cache_brackets: Hash.new{|h, r|
+          h[r] = Regexp.union(*[r, INLINE_EOL_BR, FORMATTING, /$/].compact.uniq)
         }
       }
     end
@@ -46,7 +57,7 @@ module Infoboxer
     scan.skip(/=\s*/)
     q = scan.scan(/['"]/)
     if q
-      value = scan.scan_until(/#{q}…
+      value = scan.scan_until(/#{q}|$/).sub(q, '')
     else
       value = scan.scan_until(/\s|$/)
     end
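The `(?<!\])\](?!\])` fragment added in `INLINE_EOL_BR` is the interesting part of this hunk: a look-behind plus look-ahead pair that matches a lone `]` (the end of an external link's caption) while refusing to fire inside the `]]` that closes a wikilink. Isolated for demonstration:

```ruby
# Match a single closing bracket, but not either bracket of "]]".
SINGLE_BRACKET = /(?<!\])\](?!\])/

def ends_external_link?(str)
  !(str =~ SINGLE_BRACKET).nil?
end
```

In `[[Argentina]]` neither `]` matches: the first is followed by `]` (look-ahead fails), the second is preceded by `]` (look-behind fails).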
data/lib/infoboxer/tree/image.rb
CHANGED