RubyGems - wp2txt - Versions diffs - 0.5.4 → 0.6.0 - Mend

wp2txt 0.5.4 → 0.6.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (9)

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz: 07ab0b6030dcc43512ef8f17504519dc06f72a25
-  data.tar.gz: 915a4ea67ab2bfbe7b465d3d6d718c30a02b463f
+  metadata.gz: c05af7e5c72b073f18b53eca8619212f0928aaa1
+  data.tar.gz: 02dc116458041b096fd811b3fec6ffb6e6ff3ee7
 SHA512:
-  metadata.gz: f2d66c66a8e87dc8d35c7627729e44dc3a9d6c93624a3055e925ad293fbf21aff3e012dfd4e00d14b815e9b8ee7f303b704430555a7525bb497d8ae11f390968
-  data.tar.gz: 75f6245efe1b55dfcb0612b49d8b2fd7e96cf3fb2dbe729abbcc3761f6d7431f2e9408c8970bf310e96626819e884ce681bfb9685462df3461ca0f494e0d483d
+  metadata.gz: 26479a43e8e3ccebfb6578562d62b7b6d2234e924838774a11d143c8fad7e1952db0278526d202a77aad680926b96baa25c8da9e0c23f916d95ccf614d5e82a3
+  data.tar.gz: 8959c9b51efb18386cf556c67439332fc43723d8de8abfc014745d0d0ad1f89ba56b6222b6f8d2a33748343e2dbef93752ee104c8a54c57e01fe21bee7e2f892

data/.gitignore CHANGED

@@ -9,7 +9,7 @@ _yardoc
 coverage
 doc/
 lib/bundler/man
-pkg
+pkg/
 rdoc
 spec/reports
 test/tmp

data/README.md CHANGED

@@ -2,8 +2,6 @@
 Wikipedia dump file to text converter
-     CAUTION: This software is on an experimental stage. Use with care!
 ### About ###
 WP2TXT extracts plain text data from Wikipedia dump file (encoded in XML/compressed with Bzip2) stripping all the MediaWiki markups and other metadata. It is originally intended to be useful for researchers who look for an easy way to obtain open-source multi-lingual corpora, but may be handy for other purposes.
@@ -14,15 +12,19 @@ WP2TXT extracts plain text data from Wikipedia dump file (encoded in XML/compres
 * Create output files of specified size.
 * Allow users to specify text elements to be extracted/converted (page titles, section titles, lists, and tables).
-WP2TXT before version 0.4.0 came with Mac/Windows GUI. Now it's become a pure command-line application--Sorry GUI folks, but there seems more demand for an easy-to-hack CUI package than a not-very-flexible GUI app.
 ### Installation
     $ gem install wp2txt
+It is highly recommended you also install bz2-ruby gem. See the following for the details about bz2-ruby gem:
+[https://github.com/brianmario/bzip2-ruby](https://github.com/brianmario/bzip2-ruby)
+When the above gem is not found, wp2txt will try to use bzip2 program in your command line environment.  Supposedly he former option is more reliable as well as fast.
 ### Usage
-Obtain a Wikipedia dump file (see the link below) with a file name such as:
+Obtain a Wikipedia dump file (from [here](http://dumps.wikimedia.org/backup-index.html)) with a file name such as:
     xxwiki-yyyymmdd-pages-articles.xml.bz2
@@ -30,24 +32,31 @@ where `xx` is language code such as "en (English)" or "ja (Japanese)", and  `yyy
 Command line options are as follows:
-    Usage: wp2txt [options]
-    where [options] are:
-                         --input-file, -i:   Wikipedia dump file with .bz2 (compressed) or .txt (uncompressed) format
-                     --output-dir, -o <s>:   Output directory (default: Present working directory)
-                        --convert-off, -c:   Output XML (without converting to plain text)
-                           --list-off, -l:   Exclude list items from output
-                        --heading-off, -d:   Exclude section titles from output
-                          --title-off, -t:   Exclude page titles from output
-          --table-off, --no-table-off, -a:   Exclude page titles from output (default: true)
-    --template-off, --no-template-off, -e:   Remove template notations from output (default: true)
-                       --redirect-off, -r:   Not show redirect destination
-                       --strip-marker, -s:   Remove symbols prefixed to list items, definitions, etc.
-                       --category-off, -g:   Not show article category information
-                      --file-size, -f <i>:   Approximate size (in MB) of each output file (default: 10)
-                            --version, -v:   Print version and exit
-                               --help, -h:   Show this message
-### Limitations ###
+*CAUTION:* command line options in the current version have been drastically changed from those in versions 0.5!
+      Usage: wp2txt [options]
+      where [options] are:
+               --input-file, -i:   Wikipedia dump file with .bz2 (compressed) or
+                                   .txt (uncompressed) format
+           --output-dir, -o <s>:   Output directory (default:
+                                   /Users/yohasebe/Dropbox/code/wp2txt)
+    --convert, --no-convert, -c:   Output in plain text (converting from XML)
+                                   (default: true)
+          --list, --no-list, -l:   Show list items in output (default: true)
+    --heading, --no-heading, -d:   Show section titles in output (default: true)
+        --title, --no-title, -t:   Show page titles in output (default: true)
+                    --table, -a:   Show table source code in output
+                 --template, -e:   Show template specifications in output
+                 --redirect, -r:   Show redirect destination
+      --marker, --no-marker, -m:   Show symbols prefixed to list items,
+                                   definitions, etc. (Default: true)
+                 --category, -g:   Show article category information
+            --file-size, -f <i>:   Approximate size (in MB) of each output file
+                                   (default: 10)
+                  --version, -v:   Print version and exit
+                     --help, -h:   Show this message
+### Caveats ###
 * Certain types of data such as mathematical equations and computer source code are not be properly converted.  Please remember this software is originally intended for correcting “sentences” for linguistic studies.
 * Extraction of normal text data could sometimes fail for various reasons (e.g. illegal matching of begin/end tags, language-specific conventions of formatting, etc).
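The README hunk above warns that the command line options changed drastically, and the option listing shows every negative `*-off` flag replaced by a positive one. The mapping can be sketched as below. The names `OLD_TO_NEW` and `migrate_options` are hypothetical helpers written for illustration and are not part of wp2txt; the pairs themselves are read from the old and new option listings.

```ruby
# Old 0.5.x option name => new 0.6.0 option name, as read from the diff.
# Every pair inverts the meaning ("exclude X" becomes "show X").
OLD_TO_NEW = {
  convert_off:  :convert,
  list_off:     :list,
  heading_off:  :heading,
  title_off:    :title,
  table_off:    :table,
  template_off: :template,
  redirect_off: :redirect,
  strip_marker: :marker,
  category_off: :category
}.freeze

# Hypothetical helper: translate a 0.5.x-style options hash into the
# 0.6.0 style. Renamed boolean flags are negated; other keys pass through.
def migrate_options(old_opts)
  old_opts.each_with_object({}) do |(name, value), new_opts|
    if OLD_TO_NEW.key?(name)
      new_opts[OLD_TO_NEW[name]] = !value
    else
      new_opts[name] = value
    end
  end
end
```

The defaults are not all plain inversions. `--redirect-off` and `--category-off` defaulted to false in 0.5.4, and `--redirect` and `--category` also default to false in 0.6.0, so redirect destinations and category information that were printed by default before are hidden by default after the upgrade.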

data/bin/wp2txt CHANGED

@@ -26,15 +26,15 @@ EOS
   opt :input_file,  "Wikipedia dump file with .bz2 (compressed) or .txt (uncompressed) format", :required => true
   opt :output_dir,  "Output directory", :default => Dir::pwd, :type => String
-  opt :convert_off, "Output XML (without converting to plain text)", :default => false
-  opt :list_off,    "Exclude list items from output", :default => false
-  opt :heading_off, "Exclude section titles from output", :default => false, :short => "-d"
-  opt :title_off,   "Exclude page titles from output", :default => false
-  opt :table_off,   "Exclude page titles from output", :default => true
-  opt :template_off, "Remove template notations from output", :default => true
-  opt :redirect_off, "Not show redirect destination", :default => false
-  opt :strip_marker, "Remove symbols prefixed to list items, definitions, etc.", :default => false
-  opt :category_off, "Not show article category information", :default => false
+  opt :convert,     "Output in plain text (converting from XML)", :default => true
+  opt :list,    "Show list items in output", :default => true
+  opt :heading, "Show section titles in output", :default => true, :short => "-d"
+  opt :title,   "Show page titles in output", :default => true
+  opt :table,   "Show table source code in output", :default => false
+  opt :template, "Show template specifications in output", :default => false
+  opt :redirect, "Show redirect destination", :default => false
+  opt :marker, "Show symbols prefixed to list items, definitions, etc.", :default => true
+  opt :category, "Show article category information", :default => false
   opt :file_size,   "Approximate size (in MB) of each output file", :default => 10
 end
 Trollop::die :size, "must be larger than 0" unless opts[:file_size] >= 0
@@ -43,9 +43,9 @@ Trollop::die :output_dir, "must exist" unless File.exist?(opts[:output_dir])
 input_file = ARGV[0]
 output_dir = opts[:output_dir]
 tfile_size = opts[:file_size]
-convert_off = opts[:convert_off]
-strip_tmarker = opts[:strip_marker]
-opt_array = [:title_off, :list_off, :heading_off, :table_off, :template_off, :redirect_off]
+convert = opts[:convert]
+strip_tmarker = opts[:marker] ? false : true
+opt_array = [:title_off, :list, :heading, :table, :template, :redirect]
 config = {}
 opt_array.each do |opt|
   config[opt] = opts[opt]
@@ -54,13 +54,13 @@ end
 # a "parent" is either commandline progress bar or
 # a gui window (not available for now)
 parent = Wp2txt::CmdProgbar.new
-wpconv = Wp2txt::Runner.new(parent, input_file, output_dir, tfile_size, convert_off, strip_tmarker)
+wpconv = Wp2txt::Runner.new(parent, input_file, output_dir, tfile_size, convert, strip_tmarker)
 wpconv.extract_text do |article|
   title = format_wiki article.title
   title = "[[#{title}]]\n"
-  if !opts[:category_off] && !article.categories.empty?
+  if opts[:category] && !article.categories.empty?
     contents = "\nCATEGORIES: "
     contents += article.categories.join(", ")
     contents += "\n\n"
@@ -71,31 +71,31 @@ wpconv.extract_text do |article|
   article.elements.each do |e|
     case e.first
     when :mw_heading
-      next if config[:heading_off]
+      next if !config[:heading]
       line = format_wiki(e.last)
       line += "+HEADING+" if $DEBUG_MODE
     when :mw_paragraph
-      next if config[:paragraph_off]
+      # next if !config[:paragraph]
       line = format_wiki(e.last)
       line += "+PARAGRAPH+" if $DEBUG_MODE
     when :mw_table, :mw_htable
-      next if config[:table_off]
+      next if !config[:table]
       line = format_wiki(e.last)
       line += "+TABLE+" if $DEBUG_MODE
     when :mw_pre
-      next if config[:pre_off]
+      next if !config[:pre]
       line = e.last
       line += "+PRE+" if $DEBUG_MODE
     when :mw_quote
-      next if config[:quote_off]
+      # next if !config[:quote]
       line = format_wiki(e.last)
       line += "+QUOTE+" if $DEBUG_MODE
     when :mw_unordered, :mw_ordered, :mw_definition
-      next if config[:list_off]
+      next if !config[:list]
       line = format_wiki(e.last)
       line += "+LIST+" if $DEBUG_MODE
     when :mw_redirect
-      next if config[:redirect_off]
+      next if !config[:redirect]
       line = format_wiki(e.last)
       line += "+REDIRECT+" if $DEBUG_MODE
       line += "\n\n"
@@ -108,14 +108,14 @@ wpconv.extract_text do |article|
       end
     end
     contents += line
-    contents = remove_templates(contents) if config[:template_off]
+    contents = remove_templates(contents) unless config[:template]
   end
   ##### cleanup #####
   if /\A\s*\z/m =~ contents
     result = ""
   else
-    result = config[:title_off] ? contents : title + "\n" + contents
+    result = config[:title] ? title + "\n" + contents : contents
   end
   result = result.gsub(/\[ref\]\s*\[\/ref\]/m){""}
   result = result.gsub(/\n\n\n+/m){"\n\n"} + "\n"
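One detail in the hunks above is worth checking when upgrading: `opt_array` still lists `:title_off`, while the cleanup step now reads `config[:title]`, and `config[:pre]` is consulted without `:pre` being in the list. The following is a minimal sketch of that lookup in plain Ruby. It assumes the parsed options behave like an ordinary Hash with no `:title_off` key; the `opts` literal is a stand-in for what the option parser returns, not captured output.

```ruby
# Stand-in for the parsed 0.6.0 options (defaults taken from the hunk above).
opts = { title: true, list: true, heading: true,
         table: false, template: false, redirect: false }

# Key list exactly as it appears on the 0.6.0 side of the diff.
opt_array = [:title_off, :list, :heading, :table, :template, :redirect]

# Same copy loop as in bin/wp2txt.
config = {}
opt_array.each { |opt| config[opt] = opts[opt] }

# Mirrors the checks `config[:title] ? ... : ...` and `next if !config[:pre]`.
title_shown = config[:title] ? true : false
pre_shown   = config[:pre] ? true : false
```

Under that assumption `config[:title]` is `nil`, so the ternary in the cleanup step falls through to `contents` and page titles are omitted regardless of `--title`. Likewise `config[:pre]` is never populated, so `:mw_pre` elements are always skipped. Whether either effect is intended cannot be determined from the diff alone.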

data/lib/wp2txt/version.rb CHANGED

@@ -1,3 +1,3 @@
 module Wp2txt
-  VERSION = "0.5.4"
+  VERSION = "0.6.0"
 end

data/lib/wp2txt.rb CHANGED

@@ -25,16 +25,16 @@ module Wp2txt
     include Wp2txt
-    # attr_accessor :pause_flag, :stop_flag, :outfiles, :convert_off
+    # attr_accessor :pause_flag, :stop_flag, :outfiles, :convert
-    def initialize(parent, input_file, output_dir = ".", tfile_size = 10, convert_off = false, strip_tmarker = false)
+    def initialize(parent, input_file, output_dir = ".", tfile_size = 10, convert = true, strip_tmarker = false)
       @parent = parent
       @fp = nil
       @input_file = input_file
       @output_dir = output_dir
       @tfile_size = tfile_size
-      @convert_off = convert_off
+      @convert = convert
       @strip_tmarker = strip_tmarker
     end
@@ -213,16 +213,15 @@ module Wp2txt
     # call this method to do the job
     def extract_text(&block)
       prepare
-      # output the original xml only split to files of the specified size
-      if @convert_off
-        extract
-        # convert xml to plain text
-      else
+      if @convert
         if block
           extract_and_convert(&block)
         else
           extract_and_convert
         end
+      else
+        # output the original xml only split to files of the specified size
+        extract
       end
     end
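Because `convert` replaces `convert_off` in the same positional slot of `initialize`, a caller that passed that argument explicitly under 0.5.4 gets the opposite behavior under 0.6.0. A minimal sketch with a stand-in class follows; `RunnerSketch` is hypothetical, takes only the one argument of interest, and mirrors just the branch in `extract_text` without any file I/O.

```ruby
# Hypothetical stand-in for Wp2txt::Runner, reduced to the convert flag.
class RunnerSketch
  def initialize(convert = true)
    @convert = convert
  end

  # Returns the name of the method the 0.6.0 Runner would dispatch to.
  def extract_text
    @convert ? :extract_and_convert : :extract
  end
end
```

In 0.5.4 a `false` in that slot meant `convert_off = false`, that is, convert to plain text. In 0.6.0 the same `false` selects the `extract` path, which writes the original XML split into files of the specified size.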

data/spec/utils_spec.rb CHANGED

@@ -23,7 +23,7 @@ describe "Wp2txt" do
       str_processed = process_nested_structure(scanner, "[[", "]]") do |content|
         "<<" + content + ">>"
       end
-      str_processed.should == str_after
+      expect(str_processed).to eq str_after
       str_before = "#* {{quote-book|1503|year_published=1836|chapter=19 Henry VII. c. 5: Coin||A Collection of Statutes Connected with the General Administration of the Law|page=158|url=http://books.google.com/books?id=QtYuAAAAIAAJ
       |passage={{...}} every of them, being gold, whole and weight, shall '''go''' and be current in payment throughout this his realm for the sum that they were coined for.}}"
@@ -33,7 +33,9 @@ describe "Wp2txt" do
       str_processed = process_nested_structure(scanner, "{{", "}}") do |content|
         "<<" + content + ">>"
       end
-      str_processed.should == str_after
+      #str_processed.should == str_after
+      expect(str_processed).to eq str_after
     end
   end
@@ -41,7 +43,7 @@ describe "Wp2txt" do
     it "replaces character references with real characters" do
       str_before = "&nbsp; &lt; &gt; &amp; &quot;"
       str_after  = "  < > & \""
-      special_chr(str_before).should == str_after
+      expect(special_chr(str_before)).to eq str_after
     end
   end
@@ -49,7 +51,7 @@ describe "Wp2txt" do
     it "replaces character references with real characters" do
       str_before = "&#x266A;"
       str_after  = "♪"
-      chrref_to_utf(str_before).should == str_after
+      expect(chrref_to_utf(str_before)).to eq str_after
     end
   end
@@ -57,7 +59,7 @@ describe "Wp2txt" do
     it "replaces {mdash}, {ndash}, or {–} with '–'" do
       str_before = "{mdash} {ndash} {–}"
       str_after  = "– – –"
-      mndash(str_before).should == str_after
+      expect(mndash(str_before)).to eq str_after
     end
   end
@@ -65,7 +67,7 @@ describe "Wp2txt" do
     it "replaces \\r\\n and <br /> inside [ref] ... [/ref] to ' '" do
       str_before = "[ref]...\r\n...<br />...[/ref]"
       str_after  = "... ... ..."
-      format_ref(str_before).should == str_after
+      expect(format_ref(str_before)).to eq str_after
     end
   end
@@ -73,7 +75,7 @@ describe "Wp2txt" do
     it "replaces <ref> tag with [ref]" do
       str_before = "<ref> ... <br /> ... </ref> \n <ref />"
       str_after  = "[ref] ... \n ... [/ref] \n "
-      make_reference(str_before).should == str_after
+      expect(make_reference(str_before)).to eq str_after
     end
   end
@@ -81,7 +83,7 @@ describe "Wp2txt" do
     it "removes table formated parts" do
       str_before = "{| ... \n{| ... \n ...|}\n ...|}"
       str_after  = ""
-      remove_table(str_before).should == str_after
+      expect(remove_table(str_before)).to eq str_after
     end
   end
@@ -89,7 +91,7 @@ describe "Wp2txt" do
     it "removes clade formated parts" do
       str_before = "\{\{clade ... \n ... \n ... \n\}\}"
       str_after  = ""
-      remove_clade(str_before).should == str_after
+      expect(remove_clade(str_before)).to eq str_after
     end
   end
@@ -97,7 +99,7 @@ describe "Wp2txt" do
     it "removes horizontal lines" do
       str_before = "\n----\n--\n--\n"
       str_after  = "\n\n"
-      remove_hr(str_before).should == str_after
+      expect(remove_hr(str_before)).to eq str_after
     end
   end
@@ -105,10 +107,10 @@ describe "Wp2txt" do
     it "removes tags" do
       str_before = "<tag>abc</tag>"
       str_after  = "abc"
-      remove_tag(str_before).should == str_after
+      expect(remove_tag(str_before)).to eq str_after
       str_before = "[tag]def[/tag]"
       str_after  = "def"
-      remove_tag(str_before, ['[', ']']).should == str_after
+      expect(remove_tag(str_before, ['[', ']'])).to eq str_after
     end
   end
@@ -116,7 +118,7 @@ describe "Wp2txt" do
     it "removes directive" do
       str_before = "__abc__\n __def__"
       str_after  = "\n "
-      remove_directive(str_before).should == str_after
+      expect(remove_directive(str_before)).to eq str_after
     end
   end
@@ -124,7 +126,7 @@ describe "Wp2txt" do
     it "removes directive" do
       str_before = "''abc''\n'''def'''"
       str_after  = "abc\ndef"
-      remove_emphasis(str_before).should == str_after
+      expect(remove_emphasis(str_before)).to eq str_after
     end
   end
@@ -132,7 +134,7 @@ describe "Wp2txt" do
     it "replaces <nowiki>...</nowiki> with <nowiki-object_id>" do
       str_before = "<nowiki>[[abc]]</nowiki>def<nowiki>[[ghi]]</nowiki>"
       str_after  = Regexp.new("<nowiki-\\d+>def<nowiki-\\d+>")
-      escape_nowiki(str_before).should =~ str_after
+      expect(escape_nowiki(str_before)).to match str_after
     end
   end
@@ -141,24 +143,24 @@ describe "Wp2txt" do
       @nowikis = {123 => "[[abc]]", 124 => "[[ghi]]"}
       str_before = "<nowiki-123>def<nowiki-124>"
       str_after  = "[[abc]]def[[ghi]]"
-      unescape_nowiki(str_before).should == str_after
+      expect(unescape_nowiki(str_before)).to eq str_after
     end
   end
   describe "process_interwiki_links" do
     it "formats text link and remove brackets" do
-      process_interwiki_links("[[a b]]").should   == "a b"
-      process_interwiki_links("[[a b|c]]").should == "c"
-      process_interwiki_links("[[a|b|c]]").should == "b|c"
-      process_interwiki_links("[[硬口蓋鼻音|[ɲ], /J/]]").should == "[ɲ], /J/"
+      expect(process_interwiki_links("[[a b]]")).to eq "a b"
+      expect(process_interwiki_links("[[a b|c]]")).to eq "c"
+      expect(process_interwiki_links("[[a|b|c]]")).to eq "b|c"
+      expect(process_interwiki_links("[[硬口蓋鼻音|[ɲ], /J/]]")).to eq "[ɲ], /J/"
     end
   end
   describe "process_external_links" do
     it "formats text link and remove brackets" do
-      process_external_links("[http://yohasebe.com yohasebe.com]").should   == "yohasebe.com"
-      process_external_links("[http://yohasebe.com]").should   == "http://yohasebe.com"
-      process_external_links("* Turkish: {{t+|tr|köken bilimi}}]], {{t+|tr|etimoloji}}").should == "* Turkish: {{t+|tr|köken bilimi}}]], {{t+|tr|etimoloji}}"
+      expect(process_external_links("[http://yohasebe.com yohasebe.com]")).to eq "yohasebe.com"
+      expect(process_external_links("[http://yohasebe.com]")).to eq "http://yohasebe.com"
+      expect(process_external_links("* Turkish: {{t+|tr|köken bilimi}}]], {{t+|tr|etimoloji}}")).to eq "* Turkish: {{t+|tr|köken bilimi}}]], {{t+|tr|etimoloji}}"
     end
   end
@@ -166,30 +168,30 @@ describe "Wp2txt" do
     it "removes brackets and leaving some text" do
       str_before = "{{}}"
       str_after = ""
-      process_template(str_before).should == str_after
+      expect(process_template(str_before)).to eq str_after
       str_before = "{{lang|en|Japan}}"
       str_after  = "Japan"
-      process_template(str_before).should == str_after
+      expect(process_template(str_before)).to eq str_after
       str_before = "{{a|b=c|d=f}}"
       str_after  = "a"
-      process_template(str_before).should == str_after
+      expect(process_template(str_before)).to eq str_after
       str_before = "{{a|b|{{c|d|e}}}}"
       str_after  = "e"
-      process_template(str_before).should == str_after
+      expect(process_template(str_before)).to eq str_after
     end
   end
-  describe "expand_template" do
-    it "gets data corresponding to a given template using mediawiki api" do
-      uri = "http://en.wiktionary.org/w/api.php"
-      template = "{{en-verb}}"
-      word = "kick"
-      expanded = expand_template(uri, template, word)
-      html =<<EOD
-<span class=\"infl-inline\"><b class=\"Latn \" lang=\"en\">kick</b> (''third-person singular simple present'' <span class=\"form-of third-person-singular-form-of\">'''<span class=\"Latn \" lang=\"en\">[[kicks#English|kicks]]</span>'''</span>, ''present participle'' <span class=\"form-of present-participle-form-of\">'''<span class=\"Latn \" lang=\"en\">[[kicking#English|kicking]]</span>'''</span>, ''simple past and past participle'' <span class=\"form-of simple-past-and-participle-form-of\"> '''<span class=\"Latn \" lang=\"en\">[[kicked#English|kicked]]</span>'''</span>)</span>[[Category:English verbs|kick]]
-EOD
-      html.strip!
-      expanded.should == html
-    end
-  end
+#   describe "expand_template" do
+#     it "gets data corresponding to a given template using mediawiki api" do
+#       uri = "http://en.wiktionary.org/w/api.php"
+#       template = "{{en-verb}}"
+#       word = "kick"
+#       expanded = expand_template(uri, template, word)
+#       html =<<EOD
+# <span class=\"infl-inline\"><b class=\"Latn \" lang=\"en\">kick</b> (''third-person singular simple present'' <span class=\"form-of third-person-singular-form-of\">'''<span class=\"Latn \" lang=\"en\">[[kicks#English|kicks]]</span>'''</span>, ''present participle'' <span class=\"form-of present-participle-form-of\">'''<span class=\"Latn \" lang=\"en\">[[kicking#English|kicking]]</span>'''</span>, ''simple past and past participle'' <span class=\"form-of simple-past-and-participle-form-of\"> '''<span class=\"Latn \" lang=\"en\">[[kicked#English|kicked]]</span>'''</span>)</span>[[Category:English verbs|kick]]
+# EOD
+#       html.strip!
+#       expanded.should == html
+#     end
+#   end
 end

data/wp2txt.gemspec CHANGED

@@ -24,10 +24,10 @@ Gem::Specification.new do |s|
   s.add_dependency "nokogiri"
   s.add_dependency "sanitize"
-  if RUBY_VERSION >= '2.0'
-    s.add_dependency "bzip2-ruby-rb20"
-  else
-    s.add_dependency "bzip2-ruby"
-  end
+  # if RUBY_VERSION >= '2.0'
+  #   s.add_dependency "bzip2-ruby-rb20"
+  # else
+  #   s.add_dependency "bzip2-ruby"
+  # end
   s.add_dependency "trollop"
 end
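With the bzip2 dependency commented out of the gemspec, the binding becomes optional, and the README in this release says wp2txt tries the `bzip2` command-line program when the gem is not found. The detection code itself is not part of this diff, so the following is only a sketch of such a fallback. The helper `pick_bz2_backend`, its keyword arguments, and the `require "bzip2"` name are assumptions made for illustration.

```ruby
# Hypothetical helper, not taken from wp2txt: decide which bzip2 backend
# to use. Both probes are injectable so the selection logic can be
# exercised without the gem or the binary being installed.
def pick_bz2_backend(
  gem_loader: -> { require "bzip2" },  # assumed require name for the binding
  command_finder: -> { system("bzip2 --help", out: File::NULL, err: File::NULL) }
)
  begin
    gem_loader.call
    return :gem
  rescue LoadError
    # Binding not installed; fall through to the command-line program.
  end

  # system returns nil when the command cannot be run at all.
  return :command if command_finder.call

  raise "no bzip2 backend available"
end
```

Trying the gem first matches the README's stated preference for the binding, with the external program as the second choice.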

metadata CHANGED

@@ -1,111 +1,97 @@
 --- !ruby/object:Gem::Specification
 name: wp2txt
 version: !ruby/object:Gem::Version
-  version: 0.5.4
+  version: 0.6.0
 platform: ruby
 authors:
 - Yoichiro Hasebe
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2013-12-11 00:00:00.000000000 Z
+date: 2014-10-04 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: bundler
   requirement: !ruby/object:Gem::Requirement
     requirements:
-    - - '>='
+    - - ">="
       - !ruby/object:Gem::Version
         version: '0'
   type: :development
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
-    - - '>='
+    - - ">="
       - !ruby/object:Gem::Version
         version: '0'
 - !ruby/object:Gem::Dependency
   name: rspec
   requirement: !ruby/object:Gem::Requirement
     requirements:
-    - - '>='
+    - - ">="
       - !ruby/object:Gem::Version
         version: '0'
   type: :development
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
-    - - '>='
+    - - ">="
       - !ruby/object:Gem::Version
         version: '0'
 - !ruby/object:Gem::Dependency
   name: rake
   requirement: !ruby/object:Gem::Requirement
     requirements:
-    - - '>='
+    - - ">="
       - !ruby/object:Gem::Version
         version: '0'
   type: :development
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
-    - - '>='
+    - - ">="
       - !ruby/object:Gem::Version
         version: '0'
 - !ruby/object:Gem::Dependency
   name: nokogiri
   requirement: !ruby/object:Gem::Requirement
     requirements:
-    - - '>='
+    - - ">="
       - !ruby/object:Gem::Version
         version: '0'
   type: :runtime
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
-    - - '>='
+    - - ">="
       - !ruby/object:Gem::Version
         version: '0'
 - !ruby/object:Gem::Dependency
   name: sanitize
   requirement: !ruby/object:Gem::Requirement
     requirements:
-    - - '>='
+    - - ">="
       - !ruby/object:Gem::Version
         version: '0'
   type: :runtime
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
-    - - '>='
-      - !ruby/object:Gem::Version
-        version: '0'
-- !ruby/object:Gem::Dependency
-  name: bzip2-ruby-rb20
-  requirement: !ruby/object:Gem::Requirement
-    requirements:
-    - - '>='
-      - !ruby/object:Gem::Version
-        version: '0'
-  type: :runtime
-  prerelease: false
-  version_requirements: !ruby/object:Gem::Requirement
-    requirements:
-    - - '>='
+    - - ">="
       - !ruby/object:Gem::Version
         version: '0'
 - !ruby/object:Gem::Dependency
   name: trollop
   requirement: !ruby/object:Gem::Requirement
     requirements:
-    - - '>='
+    - - ">="
       - !ruby/object:Gem::Version
         version: '0'
   type: :runtime
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
-    - - '>='
+    - - ">="
       - !ruby/object:Gem::Version
         version: '0'
 description: WP2TXT extracts plain text data from Wikipedia dump file (encoded in
@@ -117,7 +103,7 @@ executables:
 extensions: []
 extra_rdoc_files: []
 files:
-- .gitignore
+- ".gitignore"
 - Gemfile
 - LICENSE
 - README.md
@@ -142,17 +128,17 @@ require_paths:
 - lib
 required_ruby_version: !ruby/object:Gem::Requirement
   requirements:
-  - - '>='
+  - - ">="
     - !ruby/object:Gem::Version
       version: '0'
 required_rubygems_version: !ruby/object:Gem::Requirement
   requirements:
-  - - '>='
+  - - ">="
     - !ruby/object:Gem::Version
       version: '0'
 requirements: []
 rubyforge_project: wp2txt
-rubygems_version: 2.1.11
+rubygems_version: 2.4.1
 signing_key:
 specification_version: 4
 summary: Wikipedia dump to text converter