wp2txt 0.9.5.1 → 1.0.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: 2aa3c73ab9202aa22974bbb60dad95f10b8abb434cd923fe5f2f6e917f89ac18
-   data.tar.gz: 790d280ee298ff08c5dde80e355f69a1803b949abe14c81912ec6119f3371d59
+   metadata.gz: a15462742cc2912a4dca9e0e4e42e90af4b8f9e09ea29584da94946d0a563872
+   data.tar.gz: 0c63c91b90883b4ed69199ef569c7bd467aece538bb1de1f8e7d632e710d6964
  SHA512:
-   metadata.gz: 39f16e5df3c22f60ef4c0f3c9fe05c5f9ee0732fa90dd9916dd7bf6ffdc05e991afd67425fa6fdb9661cd206e4e16e0db032c131cb59c9d71b7fd2b668635429
-   data.tar.gz: b7c700c667220e11b39fd25a91c76609d3f2608599223f8525e0c8b4b03e29fd1c9547ec2bf30117ca4d65aa0cb09db15f9841ce4790fbfe73a16bfeb5cebfc3
+   metadata.gz: 22f5c61c0ff6d11cd2c0155ad77940e9b618aea1354826a7b8fc5155289b42daff159be6c48f3f038c8df08753731cad623561cbd8055a10a12ce7feae0566ca
+   data.tar.gz: 9b286a09211576f5a397e3e2e46fefbedbf9e95d200f3393b030ede106c9b543fb800c73d3d958ddc5dccad1ba2a30f0b99700af05eef88b142e90c8603e9699
data/README.md CHANGED
@@ -1,104 +1,145 @@
- # WP2TXT
+ <img src='https://raw.githubusercontent.com/yohasebe/wp2txt/master/image/wp2txt-logo.svg' width="400" />
 
- Wikipedia dump file to text converter that extracts both content and category data
+ Text conversion tool to extract content and category data from Wikipedia dump files
 
  ## About
 
- WP2TXT extracts plain text data from a Wikipedia dump file (encoded in XML / compressed with Bzip2), removing all MediaWiki markup and other metadata. It was developed for researchers who want easy access to open-source multilingual corpora, but may be used for other purposes as well.
+ WP2TXT extracts plain text data from Wikipedia dump files (encoded in XML / compressed with Bzip2), removing all MediaWiki markup and other metadata.
 
- **UPDATE (July 2022)**: Version 0.9.3 adds a new option `category_only`. When this option is enabled, wp2txt will extract only the title and category information of the article. See output examples below.
+ **UPDATE (August 2022)**
 
+ 1. A new option `--category-only` has been added. When this option is enabled, only the title and category information of each article is extracted.
+ 2. A new option `--summary-only` has been added. When this option is enabled, only the title and the text of the article's opening paragraphs (= summary) are extracted.
+ 3. The current WP2TXT is *several times faster* than the previous version thanks to parallel processing of multiple files (the speedup depends on the number of CPU cores used).
 
- ## Features
+ ## Screenshot
+
+ <img src='https://raw.githubusercontent.com/yohasebe/wp2txt/master/image/screenshot.png' width="700" />
+
+ - WP2TXT 1.0.0
+ - MacBook Pro (2019) 2.3 GHz 8-core Intel Core i9
+ - enwiki-20220802-pages-articles.xml.bz2 (approx. 20GB)
 
- * Converts Wikipedia dump files in various languages
- * Creates output files of specified size
- * Can specify text elements to be extracted and converted (page titles, section titles, lists, tables)
- * Can extract category information for each article
+ In the environment above, the whole process (decompression, splitting, extraction, and conversion) of obtaining the plain text of the English Wikipedia takes a little over two hours.
+
+ ## Features
 
+ - Converts Wikipedia dump files in various languages
+ - Creates output files of a specified size
+ - Allows specifying the text elements to be extracted (page titles, section headers, paragraphs, list items)
+ - Allows extracting the category information of each article
+ - Allows extracting only the opening paragraphs of each article
 
  ## Installation
 
      $ gem install wp2txt
 
- ## Usage
+ ## Preparation
+
+ First, download the latest Wikipedia dump file for the language of your choice:
+
+     https://dumps.wikimedia.org/xxwiki/latest/xxwiki-latest-pages-articles.xml.bz2
+
+ where `xx` is a language code such as `en` (English) or `zh` (Chinese). Change it to `ja`, for instance, if you want the latest Japanese Wikipedia dump file.
+
+ Alternatively, you can select a Wikipedia dump file created on a specific date from [here](http://dumps.wikimedia.org/backup-index.html). Make sure to download a file named in the following format:
+
+     xxwiki-yyyymmdd-pages-articles.xml.bz2
+
+ where `xx` is a language code such as `en` (English) or `ko` (Korean), and `yyyymmdd` is the date of creation (e.g. `20220801`).
+
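For scripted downloads, the URL can be assembled from the language code and, optionally, a dump date. A minimal Ruby sketch (illustrative only; the helper name is hypothetical, and dated dumps are assumed to live under `https://dumps.wikimedia.org/xxwiki/yyyymmdd/`):

```ruby
# Build the URL of a Wikipedia dump file from a language code and a date.
# Use "latest" for the most recent dump, or a date string such as "20220801".
def dump_url(lang, date = "latest")
  "https://dumps.wikimedia.org/#{lang}wiki/#{date}/#{lang}wiki-#{date}-pages-articles.xml.bz2"
end

puts dump_url("en")             # latest English dump
puts dump_url("ja", "20220801") # Japanese dump created on 2022-08-01
```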
+ ## Basic Usage
+
+ Suppose you have a folder with a Wikipedia dump file and empty subfolders organized as follows:
+
+ ```
+ .
+ ├── enwiki-20220801-pages-articles.xml.bz2
+ ├── /xml
+ ├── /text
+ ├── /category
+ └── /summary
+ ```
+
+ ### Decompress and Split
+
+ The following command decompresses the entire Wikipedia dump and splits it into many small (approximately 10 MB) XML files:
+
+     $ wp2txt --no-convert -i ./enwiki-20220801-pages-articles.xml.bz2 -o ./xml
+
+ **Note**: The resulting files are not well-formed XML. They contain parts of the original XML extracted from the Wikipedia dump file, split so that the content within a `<page>` tag never spans multiple files.
+
+ ### Extract plain text from MediaWiki XML
+
+     $ wp2txt -i ./xml -o ./text
 
- Obtain a Wikipedia dump file (from [here](http://dumps.wikimedia.org/backup-index.html)) with a file name such as:
 
- > `xxwiki-yyyymmdd-pages-articles.xml.bz2`
+ ### Extract only category info from MediaWiki XML
 
- where `xx` is language code such as "en (English)" or "ja (Japanese)", and `yyyymmdd` is the date of creation (e.g. 20220720).
+     $ wp2txt -g -i ./xml -o ./category
 
- ### Example 1: Basic
+ ### Extract opening paragraphs from MediaWiki XML
 
- The following extracts text data, including list items and excluding tables.
+     $ wp2txt -s -i ./xml -o ./summary
 
- $ wp2txt -i xxwiki-yyyymmdd-pages-articles.xml.bz2 -o /output_dir
+ ### Extract directly from bz2 compressed file
 
- - [Output example (English)](https://raw.githubusercontent.com/yohasebe/wp2txt/master/data/output_samples/testdata_en.txt)
- - [Output example (Japanese)](https://raw.githubusercontent.com/yohasebe/wp2txt/master/data/output_samples/testdata_ja.txt)
+ It is also possible (though not recommended) to 1) decompress the dump file, 2) split the data into files, and 3) extract the text with a single command. You can automatically remove all the intermediate XML files with the `-x` option.
 
- ### Example 2: Title and category information only
+     $ wp2txt -i ./enwiki-20220801-pages-articles.xml.bz2 -o ./text -x
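The same pipeline can also be driven from Ruby rather than from the shell. The sketch below is based on the classes used in `bin/wp2txt` further down in this diff (`Wp2txt::Splitter` and `Wp2txt::Runner`); the argument order and the block interface are assumptions taken from that script, so treat it as illustrative rather than documented API:

```ruby
require "wp2txt"
require "wp2txt/utils"

include Wp2txt

# Decompress the dump and split it into ~10 MB XML files under ./xml
# (roughly what `wp2txt --no-convert -i ... -o ./xml` does).
splitter = Wp2txt::Splitter.new("./enwiki-20220801-pages-articles.xml.bz2", "./xml", 10)
splitter.split_file

# Walk each split file and inspect titles and categories article by article.
Dir.glob("./xml/*.xml").each do |xml_file|
  runner = Wp2txt::Runner.new(xml_file, "./text", false, false)
  runner.extract_text do |article|
    format_wiki!(article.title)
    puts "#{article.title}\t#{article.categories.join(', ')}"
  end
end
```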
 
- The following will extract only article titles and the categories to which each article belongs:
+ ## Sample Output
 
- $ wp2txt --category-only -i xxwiki-yyyymmdd-pages-articles.xml.bz2 -o /output_dir
+ Output containing title, category info, and paragraphs:
 
- Each line of the output data contains the title and the categories of an article:
+     $ wp2txt -i ./input -o ./output
 
- > title `TAB` category1`,` category2`,` category3`,` ...
+ - [English Wikipedia](https://raw.githubusercontent.com/yohasebe/wp2txt/master/data/output_samples/testdata_en.txt)
+ - [Japanese Wikipedia](https://raw.githubusercontent.com/yohasebe/wp2txt/master/data/output_samples/testdata_ja.txt)
 
- - [Output example (English)](https://raw.githubusercontent.com/yohasebe/wp2txt/master/data/output_samples/testdata_en_categories.txt)
- - [Output example (Japanese)](https://raw.githubusercontent.com/yohasebe/wp2txt/master/data/output_samples/testdata_ja_categories.txt)
+ Output containing title and category only:
 
- ### Example 3: Title, category, and summary text only
+     $ wp2txt -g -i ./input -o ./output
 
- The following will extract only article titles, the categories to which each article belongs, and text blocks before the first heading of the article:
+ - [English Wikipedia](https://raw.githubusercontent.com/yohasebe/wp2txt/master/data/output_samples/testdata_en_category.txt)
+ - [Japanese Wikipedia](https://raw.githubusercontent.com/yohasebe/wp2txt/master/data/output_samples/testdata_ja_category.txt)
 
- $ wp2txt --summary-only -i xxwiki-yyyymmdd-pages-articles.xml.bz2 -o /output_dir
+ Output containing title, category, and summary:
 
- - [Output example (English)](https://raw.githubusercontent.com/yohasebe/wp2txt/master/data/output_samples/testdata_en_summary.txt)
- - [Output example (Japanese)](https://raw.githubusercontent.com/yohasebe/wp2txt/master/data/output_samples/testdata_ja_summary.txt)
+     $ wp2txt -s -i ./input -o ./output
 
+ - [English Wikipedia](https://raw.githubusercontent.com/yohasebe/wp2txt/master/data/output_samples/testdata_en_summary.txt)
+ - [Japanese Wikipedia](https://raw.githubusercontent.com/yohasebe/wp2txt/master/data/output_samples/testdata_ja_summary.txt)
 
- ## Options
+ ## Command Line Options
 
  Command line options are as follows:
 
      Usage: wp2txt [options]
      where [options] are:
-       --input-file, -i:  Wikipedia dump file with .bz2 (compressed) or
-                          .txt (uncompressed) format
-       --output-dir, -o <s>:  Output directory (default: current directory)
-       --convert, --no-convert, -c:  Output in plain text (converting from XML)
-                                     (default: true)
-       --list, --no-list, -l:  Show list items in output (default: true)
-       --heading, --no-heading, -d:  Show section titles in output (default: true)
-       --title, --no-title, -t:  Show page titles in output (default: true)
-       --table, -a:  Show table source code in output (default: false)
-       --inline, -n:  leave inline template notations unmodified (default: false)
-       --multiline, -m:  leave multiline template notations unmodified (default: false)
-       --ref, -r:  leave reference notations in the format (default: false)
-                   [ref]...[/ref]
-       --redirect, -e:  Show redirect destination (default: false)
-       --marker, --no-marker, -k:  Show symbols prefixed to list items,
-                                   definitions, etc. (Default: true)
-       --category, -g:  Show article category information (default: true)
-       --category-only, -y:  Extract only article title and categories (default: false)
-       -s, --summary-only:  Extract only article title, categories, and summary text before first heading
-       --file-size, -f <i>:  Approximate size (in MB) of each output file
-                             (default: 10)
-       -u, --num-threads=<i>:  Number of threads to be spawned (capped to the number of CPU cores;
-                               set 99 to spawn max num of threads) (default: 4)
-       --version, -v:  Print version and exit
-       --help, -h:  Show this message
+       -i, --input                    Path to compressed file (bz2) or decompressed file (xml), or path to directory containing files of the latter format
+       -o, --output-dir=<s>           Path to output directory
+       -c, --convert, --no-convert    Output in plain text (converting from XML) (default: true)
+       -a, --category, --no-category  Show article category information (default: true)
+       -g, --category-only            Extract only article title and categories
+       -s, --summary-only             Extract only article title, categories, and summary text before first heading
+       -f, --file-size=<i>            Approximate size (in MB) of each output file (default: 10)
+       -n, --num-procs                Number of processes to be run concurrently (default: max num of CPU cores minus two)
+       -x, --del-interfile            Delete intermediate XML files from output dir
+       -t, --title, --no-title        Keep page titles in output (default: true)
+       -d, --heading, --no-heading    Keep section titles in output (default: true)
+       -l, --list                     Keep unprocessed list items in output
+       -r, --ref                      Keep reference notations in the format [ref]...[/ref]
+       -e, --redirect                 Show redirect destination
+       -m, --marker, --no-marker      Show symbols prefixed to list items, definitions, etc. (default: true)
+       -v, --version                  Print version and exit
+       -h, --help                     Show this message
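The options can be combined freely. As an illustration only (paths and values are placeholders), the following converts the split XML files while dropping section titles and list markers, writing ~20 MB output files with four worker processes:

    $ wp2txt -i ./xml -o ./text --no-heading --no-marker -f 20 -n 4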
 
  ## Caveats
 
- * Some data, such as mathematical formulas and computer source code, will not be converted correctly.
+ * Some data, such as mathematical formulas and computer source code, will not be converted correctly.
  * Some text data may not be extracted correctly for various reasons (incorrect matching of begin/end tags, language-specific formatting rules, etc.).
  * The conversion process can take longer than expected. When dealing with a huge data set such as the English Wikipedia on a low-spec environment, it can take several hours or more.
- * WP2TXT, by the nature of its task, requires a lot of machine power and consumes a large amount of memory/storage resources. Therefore, there is a possibility that the process may stop unexpectedly. In the worst case, the process may even freeze without terminating successfully. Please understand this and use at your own risk.
 
  ## Useful Links
 
data/bin/wp2txt CHANGED
@@ -11,133 +11,181 @@ DOCDIR = File.join(File.dirname(__FILE__), '..', 'doc')
  require 'wp2txt'
  require 'wp2txt/utils'
  require 'wp2txt/version'
+ require 'etc'
  require 'optimist'
+ require 'parallel'
+ require 'pastel'
+ require 'tty-spinner'
 
  include Wp2txt
 
  opts = Optimist::options do
-   version Wp2txt::VERSION
-   banner <<-EOS
+   version Wp2txt::VERSION
+   banner <<-EOS
  WP2TXT extracts plain text data from Wikipedia dump file (encoded in XML/compressed with Bzip2) stripping all the MediaWiki markups and other metadata.
 
  Usage: wp2txt [options]
  where [options] are:
  EOS
 
-   opt :input_file, "Wikipedia dump file with .bz2 (compressed) or .txt (uncompressed) format", :required => true
-   opt :output_dir, "Output directory", :default => Dir::pwd, :type => String
-   opt :convert, "Output in plain text (converting from XML)", :default => true
-   opt :list, "Show list items in output", :default => false
-   opt :heading, "Show section titles in output", :default => true, :short => "-d"
-   opt :title, "Show page titles in output", :default => true
-   opt :table, "Show table source code in output", :default => false
-   opt :inline, "leave inline template notations as they are", :default => false
-   opt :multiline, "leave multiline template notations as they are", :default => false
-   opt :ref, "leave reference notations in the format [ref]...[/ref]", :default => false
-   opt :redirect, "Show redirect destination", :default => false
-   opt :marker, "Show symbols prefixed to list items, definitions, etc.", :default => true
-   opt :category, "Show article category information", :default => true
-   opt :category_only, "Extract only article title and categories", :default => false
-   opt :summary_only, "Extract only article title, categories, and summary text before first heading", :default => false
-   opt :file_size, "Approximate size (in MB) of each output file", :default => 10
-   opt :num_threads, "Number of threads to be spawned (capped to the number of CPU cores; set 99 to spawn max num of threads)", :default => 4
+   opt :input, "Path to compressed file (bz2) or decompressed file (xml), or path to directory containing files of the latter format", :required => true, :short => "-i"
+   opt :output_dir, "Path to output directory", :default => Dir::pwd, :type => String, :short => "-o"
+   opt :convert, "Output in plain text (converting from XML)", :default => true, :short => "-c"
+   opt :category, "Show article category information", :default => true, :short => "-a"
+   opt :category_only, "Extract only article title and categories", :default => false, :short => "-g"
+   opt :summary_only, "Extract only article title, categories, and summary text before first heading", :default => false, :short => "-s"
+   opt :file_size, "Approximate size (in MB) of each output file", :default => 10, :short => "-f"
+   opt :num_procs, "Number of processes to be run concurrently (default: max num of CPU cores minus two)", :short => "-n"
+   opt :del_interfile, "Delete intermediate XML files from output dir", :short => "-x", :default => false
+   opt :title, "Keep page titles in output", :default => true, :short => "-t"
+   opt :heading, "Keep section titles in output", :default => true, :short => "-d"
+   opt :list, "Keep unprocessed list items in output", :default => false, :short => "-l"
+   opt :ref, "Keep reference notations in the format [ref]...[/ref]", :default => false, :short => "-r"
+   opt :redirect, "Show redirect destination", :default => false, :short => "-e"
+   opt :marker, "Show symbols prefixed to list items, definitions, etc.", :default => true, :short => "-m"
  end
+
  Optimist::die :size, "must be larger than 0" unless opts[:file_size] >= 0
  Optimist::die :output_dir, "must exist" unless File.exist?(opts[:output_dir])
 
+ pastel = Pastel.new
+
  input_file = ARGV[0]
  output_dir = opts[:output_dir]
  tfile_size = opts[:file_size]
- num_threads = opts[:num_threads]
+ num_processors = Etc.nprocessors
+ if opts[:num_procs] && opts[:num_procs].to_i <= num_processors
+   num_processes = opts[:num_procs]
+ else
+   num_processes = num_processors - 2
+ end
+ num_processes = 1 if num_processes < 1
+
  convert = opts[:convert]
  strip_tmarker = opts[:marker] ? false : true
- opt_array = [:title, :list, :heading, :table, :redirect, :multiline, :category, :category_only, :summary_only]
+ opt_array = [:title,
+              :list,
+              :heading,
+              :table,
+              :redirect,
+              :multiline,
+              :category,
+              :category_only,
+              :summary_only,
+              :del_interfile]
+
  $leave_inline_template = true if opts[:inline]
  $leave_ref = true if opts[:ref]
+
  config = {}
  opt_array.each do |opt|
    config[opt] = opts[opt]
  end
 
- parent = Wp2txt::CmdProgbar.new
- wpconv = Wp2txt::Runner.new(parent, input_file, output_dir, tfile_size, num_threads, convert, strip_tmarker)
-
- wpconv.extract_text do |article|
-   format_wiki!(article.title)
-
-   if config[:category_only]
-     title = "#{article.title}\t"
-     contents = article.categories.join(", ")
-     contents << "\n"
-   elsif config[:category] && !article.categories.empty?
-     title = "\n[[#{article.title}]]\n\n"
-     contents = "\nCATEGORIES: "
-     contents << article.categories.join(", ")
-     contents << "\n\n"
-   else
-     title = "\n[[#{article.title}]]\n\n"
-     contents = ""
-   end
+ if File::ftype(input_file) == "directory"
+   input_files = Dir.glob("#{input_file}/*.xml")
+ else
+   puts ""
+   puts pastel.green.bold("Preprocessing")
+   puts "Decompressing and splitting the original dump file."
+   puts pastel.underline("This may take a while. Please be patient!")
 
-   unless config[:category_only]
-     article.elements.each do |e|
-       case e.first
-       when :mw_heading
-         break if config[:summary_only]
-         next if !config[:heading]
-         format_wiki!(e.last)
-         line = e.last
-         line << "+HEADING+" if $DEBUG_MODE
-       when :mw_paragraph
-         format_wiki!(e.last)
-         line = e.last + "\n"
-         line << "+PARAGRAPH+" if $DEBUG_MODE
-       when :mw_table, :mw_htable
-         next if !config[:table]
-         line = e.last
-         line << "+TABLE+" if $DEBUG_MODE
-       when :mw_pre
-         next if !config[:pre]
-         line = e.last
-         line << "+PRE+" if $DEBUG_MODE
-       when :mw_quote
-         line = e.last
-         line << "+QUOTE+" if $DEBUG_MODE
-       when :mw_unordered, :mw_ordered, :mw_definition
-         next if !config[:list]
-         line = e.last
-         line << "+LIST+" if $DEBUG_MODE
-       when :mw_ml_template
-         next if !config[:multiline]
-         line = e.last
-         line << "+MLTEMPLATE+" if $DEBUG_MODE
-       when :mw_redirect
-         next if !config[:redirect]
-         line = e.last
-         line << "+REDIRECT+" if $DEBUG_MODE
-         line << "\n\n"
-       when :mw_isolated_template
-         next if !config[:multiline]
-         line = e.last
-         line << "+ISOLATED_TEMPLATE+" if $DEBUG_MODE
-       when :mw_isolated_tag
-         next
-       else
-         if $DEBUG_MODE
-           # format_wiki!(e.last)
+   spinner = TTY::Spinner.new(":spinner", format: :arrow_pulse, hide_cursor: true, interval: 5)
+   spinner.auto_spin
+   wpsplitter = Wp2txt::Splitter.new(input_file, output_dir, tfile_size)
+   wpsplitter.split_file
+   spinner.stop(pastel.blue.bold("Done!")) # Stop animation
+   exit if !convert
+   input_files = Dir.glob("#{output_dir}/*.xml")
+ end
+
+ puts ""
+ puts pastel.red.bold("Converting")
+ puts "Number of files being processed: " + pastel.bold("#{input_files.size}")
+ puts "Number of CPU cores being used: " + pastel.bold("#{num_processes}")
+
+ Parallel.map(input_files, progress: pastel.magenta.bold("WP2TXT"), in_processes: num_processes) do |input_file|
+   wpconv = Wp2txt::Runner.new(input_file, output_dir, strip_tmarker, config[:del_interfile])
+   wpconv.extract_text do |article|
+     format_wiki!(article.title)
+
+     if config[:category_only]
+       title = "#{article.title}\t"
+       contents = article.categories.join(", ")
+       contents << "\n"
+     elsif config[:category] && !article.categories.empty?
+       title = "\n[[#{article.title}]]\n\n"
+       contents = "\nCATEGORIES: "
+       contents << article.categories.join(", ")
+       contents << "\n\n"
+     else
+       title = "\n[[#{article.title}]]\n\n"
+       contents = ""
+     end
+
+     unless config[:category_only]
+       article.elements.each do |e|
+         case e.first
+         when :mw_heading
+           break if config[:summary_only]
+           next if !config[:heading]
+           format_wiki!(e.last)
            line = e.last
-           line << "+OTHER+"
-         else
+           line << "+HEADING+" if $DEBUG_MODE
+         when :mw_paragraph
+           format_wiki!(e.last)
+           line = e.last + "\n"
+           line << "+PARAGRAPH+" if $DEBUG_MODE
+         when :mw_table, :mw_htable
+           next if !config[:table]
+           line = e.last
+           line << "+TABLE+" if $DEBUG_MODE
+         when :mw_pre
+           next if !config[:pre]
+           line = e.last
+           line << "+PRE+" if $DEBUG_MODE
+         when :mw_quote
+           line = e.last
+           line << "+QUOTE+" if $DEBUG_MODE
+         when :mw_unordered, :mw_ordered, :mw_definition
+           next if !config[:list]
+           line = e.last
+           line << "+LIST+" if $DEBUG_MODE
+         when :mw_ml_template
+           next if !config[:multiline]
+           line = e.last
+           line << "+MLTEMPLATE+" if $DEBUG_MODE
+         when :mw_redirect
+           next if !config[:redirect]
+           line = e.last
+           line << "+REDIRECT+" if $DEBUG_MODE
+           line << "\n\n"
+         when :mw_isolated_template
+           next if !config[:multiline]
+           line = e.last
+           line << "+ISOLATED_TEMPLATE+" if $DEBUG_MODE
+         when :mw_isolated_tag
            next
+         else
+           if $DEBUG_MODE
+             # format_wiki!(e.last)
+             line = e.last
+             line << "+OTHER+"
+           else
+             next
+           end
          end
+         contents << line << "\n"
        end
-   contents << line << "\n"
      end
- end
-
- if /\A[\s ]*\z/m =~ contents
-   result = ""
- else
-   result = config[:title] ? title << contents : contents
+
+     if /\A[\s ]*\z/m =~ contents
+       result = ""
+     else
+       result = config[:title] ? title << contents : contents
+     end
    end
  end
+
+ puts pastel.blue.bold("Complete!")
+
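For reference, the process-count selection implemented in the script above can be restated as a standalone function (an illustrative sketch; the function name is not part of the gem):

```ruby
require "etc"

# Use the requested number of processes if it does not exceed the available
# CPU cores; otherwise fall back to (cores - 2), but never go below one.
def select_num_processes(requested = nil)
  cores = Etc.nprocessors
  n = requested && requested.to_i <= cores ? requested.to_i : cores - 2
  n < 1 ? 1 : n
end

puts select_num_processes      # e.g. 6 on an 8-core machine
puts select_num_processes(4)   # => 4
puts select_num_processes(99)  # falls back to cores - 2
```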