wp2txt 0.9.4 → 1.0.0
- checksums.yaml +4 -4
- data/.gitignore +1 -0
- data/README.md +112 -54
- data/bin/wp2txt +143 -94
- data/data/output_samples/testdata_en.txt +11384 -37458
- data/data/output_samples/testdata_en_category.txt +132 -0
- data/data/output_samples/testdata_en_summary.txt +1376 -0
- data/data/output_samples/testdata_ja.txt +18074 -4682
- data/data/output_samples/testdata_ja_category.txt +206 -0
- data/data/output_samples/testdata_ja_summary.txt +1560 -0
- data/data/testdata_en.bz2 +0 -0
- data/data/testdata_ja.bz2 +0 -0
- data/image/screenshot.png +0 -0
- data/image/wp2txt-logo.svg +16 -0
- data/image/wp2txt.svg +31 -0
- data/lib/wp2txt/article.rb +3 -4
- data/lib/wp2txt/utils.rb +115 -63
- data/lib/wp2txt/version.rb +1 -1
- data/lib/wp2txt.rb +118 -148
- data/spec/utils_spec.rb +3 -21
- data/wp2txt.gemspec +4 -0
- metadata +52 -9
- data/bin/benchmark.rb +0 -76
- data/data/output_samples/testdata_en_categories.txt +0 -207
- data/data/output_samples/testdata_ja_categories.txt +0 -48
- data/lib/wp2txt/mw_api.rb +0 -65
- data/lib/wp2txt/progressbar.rb +0 -305
checksums.yaml
CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: a15462742cc2912a4dca9e0e4e42e90af4b8f9e09ea29584da94946d0a563872
+  data.tar.gz: 0c63c91b90883b4ed69199ef569c7bd467aece538bb1de1f8e7d632e710d6964
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 22f5c61c0ff6d11cd2c0155ad77940e9b618aea1354826a7b8fc5155289b42daff159be6c48f3f038c8df08753731cad623561cbd8055a10a12ce7feae0566ca
+  data.tar.gz: 9b286a09211576f5a397e3e2e46fefbedbf9e95d200f3393b030ede106c9b543fb800c73d3d958ddc5dccad1ba2a30f0b99700af05eef88b142e90c8603e9699
data/.gitignore
CHANGED
data/README.md
CHANGED
@@ -1,103 +1,161 @@
<img src='https://raw.githubusercontent.com/yohasebe/wp2txt/master/image/wp2txt-logo.svg' width="400" />

Text conversion tool to extract content and category data from Wikipedia dump files

## About

WP2TXT extracts plain text data from Wikipedia dump files (encoded in XML / compressed with Bzip2), removing all MediaWiki markup and other metadata.

**UPDATE (August 2022)**

1. A new option `--category-only` has been added. When this option is enabled, only the title and category information of each article is extracted.
2. A new option `--summary-only` has been added. When this option is enabled, only the title and the text of the opening paragraphs (= summary) of each article are extracted.
3. The current WP2TXT is *several times faster* than the previous version due to parallel processing of multiple files (the rate of speedup depends on the number of CPU cores used for processing).

## Screenshot

<img src='https://raw.githubusercontent.com/yohasebe/wp2txt/master/image/screenshot.png' width="700" />

- WP2TXT 1.0.0
- MacBook Pro (2019) 2.3GHz 8-Core Intel Core i9
- enwiki-20220802-pages-articles.xml.bz2 (approx. 20GB)

In the above environment, the whole process (decompression, splitting, extraction, and conversion) needed to obtain the plain text data of the English Wikipedia takes a little over two hours.

## Features

- Converts Wikipedia dump files in various languages
- Creates output files of a specified size
- Allows specifying which text elements (page titles, section headers, paragraphs, list items) to extract
- Allows extracting the category information of each article
- Allows extracting the opening paragraphs of each article

## Installation

    $ gem install wp2txt

## Preparation

First, download the latest Wikipedia dump file for the language of your choice:

    https://dumps.wikimedia.org/xxwiki/latest/xxwiki-latest-pages-articles.xml.bz2

where `xx` is a language code such as `en` (English) or `zh` (Chinese). Change it to `ja`, for instance, if you want the latest Japanese Wikipedia dump file.

Alternatively, you can select a Wikipedia dump file created on a specific date from [here](http://dumps.wikimedia.org/backup-index.html). Make sure to download a file named in the following format:

    xxwiki-yyyymmdd-pages-articles.xml.bz2

where `xx` is a language code such as `en` (English) or `ko` (Korean), and `yyyymmdd` is the date of creation (e.g. `20220801`).
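The URL pattern above can also be assembled programmatically. A minimal sketch in Ruby; the `latest_dump_url` helper is purely illustrative and not part of wp2txt:

```ruby
# Hypothetical helper (not part of wp2txt): build the "latest" dump URL
# for a given Wikipedia language code such as "en", "ja", or "ko".
def latest_dump_url(lang)
  "https://dumps.wikimedia.org/#{lang}wiki/latest/#{lang}wiki-latest-pages-articles.xml.bz2"
end

puts latest_dump_url("ja")
```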
## Basic Usage

Suppose you have a folder containing a Wikipedia dump file and empty subfolders organized as follows:

```
.
├── enwiki-20220801-pages-articles.xml.bz2
├── /xml
├── /text
├── /category
└── /summary
```

### Decompress and Split

The following command decompresses the entire Wikipedia dump and splits it into many small (approximately 10 MB) XML files:

    $ wp2txt --no-convert -i ./enwiki-20220801-pages-articles.xml.bz2 -o ./xml

**Note**: The resulting files are not well-formed XML. They contain parts of the original XML extracted from the Wikipedia dump file, split so that the content within each `<page>` tag is never divided across files.

### Extract plain text from MediaWiki XML

    $ wp2txt -i ./xml -o ./text

### Extract only category info from MediaWiki XML

    $ wp2txt -g -i ./xml -o ./category

### Extract opening paragraphs from MediaWiki XML

    $ wp2txt -s -i ./xml -o ./summary

### Extract directly from a bz2 compressed file

It is possible (though not recommended) to 1) decompress the dump file, 2) split the data into files, and 3) extract the text with a single command. You can automatically remove all the intermediate XML files with the `-x` option:

    $ wp2txt -i ./enwiki-20220801-pages-articles.xml.bz2 -o ./text -x

## Sample Output

Output containing title, category info, and paragraphs:

    $ wp2txt -i ./input -o ./output

- [English Wikipedia](https://raw.githubusercontent.com/yohasebe/wp2txt/master/data/output_samples/testdata_en.txt)
- [Japanese Wikipedia](https://raw.githubusercontent.com/yohasebe/wp2txt/master/data/output_samples/testdata_ja.txt)

Output containing title and category only:

    $ wp2txt -g -i ./input -o ./output

- [English Wikipedia](https://raw.githubusercontent.com/yohasebe/wp2txt/master/data/output_samples/testdata_en_category.txt)
- [Japanese Wikipedia](https://raw.githubusercontent.com/yohasebe/wp2txt/master/data/output_samples/testdata_ja_category.txt)

Output containing title, category, and summary:

    $ wp2txt -s -i ./input -o ./output

- [English Wikipedia](https://raw.githubusercontent.com/yohasebe/wp2txt/master/data/output_samples/testdata_en_summary.txt)
- [Japanese Wikipedia](https://raw.githubusercontent.com/yohasebe/wp2txt/master/data/output_samples/testdata_ja_summary.txt)

## Command Line Options

Command line options are as follows:

    Usage: wp2txt [options]
    where [options] are:
      -i, --input                      Path to compressed file (bz2) or decompressed file (xml), or path to directory containing files of the latter format
      -o, --output-dir=<s>             Path to output directory
      -c, --convert, --no-convert      Output in plain text (converting from XML) (default: true)
      -a, --category, --no-category    Show article category information (default: true)
      -g, --category-only              Extract only article title and categories
      -s, --summary-only               Extract only article title, categories, and summary text before first heading
      -f, --file-size=<i>              Approximate size (in MB) of each output file (default: 10)
      -n, --num-procs                  Number of processes to be run concurrently (default: max num of CPU cores minus two)
      -x, --del-interfile              Delete intermediate XML files from output dir
      -t, --title, --no-title          Keep page titles in output (default: true)
      -d, --heading, --no-heading      Keep section titles in output (default: true)
      -l, --list                       Keep unprocessed list items in output
      -r, --ref                        Keep reference notations in the format [ref]...[/ref]
      -e, --redirect                   Show redirect destination
      -m, --marker, --no-marker        Show symbols prefixed to list items, definitions, etc. (default: true)
      -v, --version                    Print version and exit
      -h, --help                       Show this message
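The default for `--num-procs` described above (CPU cores minus two, never fewer than one process) can be sketched as follows. The `default_num_procs` helper is illustrative only, not part of the wp2txt API:

```ruby
require 'etc'

# Illustrative sketch (not the wp2txt API) of the documented default for
# --num-procs: use the requested count if it does not exceed the number of
# CPU cores, otherwise fall back to cores minus two, with a floor of one.
def default_num_procs(requested = nil)
  cores = Etc.nprocessors
  procs = (requested && requested.to_i <= cores) ? requested.to_i : cores - 2
  procs < 1 ? 1 : procs
end

puts default_num_procs       # on an 8-core machine: 6
puts default_num_procs(4)    # explicit request within the core count: 4
```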
## Caveats

* Some data, such as mathematical formulas and computer source code, will not be converted correctly.
* Some text data may not be extracted correctly for various reasons (incorrect matching of begin/end tags, language-specific formatting rules, etc.).
* The conversion process can take longer than expected. When dealing with a huge dataset such as the English Wikipedia on a low-spec environment, it can take several hours or more.

## Useful Links

* [Wikipedia Database backup dumps](http://dumps.wikimedia.org/backup-index.html)

## Author

* Yoichiro Hasebe (<yohasebe@gmail.com>)

## References

The author will appreciate your mentioning one of these in your research.

* Yoichiro HASEBE. 2006. [Method for using Wikipedia as Japanese corpus.](http://ci.nii.ac.jp/naid/110006226727) _Doshisha Studies in Language and Culture_ 9(2), 373-403.
* 長谷部陽一郎. 2006. [Wikipedia日本語版をコーパスとして用いた言語研究の手法](http://ci.nii.ac.jp/naid/110006226727) [A method for linguistic research using the Japanese Wikipedia as a corpus]. 『言語文化』[_Language and Culture_] 9(2), 373-403.

## License

This software is distributed under the MIT License. Please see the LICENSE file.
data/bin/wp2txt
CHANGED
@@ -11,132 +11,181 @@ DOCDIR = File.join(File.dirname(__FILE__), '..', 'doc')
    require 'wp2txt'
    require 'wp2txt/utils'
    require 'wp2txt/version'
    require 'etc'
    require 'optimist'
    require 'parallel'
    require 'pastel'
    require 'tty-spinner'

    include Wp2txt

    opts = Optimist::options do
      version Wp2txt::VERSION
      banner <<-EOS
    WP2TXT extracts plain text data from Wikipedia dump file (encoded in XML/compressed with Bzip2) stripping all the MediaWiki markups and other metadata.

    Usage: wp2txt [options]
    where [options] are:
    EOS

      opt :input, "Path to compressed file (bz2) or decompressed file (xml), or path to directory containing files of the latter format", :required => true, :short => "-i"
      opt :output_dir, "Path to output directory", :default => Dir::pwd, :type => String, :short => "-o"
      opt :convert, "Output in plain text (converting from XML)", :default => true, :short => "-c"
      opt :category, "Show article category information", :default => true, :short => "-a"
      opt :category_only, "Extract only article title and categories", :default => false, :short => "-g"
      opt :summary_only, "Extract only article title, categories, and summary text before first heading", :default => false, :short => "-s"
      opt :file_size, "Approximate size (in MB) of each output file", :default => 10, :short => "-f"
      opt :num_procs, "Number of processes to be run concurrently (default: max num of CPU cores minus two)", :short => "-n"
      opt :del_interfile, "Delete intermediate XML files from output dir", :short => "-x", :default => false
      opt :title, "Keep page titles in output", :default => true, :short => "-t"
      opt :heading, "Keep section titles in output", :default => true, :short => "-d"
      opt :list, "Keep unprocessed list items in output", :default => false, :short => "-l"
      opt :ref, "Keep reference notations in the format [ref]...[/ref]", :default => false, :short => "-r"
      opt :redirect, "Show redirect destination", :default => false, :short => "-e"
      opt :marker, "Show symbols prefixed to list items, definitions, etc.", :default => true, :short => "-m"
    end

    Optimist::die :size, "must be larger than 0" unless opts[:file_size] >= 0
    Optimist::die :output_dir, "must exist" unless File.exist?(opts[:output_dir])

    pastel = Pastel.new

    input_file = ARGV[0]
    output_dir = opts[:output_dir]
    tfile_size = opts[:file_size]
    num_processors = Etc.nprocessors
    if opts[:num_procs] && opts[:num_procs].to_i <= num_processors
      num_processes = opts[:num_procs]
    else
      num_processes = num_processors - 2
    end
    num_processes = 1 if num_processes < 1

    convert = opts[:convert]
    strip_tmarker = opts[:marker] ? false : true
    opt_array = [:title,
                 :list,
                 :heading,
                 :table,
                 :redirect,
                 :multiline,
                 :category,
                 :category_only,
                 :summary_only,
                 :del_interfile]

    $leave_inline_template = true if opts[:inline]
    $leave_ref = true if opts[:ref]

    config = {}
    opt_array.each do |opt|
      config[opt] = opts[opt]
    end

    if File::ftype(input_file) == "directory"
      input_files = Dir.glob("#{input_file}/*.xml")
    else
      puts ""
      puts pastel.green.bold("Preprocessing")
      puts "Decompressing and splitting the original dump file."
      puts pastel.underline("This may take a while. Please be patient!")

      spinner = TTY::Spinner.new(":spinner", format: :arrow_pulse, hide_cursor: true, interval: 5)
      spinner.auto_spin
      wpsplitter = Wp2txt::Splitter.new(input_file, output_dir, tfile_size)
      wpsplitter.split_file
      spinner.stop(pastel.blue.bold("Done!")) # Stop animation
      exit if !convert
      input_files = Dir.glob("#{output_dir}/*.xml")
    end

    puts ""
    puts pastel.red.bold("Converting")
    puts "Number of files being processed: " + pastel.bold("#{input_files.size}")
    puts "Number of CPU cores being used: " + pastel.bold("#{num_processes}")

    Parallel.map(input_files, progress: pastel.magenta.bold("WP2TXT"), in_processes: num_processes) do |input_file|
      wpconv = Wp2txt::Runner.new(input_file, output_dir, strip_tmarker, config[:del_interfile])
      wpconv.extract_text do |article|
        format_wiki!(article.title)

        if config[:category_only]
          title = "#{article.title}\t"
          contents = article.categories.join(", ")
          contents << "\n"
        elsif config[:category] && !article.categories.empty?
          title = "\n[[#{article.title}]]\n\n"
          contents = "\nCATEGORIES: "
          contents << article.categories.join(", ")
          contents << "\n\n"
        else
          title = "\n[[#{article.title}]]\n\n"
          contents = ""
        end

        unless config[:category_only]
          article.elements.each do |e|
            case e.first
            when :mw_heading
              break if config[:summary_only]
              next if !config[:heading]
              format_wiki!(e.last)
              line = e.last
              line << "+HEADING+" if $DEBUG_MODE
            when :mw_paragraph
              format_wiki!(e.last)
              line = e.last + "\n"
              line << "+PARAGRAPH+" if $DEBUG_MODE
            when :mw_table, :mw_htable
              next if !config[:table]
              line = e.last
              line << "+TABLE+" if $DEBUG_MODE
            when :mw_pre
              next if !config[:pre]
              line = e.last
              line << "+PRE+" if $DEBUG_MODE
            when :mw_quote
              line = e.last
              line << "+QUOTE+" if $DEBUG_MODE
            when :mw_unordered, :mw_ordered, :mw_definition
              next if !config[:list]
              line = e.last
              line << "+LIST+" if $DEBUG_MODE
            when :mw_ml_template
              next if !config[:multiline]
              line = e.last
              line << "+MLTEMPLATE+" if $DEBUG_MODE
            when :mw_redirect
              next if !config[:redirect]
              line = e.last
              line << "+REDIRECT+" if $DEBUG_MODE
              line << "\n\n"
            when :mw_isolated_template
              next if !config[:multiline]
              line = e.last
              line << "+ISOLATED_TEMPLATE+" if $DEBUG_MODE
            when :mw_isolated_tag
              next
            else
              if $DEBUG_MODE
                # format_wiki!(e.last)
                line = e.last
                line << "+OTHER+"
              else
                next
              end
            end
            contents << line << "\n"
          end
        end

        if /\A[\s ]*\z/m =~ contents
          result = ""
        else
          result = config[:title] ? title << contents : contents
        end
      end
    end

    puts pastel.blue.bold("Complete!")
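The output-formatting branch in `bin/wp2txt` shown above produces either a tab-separated title/category line (`--category-only`) or a bracketed title followed by a `CATEGORIES:` line. A self-contained sketch of those formats in plain Ruby; `Article` and `format_article` here are stand-ins for illustration, not the actual wp2txt classes:

```ruby
# Illustrative sketch (plain Ruby, not the wp2txt classes) of the
# per-article output formats built in bin/wp2txt.
Article = Struct.new(:title, :categories)

def format_article(article, category_only: false)
  if category_only
    # --category-only: a single tab-separated line per article
    "#{article.title}\t#{article.categories.join(', ')}\n"
  elsif !article.categories.empty?
    # default: bracketed title followed by a CATEGORIES line
    "\n[[#{article.title}]]\n\n\nCATEGORIES: #{article.categories.join(', ')}\n\n"
  else
    # no category data: bracketed title only
    "\n[[#{article.title}]]\n\n"
  end
end

a = Article.new("Ruby (programming language)", ["Programming languages", "Ruby"])
print format_article(a, category_only: true)
# => "Ruby (programming language)\tProgramming languages, Ruby\n"
```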