wp2txt 0.5.1 → 0.5.2
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- data/README.md +2 -0
- data/lib/wp2txt/version.rb +1 -1
- data/wp2txt.gemspec +2 -1
- metadata +18 -2
data/README.md
CHANGED
|
@@ -2,6 +2,8 @@
|
|
|
2
2
|
|
|
3
3
|
Wikipedia dump file to text converter
|
|
4
4
|
|
|
5
|
+
CAUTION: This software is on an experimental stage. Use with care!
|
|
6
|
+
|
|
5
7
|
### About ###
|
|
6
8
|
|
|
7
9
|
WP2TXT extracts plain text data from Wikipedia dump file (encoded in XML/compressed with Bzip2) stripping all the MediaWiki markups and other metadata. It is originally intended to be useful for researchers who look for an easy way to obtain open-source multi-lingual corpora, but may be handy for other purposes.
|
data/lib/wp2txt/version.rb
CHANGED
data/wp2txt.gemspec
CHANGED
metadata
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
|
2
2
|
name: wp2txt
|
|
3
3
|
version: !ruby/object:Gem::Version
|
|
4
|
-
version: 0.5.
|
|
4
|
+
version: 0.5.2
|
|
5
5
|
prerelease:
|
|
6
6
|
platform: ruby
|
|
7
7
|
authors:
|
|
@@ -9,7 +9,7 @@ authors:
|
|
|
9
9
|
autorequire:
|
|
10
10
|
bindir: bin
|
|
11
11
|
cert_chain: []
|
|
12
|
-
date: 2013-01-
|
|
12
|
+
date: 2013-01-24 00:00:00.000000000 Z
|
|
13
13
|
dependencies:
|
|
14
14
|
- !ruby/object:Gem::Dependency
|
|
15
15
|
name: rspec
|
|
@@ -107,6 +107,22 @@ dependencies:
|
|
|
107
107
|
- - ! '>='
|
|
108
108
|
- !ruby/object:Gem::Version
|
|
109
109
|
version: '0'
|
|
110
|
+
- !ruby/object:Gem::Dependency
|
|
111
|
+
name: bundler
|
|
112
|
+
requirement: !ruby/object:Gem::Requirement
|
|
113
|
+
none: false
|
|
114
|
+
requirements:
|
|
115
|
+
- - ! '>='
|
|
116
|
+
- !ruby/object:Gem::Version
|
|
117
|
+
version: '0'
|
|
118
|
+
type: :runtime
|
|
119
|
+
prerelease: false
|
|
120
|
+
version_requirements: !ruby/object:Gem::Requirement
|
|
121
|
+
none: false
|
|
122
|
+
requirements:
|
|
123
|
+
- - ! '>='
|
|
124
|
+
- !ruby/object:Gem::Version
|
|
125
|
+
version: '0'
|
|
110
126
|
description: WP2TXT extracts plain text data from Wikipedia dump file (encoded in
|
|
111
127
|
XML/compressed with Bzip2) stripping all the MediaWiki markups and other metadata.
|
|
112
128
|
email:
|