wp2txt 0.9.1 → 0.9.4

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
- SHA1:
- metadata.gz: 7354f2494c849046bdf54d6c5b798e25faeebdda
- data.tar.gz: 9287f7cf4a18b4525a4e6a3ba2faf749b375f1eb
+ SHA256:
+ metadata.gz: 50f291332872b0e3cd0b651662d7494ec9edd823fdb6ba6a928f501a37ea06c3
+ data.tar.gz: ec4891f6a30c7bc2f8f0a6fd3ec56618c9f706ea277207e7f955347417959f7e
  SHA512:
- metadata.gz: e623c78f32aed94821a74353999462ce3cc32595a29afe2d439a2abce461a809f13f145039d0c30db5774290b06958963a1c70f5c4bbae3f7980416a11571bde
- data.tar.gz: e85430ddae0dfca581533b7487ef365844594bcea43735f6f5f61589f689ed941dddb5472b303772966ca5236d42557e57a318023a15075fe3214f32e328d941
+ metadata.gz: afa3770c47bc25252993bfddf6da6e99a7bca87d4d899b3f8ce44d8a6298d29a19ce06fe9b64166316a31672a76d7d4530887e77d98212bc8f17a350c0e1598a
+ data.tar.gz: ef6f5b11b8a7d2ae5eeb640b0f2319bea9ee1209b0ab1dd78833f3cde41149fb8468871d90ba75b00c62876fccfa5c5f7cca6fc2420d4769c3c35a7bd9aa8786
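The checksums.yaml file inside a packaged gem records digests of the archive members metadata.gz and data.tar.gz; between these two versions the SHA1 entries give way to SHA256, while SHA512 is kept. The Ruby sketch below checks one extracted member against such an entry. It assumes the members have already been extracted from the .gem file (a tar archive) and that the expected hex string is copied from checksums.yaml by hand; `verify_member` and `DIGESTS` are illustrative names, not part of RubyGems.

```ruby
require "digest"

# Digest classes for the algorithm names that appear in checksums.yaml.
DIGESTS = {
  "SHA256" => Digest::SHA256,
  "SHA512" => Digest::SHA512
}.freeze

# Hypothetical helper: true when the file at +path+ hashes to +expected_hex+
# under +algorithm+ ("SHA256" or "SHA512").
def verify_member(path, algorithm, expected_hex)
  digest_class = DIGESTS.fetch(algorithm) # raises KeyError for unknown names
  digest_class.file(path).hexdigest == expected_hex.downcase
end
```

For the 0.9.4 gem, one would call `verify_member("data.tar.gz", "SHA256", …)` with the `data.tar.gz` value from the SHA256 block above. Rejecting unknown algorithm names with `KeyError` keeps a typo from silently passing as a mismatch.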
data/README.md CHANGED
@@ -1,32 +1,51 @@
  # WP2TXT
 
- Wikipedia dump file to text converter
+ Wikipedia dump file to text converter that extracts both content and category data
 
- **IMPORTANT:** This is a project still work in progress and it could be slow, unstable, and even destructive! It should be used with caution.
-
- ### About ###
+ ## About
 
  WP2TXT extracts plain text data from Wikipedia dump file (encoded in XML/compressed with Bzip2) stripping all the MediaWiki markups and other metadata. It is originally intended to be useful for researchers who look for an easy way to obtain open-source multi-lingual corpora, but may be handy for other purposes.
 
- **UPDATE:** Version 0.9.1 has added a new option `num-threads`, which improves the performance significantly . Note also that `--category` option is enabled by default, resulting with output format somewhat different from previous versions. Check out the new format using test data in `data/output_samples` folder before going on to convert a huge wikipedia dump.
+ **UPDATE (July 2022)**: Version 0.9.3 has added a new option `category_only`. With this option enabled, wp2txt extracts article title and category info only. Please see output examples below.
 
- ### Features ###
+ ## Features
 
- * Convert dump files of Wikipedia of various languages (I hope).
+ * Convert dump files of Wikipedia of various languages
  * Create output files of specified size.
- * Allow users to specify text elements to be extracted/converted (page titles, section titles, lists, and tables).
+ * Allow users to specify text elements to be extracted/converted (page titles, section titles, lists, and tables)
+ * Extract category information of each article
+
+ ## Installation
 
- ### Installation
-
  $ gem install wp2txt
 
- ### Usage
+ ## Usage
 
  Obtain a Wikipedia dump file (from [here](http://dumps.wikimedia.org/backup-index.html)) with a file name such as:
 
  xxwiki-yyyymmdd-pages-articles.xml.bz2
 
- where `xx` is language code such as "en (English)" or "ja (Japanese)", and `yyyymmdd` is the date of creation (e.g. 20120601).
+ where `xx` is language code such as "en (English)" or "ja (Japanese)", and `yyyymmdd` is the date of creation (e.g. 20220720).
+
+ ### Example 1
+
+ The following extracts text data, including list items and excluding tables.
+
+ $ wp2txt -i xxwiki-yyyymmdd-pages-articles.xml.bz2 -o /output_dir
+
+ - [Output example (English)](https://raw.githubusercontent.com/yohasebe/wp2txt/master/data/output_samples/testdata_en.txt)
+ - [Output example (Japanese)](https://raw.githubusercontent.com/yohasebe/wp2txt/master/data/output_samples/testdata_ja.txt)
+
+ ### Example 2
+
+ The following will extract only article titles and the categories to which each article belongs:
+
+ $ wp2txt --category-only -i xxwiki-yyyymmdd-pages-articles.xml.bz2 -o /output_dir
+
+ - [Output example (English)](https://raw.githubusercontent.com/yohasebe/wp2txt/master/data/output_samples/testdata_en_categories.txt)
+ - [Output example (Japanese)](https://raw.githubusercontent.com/yohasebe/wp2txt/master/data/output_samples/testdata_ja_categories.txt)
+
+ ## Options
 
  Command line options are as follows:
 
@@ -40,44 +59,45 @@ Command line options are as follows:
  --list, --no-list, -l: Show list items in output (default: true)
  --heading, --no-heading, -d: Show section titles in output (default: true)
  --title, --no-title, -t: Show page titles in output (default: true)
- --table, -a: Show table source code in output
- --inline, -n: leave inline template notations unmodified
- --multiline, -m: leave multiline template notations unmodified
- --ref, -r: leave reference notations in the format
+ --table, -a: Show table source code in output (default: false)
+ --inline, -n: leave inline template notations unmodified (default: false)
+ --multiline, -m: leave multiline template notations unmodified (default: false)
+ --ref, -r: leave reference notations in the format (default: false)
  [ref]...[/ref]
- --redirect, -e: Show redirect destination
+ --redirect, -e: Show redirect destination (default: false)
  --marker, --no-marker, -k: Show symbols prefixed to list items,
  definitions, etc. (Default: true)
- --category, -g: Show article category information
+ --category, -g: Show article category information (default: true)
+ --category-only, -y: Extract only article title and categories (default: false)
  --file-size, -f <i>: Approximate size (in MB) of each output file
  (default: 10)
- -u, --num-threads=<i>: Number of threads to be spawned (capped to the number of CPU cores;
+ -u, --num-threads=<i>: Number of threads to be spawned (capped to the number of CPU cores;
  set 99 to spawn max num of threads) (default: 4)
  --version, -v: Print version and exit
  --help, -h: Show this message
 
- ### Caveats ###
+ ## Caveats
 
  * Certain types of data such as mathematical equations and computer source code are not be properly converted. Please remember this software is originally intended for correcting “sentences” for linguistic studies.
- * Extraction of normal text data could sometimes fail for various reasons (e.g. illegal matching of begin/end tags, language-specific conventions of formatting, etc).
+ * Extraction of normal text data could sometimes fail for various reasons (e.g. illegal matching of begin/end tags, language-specific conventions of formatting, etc).
  * Conversion process can take far more than you would expect. It could take several hours or more when dealing with a huge data set such as the English Wikipedia on a low-spec environments.
  * Because of nature of the task, WP2TXT needs much machine power and consumes a lot of memory/storage resources. The process thus could halt unexpectedly. It may even get stuck, in the worst case, without getting gracefully terminated. Please understand this and use the software __at your own risk__.
 
- ### Useful Link ###
+ ### Useful Links
 
  * [Wikipedia Database backup dumps](http://dumps.wikimedia.org/backup-index.html)
-
- ### Author ###
+
+ ### Author
 
  * Yoichiro Hasebe (<yohasebe@gmail.com>)
 
- ### References ###
+ ### References
 
  The author will appreciate your mentioning one of these in your research.
 
  * Yoichiro HASEBE. 2006. [Method for using Wikipedia as Japanese corpus.](http://ci.nii.ac.jp/naid/110006226727) _Doshisha Studies in Language and Culture_ 9(2), 373-403.
  * 長谷部陽一郎. 2006. [Wikipedia日本語版をコーパスとして用いた言語研究の手法](http://ci.nii.ac.jp/naid/110006226727). 『言語文化』9(2), 373-403.
 
- ### License ###
+ ### License
 
  This software is distributed under the MIT License. Please see the LICENSE file.
data/bin/wp2txt CHANGED
@@ -11,11 +11,11 @@ DOCDIR = File.join(File.dirname(__FILE__), '..', 'doc')
  require 'wp2txt'
  require 'wp2txt/utils'
  require 'wp2txt/version'
- require 'trollop'
+ require 'optimist'
 
  include Wp2txt
 
- opts = Trollop::options do
+ opts = Optimist::options do
  version Wp2txt::VERSION
  banner <<-EOS
  WP2TXT extracts plain text data from Wikipedia dump file (encoded in XML/compressed with Bzip2) stripping all the MediaWiki markups and other metadata.
@@ -37,11 +37,12 @@ EOS
  opt :redirect, "Show redirect destination", :default => false
  opt :marker, "Show symbols prefixed to list items, definitions, etc.", :default => true
  opt :category, "Show article category information", :default => true
+ opt :category_only, "Extract only article title and categories", :default => false
  opt :file_size, "Approximate size (in MB) of each output file", :default => 10
  opt :num_threads, "Number of threads to be spawned (capped to the number of CPU cores; set 99 to spawn max num of threads)", :default => 4
  end
- Trollop::die :size, "must be larger than 0" unless opts[:file_size] >= 0
- Trollop::die :output_dir, "must exist" unless File.exist?(opts[:output_dir])
+ Optimist::die :size, "must be larger than 0" unless opts[:file_size] >= 0
+ Optimist::die :output_dir, "must exist" unless File.exist?(opts[:output_dir])
 
  input_file = ARGV[0]
  output_dir = opts[:output_dir]
@@ -63,72 +64,79 @@ wpconv = Wp2txt::Runner.new(parent, input_file, output_dir, tfile_size, num_thre
 
  wpconv.extract_text do |article|
  format_wiki!(article.title)
- title = "[[#{article.title}]]\n"
 
- if opts[:category] && !article.categories.empty?
+ if opts[:category_only]
+ title = "#{article.title}\t"
+ contents = article.categories.join(", ")
+ contents << "\n"
+ elsif opts[:category] && !article.categories.empty?
+ title = "\n[[#{article.title}]]\n\n"
  contents = "\nCATEGORIES: "
  contents << article.categories.join(", ")
  contents << "\n\n"
  else
+ title = "\n[[#{article.title}]]\n\n"
  contents = ""
  end
 
- article.elements.each do |e|
- case e.first
- when :mw_heading
- next if !config[:heading]
- format_wiki!(e.last)
- line = e.last
- line << "+HEADING+" if $DEBUG_MODE
- when :mw_paragraph
- format_wiki!(e.last)
- line = e.last + "\n"
- line << "+PARAGRAPH+" if $DEBUG_MODE
- when :mw_table, :mw_htable
- next if !config[:table]
- line = e.last
- line << "+TABLE+" if $DEBUG_MODE
- when :mw_pre
- next if !config[:pre]
- line = e.last
- line << "+PRE+" if $DEBUG_MODE
- when :mw_quote
- line = e.last
- line << "+QUOTE+" if $DEBUG_MODE
- when :mw_unordered, :mw_ordered, :mw_definition
- next if !config[:list]
- line = e.last
- line << "+LIST+" if $DEBUG_MODE
- when :mw_ml_template
- next if !config[:multiline]
- line = e.last
- line << "+MLTEMPLATE+" if $DEBUG_MODE
- when :mw_redirect
- next if !config[:redirect]
- line = e.last
- line << "+REDIRECT+" if $DEBUG_MODE
- line << "\n\n"
- when :mw_isolated_template
- next if !config[:multiline]
- line = e.last
- line << "+ISOLATED_TEMPLATE+" if $DEBUG_MODE
- when :mw_isolated_tag
- next
- else
- if $DEBUG_MODE
- # format_wiki!(e.last)
+ unless opts[:category_only]
+ article.elements.each do |e|
+ case e.first
+ when :mw_heading
+ next if !config[:heading]
+ format_wiki!(e.last)
  line = e.last
- line << "+OTHER+"
- else
+ line << "+HEADING+" if $DEBUG_MODE
+ when :mw_paragraph
+ format_wiki!(e.last)
+ line = e.last + "\n"
+ line << "+PARAGRAPH+" if $DEBUG_MODE
+ when :mw_table, :mw_htable
+ next if !config[:table]
+ line = e.last
+ line << "+TABLE+" if $DEBUG_MODE
+ when :mw_pre
+ next if !config[:pre]
+ line = e.last
+ line << "+PRE+" if $DEBUG_MODE
+ when :mw_quote
+ line = e.last
+ line << "+QUOTE+" if $DEBUG_MODE
+ when :mw_unordered, :mw_ordered, :mw_definition
+ next if !config[:list]
+ line = e.last
+ line << "+LIST+" if $DEBUG_MODE
+ when :mw_ml_template
+ next if !config[:multiline]
+ line = e.last
+ line << "+MLTEMPLATE+" if $DEBUG_MODE
+ when :mw_redirect
+ next if !config[:redirect]
+ line = e.last
+ line << "+REDIRECT+" if $DEBUG_MODE
+ line << "\n\n"
+ when :mw_isolated_template
+ next if !config[:multiline]
+ line = e.last
+ line << "+ISOLATED_TEMPLATE+" if $DEBUG_MODE
+ when :mw_isolated_tag
  next
+ else
+ if $DEBUG_MODE
+ # format_wiki!(e.last)
+ line = e.last
+ line << "+OTHER+"
+ else
+ next
+ end
  end
+ contents << line << "\n"
  end
- contents << line << "\n"
  end
 
  if /\A[\s ]*\z/m =~ contents
  result = ""
  else
- result = config[:title] ? "\n#{title}\n" << contents : contents
+ result = config[:title] ? title << contents : contents
  end
  end
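In 0.9.4 the `opts[:category_only]` branch takes precedence over `opts[:category]`: it builds a tab-terminated title plus the comma-joined categories and skips the element loop entirely. The standalone Ruby sketch below mirrors that control flow so it can be tried without a dump file. `Article` is a hypothetical Struct standing in for the object yielded by `extract_text`, elements are modeled as `[type, text]` pairs to match `e.first` and `e.last`, `ELEMENT_GATES` restates the `case` branches as a lookup table, and `format_wiki!` and the `$DEBUG_MODE` markers are deliberately left out.

```ruby
# Hypothetical stand-in for the article object yielded by extract_text;
# only the fields used below are modeled.
Article = Struct.new(:title, :categories, :elements)

# Element type => config key that must be true for the element to be kept
# (nil means always kept). Unlisted types such as :mw_isolated_tag are
# skipped, as in the non-debug path of the diff.
ELEMENT_GATES = {
  mw_heading: :heading,
  mw_paragraph: nil,
  mw_table: :table,
  mw_htable: :table,
  mw_pre: :pre,
  mw_quote: nil,
  mw_unordered: :list,
  mw_ordered: :list,
  mw_definition: :list,
  mw_ml_template: :multiline,
  mw_redirect: :redirect,
  mw_isolated_template: :multiline
}.freeze

# Builds the text emitted for one article, following the 0.9.4 branch order.
def render_article(article, opts, config)
  if opts[:category_only]
    title = "#{article.title}\t"
    contents = article.categories.join(", ") + "\n"
  elsif opts[:category] && !article.categories.empty?
    title = "\n[[#{article.title}]]\n\n"
    contents = "\nCATEGORIES: #{article.categories.join(', ')}\n\n"
  else
    title = "\n[[#{article.title}]]\n\n"
    contents = +""
  end

  unless opts[:category_only]
    article.elements.each do |type, text|
      next unless ELEMENT_GATES.key?(type)
      gate = ELEMENT_GATES[type]
      next if gate && !config[gate]
      line = type == :mw_paragraph ? text + "\n" : text.dup
      line << "\n\n" if type == :mw_redirect
      contents << line << "\n"
    end
  end

  # Whitespace-only contents yield no output at all.
  return "" if /\A\s*\z/ =~ contents
  config[:title] ? title + contents : contents
end
```

Two consequences follow from the diff's logic. In category-only mode an article without categories leaves `contents` as a lone newline, so the emptiness check drops it instead of writing a bare title. And because the title is still gated by `config[:title]`, disabling titles in that mode would leave only the category list on each line.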
@@ -28704,7 +28704,7 @@ File:Halkbank.jpg|Halkbank Tower (1993) designed by Doğan Tekeli and Sami Sisa
 
  * [http://www.esenbogaairport.com/ Esenboğa International Airport]
 
- [[Anaconda]]
+ [[Arabic language]]
 
  CATEGORIES: Arabic language, Central Semitic languages, Fusional languages, Languages of Algeria, Languages of Bahrain, Languages of Chad, Languages of Comoros, Languages of Djibouti, Languages of Eritrea, Languages of Gibraltar, Languages of Iraq, Languages of Israel, Languages of Jordan, Languages of Kuwait, Languages of Lebanon, Languages of Libya, Languages of Mauritania, Languages of Morocco, Languages of Oman, Languages of Qatar, Languages of Saudi Arabia, Languages of Somalia, Languages of Somaliland, Languages of Sudan, Languages of Syria, Languages of the United Arab Emirates, Languages of Tunisia, Languages of Yemen, Languages of Trinidad and Tobago, Requests for audio pronunciation (Arabic), Stress-timed languages, Subject–verb–object languages, Languages of Palestine
 
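The sample above shows the default layout: a `[[Title]]` line, followed by a `CATEGORIES:` line when category output is on. A rough Ruby sketch for indexing such a file by title follows. It assumes every article header sits on its own line in exactly that form and that body text never yields a line of the same shape, which the converter does not guarantee; `index_default_output` is a hypothetical name.

```ruby
# Hypothetical reader for default-format output: maps each [[Title]] header
# to the categories on a later "CATEGORIES: " line, or to an empty array
# when none appears before the next header.
# Caveat: category names that contain ", " are split apart.
def index_default_output(text)
  index = {}
  current = nil
  text.each_line do |raw|
    line = raw.chomp
    if (m = line.match(/\A\[\[(.+)\]\]\z/))
      current = m[1]
      index[current] = []
    elsif current && (m = line.match(/\ACATEGORIES: (.*)\z/))
      index[current] = m[1].split(", ")
    end
  end
  index
end
```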
@@ -0,0 +1,207 @@
+ Anarchism Anarchism, Political culture, Political ideologies, Social theories, Anti-fascism, Anti-capitalism, Far-left politics
+ Autism Autism, Communication disorders, Mental and behavioural disorders, Neurological disorders, Neurological disorders in children, Pervasive developmental disorders, Psychiatric diagnosis
+ Albedo Climate forcing, Climatology, Electromagnetic radiation, Radiometry, Scattering, absorption and radiative transfer (optics), Radiation
+ A ISO basic Latin letters, Vowel letters
+ Alabama Alabama, States and territories established in 1819, States of the Confederate States of America, States of the United States, Southern United States
+ Achilles Characters in the Iliad, Demigods of Classical mythology, Kings of the Myrmidons, Greek mythological hero cult, People of the Trojan War, Thessalians in the Trojan War
+ Abraham Lincoln Abraham Lincoln, 1809 births, 1865 deaths, American people of English descent, Assassinated heads of state, Assassinated United States Presidents, Burials at Oak Ridge Cemetery, Deaths by firearm in Washington, D.C., Illinois lawyers, Illinois Republicans, Illinois Whigs, Lincoln family, Members of the Illinois House of Representatives, Members of the United States House of Representatives from Illinois, People from Coles County, Illinois, People from LaRue County, Kentucky, People from Macon County, Illinois, People from Spencer County, Indiana, People from Springfield, Illinois, People murdered in Washington, D.C., People of Illinois in the American Civil War, Political party founders, American postmasters, Presidents of the United States, Republican Party (United States) presidential nominees, Republican Party Presidents of the United States, Smallpox survivors, Union political leaders, United States presidential candidates, 1860, United States presidential candidates, 1864, Whig Party members of the United States House of Representatives, Hall of Fame for Great Americans inductees
+ Aristotle Aristotle, 384 BC births, 322 BC deaths, 4th-century BC philosophers, 4th-century BC writers, Academic philosophers, Acting theorists, Ancient Greek mathematicians, Ancient Greek philosophers, Ancient Greek physicists, Ancient Greeks in Macedon, Ancient Stagirites, Attic Greek writers, Cosmologists, Pro-slavery activists, Empiricists, Ancient Greek biologists, Ancient Greek logicians, History of logic, Humor researchers, Metaphysicians, Meteorologists, Metic philosophers in Classical Athens, Natural philosophers, Peripatetic philosophers, Philosophers and tutors of Alexander the Great, Philosophers of ancient Chalcidice, Philosophers of language, Philosophers of law, Philosophers of mind, Political philosophers, Rhetoric theorists, Trope theorists, Ancient literary critics, Virtue ethicists, Zoologists
+ An American in Paris Compositions by George Gershwin, Symphonic poems, Grammy Hall of Fame Award recipients, 1928 compositions, Music about Paris
+ Academy Award for Best Production Design Academy Awards, Best Art Direction Academy Award winners, Awards for best art direction
+ Academy Awards Academy Awards, American film awards, Awards established in 1929, 1929 establishments in the United States
+ Actrius 1996 films, 1990s drama films, Spanish films, Catalan-language films, Films set in Barcelona, Barcelona in fiction, Films directed by Ventura Pons
+ Animalia (book) Children's picture books, Alphabet books, 1986 books, Picture books by Graeme Base, Puzzle books
+ International Atomic Time Time scales
+ Altruism Altruism, Defence mechanisms, Evolution by phenotype, Morality, Philanthropy, Social philosophy, Social psychology, Auguste Comte, Virtue
+ Ayn Rand 1905 births, 1982 deaths, 20th-century American novelists, 20th-century philosophers, 20th-century women writers, American anti-communists, American anti-fascists, American atheists, American classical liberals, American essayists, American ethicists, American people of Russian-Jewish descent, American political theorists, American science fiction writers, American screenwriters, American women novelists, American women philosophers, American women screenwriters, American writers of Russian descent, Atheist philosophers, Ayn Rand, Burials at Kensico Cemetery, Cardiovascular disease deaths in New York, Critics of feminism, Critics of religions, Deaths from heart failure, Epistemologists, Imperial Russian atheists, Imperial Russian emigrants to the United States, Imperial Russian Jews, Jewish American dramatists and playwrights, Jewish American novelists, Jewish atheists, Jewish philosophers, Jewish women writers, Metaphysicians, Objectivists, Philosophers of mind, Political philosophers, Prometheus Award winners, Pseudonymous writers, Russian anti-communists, Russian anti-fascists, Russian dramatists and playwrights, Russian essayists, Russian novelists, Russian philosophers, Russian science fiction writers, Russian screenwriters, Russian women writers, Saint Petersburg State University alumni, Soviet emigrants to the United States, Women essayists, Women science fiction and fantasy writers, Writers from New York City, Writers from Saint Petersburg, American women dramatists and playwrights, 20th-century American dramatists and playwrights
+ Alain Connes 1947 births, Living people, 20th-century mathematicians, Members of the United States National Academy of Sciences, 21st-century mathematicians, Collège de France faculty, French mathematicians, Fields Medalists, Mathematical analysts, Differential geometers, École Normale Supérieure alumni, Vanderbilt University faculty, Foreign Members of the Russian Academy of Sciences, Members of the French Academy of Sciences, Members of the Norwegian Academy of Science and Letters, Members of the Royal Danish Academy of Sciences and Letters, Clay Research Award recipients
+ Allan Dwan 1885 births, 1981 deaths, American film directors, American film producers, American screenwriters, Western (genre) film directors, Canadian film directors, Canadian emigrants to the United States, Writers from Toronto, Short film directors, Disease-related deaths in California, Burials at San Fernando Mission Cemetery
+ Algeria Algeria, Countries in Africa, North African countries, List of Mediterranean countries, Maghrebi countries, Arabic-speaking countries and territories, Berber-speaking countries and territories, French-speaking countries and territories, G15 nations, Member states of OPEC, Member states of the African Union, Member states of the Arab League, Member states of the Organisation of Islamic Cooperation, Member states of the Union for the Mediterranean, Member states of the United Nations, Republics, Requests for audio pronunciation (Arabic), Requests for audio pronunciation (Berber), States and territories established in 1962, World Digital Library related
+ List of Atlas Shrugged characters Atlas Shrugged characters, Fictional socialites, Lists of literary characters
+ Anthropology Anthropology, Social sciences
+ Agricultural science Agronomy
+ Alchemy Alchemy, Hermeticism, Esotericism
+ Astronomer Astronomy, Astronomers, Science occupations
+ ASCII ASCII, Character sets, Latin-alphabet representations, Presentation layer protocols
+ Animation Animation
+ Apollo Apollo, Arts gods, Deities in the Iliad, Dragonslayers, Health gods, Knowledge gods, LGBT history in Greece, LGBT themes in mythology, Muses, Temples of Apollo, Mythological Greek archers, Mythological rapists, Oracular gods, Roman gods, Solar gods
+ Andre Agassi 1970 births, American male tennis players, American people of Armenian descent, American people of Iranian-Assyrian descent, American philanthropists, Australian Open (tennis) champions, Doping cases in tennis, French Open champions, Grand Slam (tennis) champions in men's singles, International Tennis Hall of Fame inductees, Living people, Nevada Democrats, Assyrian sportspeople, Olympic gold medalists for the United States, Olympic medalists in tennis, Olympic tennis players of the United States, Sportspeople from the Las Vegas Valley, Tennis people from Nevada, Tennis players at the 1996 Summer Olympics, US Open (tennis) champions, Wimbledon champions, World No. 1 tennis players, Iranian Assyrian people, American autobiographers, Writers from Nevada, Medalists at the 1996 Summer Olympics, American sportspeople in doping cases, 20th-century American businesspeople, 21st-century American businesspeople, American investors, American real estate businesspeople
+ Austroasiatic languages Agglutinative languages, Austroasiatic languages
+ Afroasiatic languages Afro-Asiatic, Afro-Asiatic languages, Language families
+ Andorra Andorra, Constitutional monarchies, Countries in Europe, Diarchies, Landlocked countries, Liberal democracies, Member states of La Francophonie, Member states of the United Nations, Països Catalans, Prince-Bishoprics, Principalities, Pyrenees, Romance countries and territories, States and territories established in 1278
+ Arithmetic mean Means
+ American Football Conference National Football League, American Football League, Organizations established in 1970
+ Animal Farm 1945 novels, Allegory, Animal Farm, Dystopian novels, English novels, Hugo Award for Best Novella winning works, Prometheus Award winning works, Literature featuring anthropomorphic characters, British novellas, Novels by George Orwell, Political literature, Roman à clef novels, Satirical novels, Novels about revolutionaries, Novels about totalitarianism, Novels about propaganda, British satirical novels, Animals in media, Secker & Warburg books
+ Amphibian Amphibians, Animal classes
+ Alaska Alaska, Arctic Ocean, Exclaves in the United States, Former Russian colonies, States and territories established in 1959, States of the United States, West Coast of the United States
+ Agriculture Agriculture
+ Aldous Huxley Aldous Huxley, 1894 births, 1963 deaths, 19th-century English people, People educated at Eton College, Alumni of Balliol College, Oxford, Burials in Surrey, Cancer deaths in California, Consciousness researchers and theorists, Deaths from laryngeal cancer, Duke University faculty, English agnostics, English essayists, English expatriates in the United States, English humanists, English novelists, English pacifists, English people of Cornish descent, English poets, English philosophers, English satirists, English science fiction writers, English short story writers, English travel writers, Huxley family, James Tait Black Memorial Prize recipients, Mystics, People associated with the Human Potential Movement, People from Godalming, People from Taos, New Mexico, Psychedelic drug advocates, Writers from Los Angeles, California, 20th-century British novelists
+ Algae Algae, Endosymbiotic events
+ Analysis of variance Analysis of variance, Design of experiments, Statistical tests, Parametric statistics
+ Alkane Alkanes, Hydrocarbons, Organic compounds, Functional groups
+ Appellate procedure in the United States United States appellate procedure, United States procedural law, Legal procedure
+ Answer Common law, Legal documents
+ Appellate court Courts by type, Appellate courts
+ Arraignment Legal terms, Prosecution, United States criminal procedure, Criminal law of the United Kingdom, Australian criminal law
+ America the Beautiful 1895 songs, American patriotic songs, Pikes Peak
+ Assistive technology Assistive technology, Disability, Educational technology, Web accessibility
+ Abacus Abacus, Chinese mathematics, Egyptian mathematics, Greek mathematics, Indian mathematics, Japanese mathematics, Mathematical tools, Roman mathematics
+ Acid Acids, Acid–base chemistry
+ Asphalt Building materials, Road construction, Chemical mixtures, Amorphous solids, Asphalt, Petroleum, Petroleum products, IARC Group 2B carcinogens, Latin words and phrases, Pavements
+ American National Standards Institute Organizations established in 1918, 501(c)(3) nonprofit organizations, Standards organizations, ISO member bodies, ANSI standards, 1918 establishments in the United States, Technical specifications
+ Apollo 11 Apollo program, Manned missions to the Moon, Missions to the Moon, Sample return missions, 1969 in the United States, Spacecraft launched in 1969, Articles containing video clips, Individual spacecraft in the collection of the Smithsonian Institution, Soft landings on the Moon, Spacecraft which reentered in 1969
+ Apollo 8 Spacecraft launched in 1968, 1968 in the United States, Apollo program, Manned missions to the Moon, Spacecraft which reentered in 1968
+ Astronaut Astronauts, Science occupations
+ A Modest Proposal Essays by Jonathan Swift, Satirical works, Pamphlets, 1729 essays, Works published anonymously, British satire, 1729 in Great Britain, Cannibalism in fiction
+ Alkali metal Alkali metals, Groups in the periodic table, Periodic table
+ Alphabet Alphabets, Documents, Orthography
+ Atomic number Chemical properties, Nuclear physics, Atoms
+ Anatomy Anatomy
+ Affirming the consequent Propositional fallacies
+ Andrei Tarkovsky 1932 births, 1986 deaths, People from Yuryevets District, Burials at Sainte-Geneviève-des-Bois Russian Cemetery, Russian people of Polish descent, Ukrainian people of Polish descent, Russian people of Ukrainian descent, Andrei Tarkovsky, BAFTA winners (people), Russian male actors, Russian Orthodox Christians, Soviet film directors, Russian film directors, Russian opera directors, Gerasimov Institute of Cinematography alumni, High Courses for Scriptwriters and Film Directors faculty, Deaths from lung cancer, Cancer deaths in France, Lenin Prize winners
+ Ambiguity Semantics, Critical thinking, Ambiguity
+ Aardvark Mammals of Africa, Myrmecophagous mammals, Living fossils, Megafauna of Africa, Animals described in 1766
+ Aardwolf Animals described in 1783, Carnivorans of Africa, Hyenas, Mammals of Africa, Fauna of Southern Africa, Fauna of East Africa, Myrmecophagous mammals
+ Adobe Building materials, Masonry, Adobe buildings and structures, Appropriate technology, Vernacular architecture, Requests for audio pronunciation (Spanish), Sustainable building, Buildings and structures by construction material
+ Adventure Adventure
+ Asia Asia, Continents
+ Aruba Aruba, Countries in the Caribbean, Dependent territories in North America, Dutch-speaking countries and territories, Former Dutch colonies, Island countries, Islands of the Netherlands Antilles, Kingdom of the Netherlands, Lesser Antilles, Netherlands Antilles articles correct after Dissolution, Special territories of the European Union, States and territories established in 1986
+ Articles of Confederation 1781 in law, Defunct constitutions, Documents of the American Revolution, Ordinances of the Continental Congress, Federalism in the United States, History of the United States (1776–89), History of York County, Pennsylvania, Legal history of the United States, Pennsylvania in the American Revolution, Political charters, United States historical documents, York, Pennsylvania
+ Atlantic Ocean Atlantic Ocean, Oceans, History of the Atlantic Ocean, Landforms of the Atlantic Ocean
+ Arthur Schopenhauer Arthur Schopenhauer, 1788 births, 1860 deaths, 19th-century German writers, 19th-century philosophers, Antinatalists, Atheist philosophers, German atheists, German monarchists, German people of Dutch descent, German philosophers, Humboldt University of Berlin faculty, Idealists, Kantian philosophers, Metaphysicians, Monism, People from Gdańsk, University of Göttingen alumni, Philosophers of art, Burials at Frankfurt Main Cemetery
+ Angola Angola, Bantu countries and territories, Central African countries, Countries in Africa, Former Portuguese colonies, Least developed countries, Member states of OPEC, Member states of the African Union, Member states of the Community of Portuguese Language Countries, Member states of the United Nations, Portuguese-speaking countries and territories, Republics, States and territories established in 1975, World Digital Library related
+ Demographics of Angola Angolan society, Demographics by country
+ Politics of Angola Politics of Angola
+ Economy of Angola African Union member economies, Economy of Angola, OPEC, World Trade Organization member economies, Blood diamonds
+ Transport in Angola Transport in Angola
+ Angolan Armed Forces Military of Angola, Military history of Angola
+ Foreign relations of Angola Foreign relations of Angola
+ Albert Sidney Johnston 1803 births, 1862 deaths, Confederate States military personnel killed in the American Civil War, Burials at Texas State Cemetery, Confederate States Army generals, People from Washington, Kentucky, People from Texas, People of California in the American Civil War, People of Texas in the American Civil War, American people of the Black Hawk War, Transylvania University alumni, United States Army generals, United States Military Academy alumni, People of the Texas Revolution, People of the Utah War
+ Android (robot) Androids, Science fiction themes, Android (robot), Osaka University research
+ Alberta Alberta, 1905 establishments in Canada, Provinces and territories of Canada, States and territories established in 1905, Canadian Prairies
+ List of anthropologists Lists of social scientists, Anthropologists
83
+ Actinopterygii Ray-finned fish, Animal classes
84
+ Albert Einstein 1879 births, 1955 deaths, 20th-century American writers, 20th-century German writers, Albert Einstein, American agnostics, American engineers, American pacifists, American philosophers, American physicists, American science writers, American socialists, American Zionists, Charles University in Prague faculty, Cosmologists, Deaths from abdominal aortic aneurysm, Einstein family, ETH Zurich alumni, ETH Zurich faculty, Foreign Members of the Royal Society, German emigrants to Switzerland, German Jews who emigrated to the United States to escape Nazism, German Nobel laureates, German physicists, German socialists, Institute for Advanced Study faculty, Jewish agnostics, Jewish American scientists, Jewish engineers, Jewish philosophers, Jewish physicists, Leiden University faculty, Members of the American Philosophical Society, Nobel laureates in Physics, Pantheists, Patent examiners, People from Berlin, People from Bern, People from Munich, People from Princeton, New Jersey, People from Ulm, People from Zürich, People with acquired Austria-Hungary citizenship, People with acquired Swiss citizenship, Philosophers of science, Relativists, Stateless people, Swiss agnostics, Swiss emigrants to the United States, Swiss physicists, Swiss Jews, Theoretical physicists, Winners of the Max Planck Medal
85
+ Afghanistan Afghanistan, Central Asian countries, Education in Afghanistan, Iranian Plateau, Islamic republics, Islamic states, Landlocked countries, Least developed countries, Member states of the Organisation of Islamic Cooperation, Member states of the South Asian Association for Regional Cooperation, Member states of the United Nations, Pashto-speaking countries and territories, Persian-speaking countries and territories, Republics, South Asian countries, States and territories established in 1709, States and territories established in 1747, Territories under military occupation
86
+ Albania Albania, Albanian-speaking countries and territories, Balkans, Countries in Europe, List of Mediterranean countries, Member states of La Francophonie, Member states of NATO, Member states of the Organisation of Islamic Cooperation, Member states of the Union for the Mediterranean, Member states of the United Nations, Republics, States and territories established in 1912, World Digital Library related
87
+ Allah Allah, Islamic theology, God, Creator deities
88
+ Algorithms (journal) Computer science journals, Open access journals, Multidisciplinary Digital Publishing Institute academic journals, Quarterly journals, English-language journals, Publications established in 2008, Mathematics journals
89
+ Azerbaijan Azerbaijan, Caucasus, Western Asian countries, Countries in Europe, Near Eastern countries, Landlocked countries, Modern Turkic states, Republics, Requests for audio pronunciation (Azerbaijani), Russian-speaking countries and territories, States and territories established in 1991, Western Asia, Member states of the Organisation of Islamic Cooperation, Member states of the Commonwealth of Independent States, Member states of the United Nations, Azerbaijani-speaking countries and territories, Ethnic Azerbaijani people, Eastern Europe
90
+ Amateur astronomy Amateur astronomers, Amateur astronomy
91
+ Aikido Japanese martial arts, Aikido, Dō
92
+ Art Aesthetics, Arts, Visual arts
93
+ Agnostida Agnostida, Trilobites, Cambrian trilobites, Ordovician trilobites, Fossil taxa described in 1864
94
+ Abortion Abortion, Gender studies, Fertility, Human reproduction, Ethically disputed practices
95
+ Abstract (law) Legal research
96
+ American Revolutionary War American Revolutionary War, Global conflicts, Resistance to the British Empire, Wars of independence
97
+ Ampere SI base units, Units of electric current
98
+ Algorithm Algorithms, Articles with example pseudocode, Mathematical logic, Theoretical computer science
99
+ Annual plant Plants, Garden plants, Horticulture and gardening, Annual plants
100
+ Anthophyta Plants
101
+ Mouthwash Dentifrices, Oral hygiene, Drug delivery devices, Dosage forms
102
+ Alexander the Great Alexander the Great, 356 BC births, 323 BC deaths, 4th-century BC Greek people, 4th-century BC Macedonians, 4th-century BC rulers, Ancient Macedonian generals, Ancient Pellaeans, City founders, Deified people, Greek historical hero cult, Hellenistic individuals, Hellenistic ruler cult, Macedonian monarchs, Monarchs of Persia, Pharaohs of the Argead dynasty
103
+ Alfred Korzybski 1879 births, 1950 deaths, People from Warsaw, People from Warsaw Governorate, Clan of Abdank, Polish emigrants to the United States, Polish engineers, Polish philosophers, Polish mathematicians, Linguists from Poland, Contemporary philosophers, General semantics
104
+ Asteroids (video game) 1979 video games, Ed Logg games, Amiga games, Arcade games, Atari 2600 games, Cancelled Atari 5200 games, Atari 7800 games, IOS games, Atari 8-bit family games, Atari arcade games, Atari Lynx games, Game Boy Color games, Mac OS games, Mobile games, Windows games, PlayStation games, Nintendo 64 games, Xbox 360 Live Arcade games, Xbox 360 games, Multidirectional shooters, Vector arcade games, Multiplayer video games, Android games, Video games of the Museum of Modern Art, Asteroids in fiction
105
+ Asparagales Asparagales, Angiosperm orders
106
+ Alismatales Alismatales, Angiosperm orders
107
+ Apiales Apiales, Angiosperm orders
108
+ Asterales Asterales, Angiosperm orders
109
+ Asteroid Asteroids, Asteroid groups and families, Binary asteroids, Spaceflight
110
+ Allocution Criminal procedure, Evidence law
111
+ Affidavit Evidence law, Legal documents, Notary
112
+ Aries (constellation) Aries (constellation), Constellations, Constellations listed by Ptolemy, Northern constellations, Western astrology
113
+ Aquarius (constellation) Aquarius (constellation), Constellations, Western astrology, Equatorial constellations, Constellations listed by Ptolemy
114
+ Anime 1917 introductions, Anime, Anime and manga terminology, Articles including recorded pronunciations, Japanese inventions
115
+ Ankara Ankara, Capitals in Asia, Capitals in Europe, Populated places in Ankara Province
116
+ Arabic language Arabic language, Central Semitic languages, Fusional languages, Languages of Algeria, Languages of Bahrain, Languages of Chad, Languages of Comoros, Languages of Djibouti, Languages of Eritrea, Languages of Gibraltar, Languages of Iraq, Languages of Israel, Languages of Jordan, Languages of Kuwait, Languages of Lebanon, Languages of Libya, Languages of Mauritania, Languages of Morocco, Languages of Oman, Languages of Qatar, Languages of Saudi Arabia, Languages of Somalia, Languages of Somaliland, Languages of Sudan, Languages of Syria, Languages of the United Arab Emirates, Languages of Tunisia, Languages of Yemen, Languages of Trinidad and Tobago, Requests for audio pronunciation (Arabic), Stress-timed languages, Subject–verb–object languages, Languages of Palestine
117
+ Alfred Hitchcock 1899 births, 1980 deaths, 19th-century English people, Alfred Hitchcock, BAFTA fellows, Deaths from renal failure, English film directors, English film producers, English emigrants to the United States, English-language film directors, English people of Irish descent, English Roman Catholics, English television directors, American film directors, American film producers, American television directors, German-language film directors, Horror film directors, Knights Commander of the Order of the British Empire, People educated at St Ignatius' College, Enfield, Silent film directors, People from Leytonstone, Edgar Award winners, Cecil B. DeMille Award Golden Globe winners
118
+ Anaconda Snakes, Boinae by common name
119
+ Altaic languages Altaic languages, Agglutinative languages, Altai Mountains, Northeast Asia, Requests for audio pronunciation
120
+ Austrian German German dialects, Languages of Austria, National varieties of German, Upper German languages
121
+ Axiom of choice Axiom of choice
122
+ Attila Attila the Hun, 406 births, 453 deaths, Huns, 5th-century monarchs in Europe, Turkic rulers, Ancient Germanic rulers, Deaths from choking, History of Hungary
123
+ Aegean Sea Aegean Sea, Bodies of water of Greece, Bodies of water of Turkey, Seas of the Mediterranean, European seas
124
+ A Clockwork Orange A Clockwork Orange, 1962 novels, Mind control in fiction, Books written in fictional dialects, British novellas, British novels adapted into films, British philosophical novels, British science fiction novels, Dystopian novels, Fiction with unreliable narrators, Ludwig van Beethoven in popular culture, Metafictional works, Novels about music, Novels by Anthony Burgess, Novels set in England, Obscenity controversies, Prometheus Award winning works, Rape in fiction, Heinemann (publisher) books, English-language novels
125
+ Amsterdam Amsterdam, Capitals in Europe, Cities in the Netherlands, Olympic cycling venues, Populated places established in the 13th century, Populated places in North Holland, Port cities and towns in the Netherlands, Port cities and towns of the North Sea, 1928 Summer Olympic venues
126
+ Museum of Work Museums in Östergötland County, Norrköping, Industry museums in Sweden
127
+ Audi Audi, Baden-Württemberg, Companies based in Bavaria, Car manufacturers of Germany, Companies disestablished in 1939, Companies established in 1909, Companies established in 1965, German brands, Ingolstadt, Luxury motor vehicle manufacturers, Motor vehicle manufacturers of Germany, Saxony, Sports car manufacturers, Volkswagen Group
128
+ Aircraft Aircraft
129
+ Alfred Nobel 1833 births, 1896 deaths, 19th-century Swedish people, Deaths from stroke, Swedish Christians, Members of the Royal Swedish Academy of Sciences, Nobel family, Nobel Prize, People from Stockholm, Swedish businesspeople, Swedish chemists, Swedish engineers, Swedish inventors, Swedish Lutherans, Burials in Sweden
130
+ Alexander Graham Bell 1847 births, 1922 deaths, 19th-century Scottish people, Alexander Graham Bell, Alumni of the University of Edinburgh, Alumni of University College London, American agnostics, American businesspeople, American educationists, American eugenicists, American inventors, American people of Scottish descent, American physicists, American Unitarians, Aviation pioneers, Canadian agnostics, Canadian Aviation Hall of Fame inductees, Canadian emigrants to the United States, Canadian eugenicists, Canadian inventors, Canadian people of Scottish descent, Canadian physicists, Canadian Unitarians, Elliott Cresson Medal recipients, Fellows of the American Academy of Arts and Sciences, History of telecommunications, IEEE Edison Medal recipients, Language teachers, Members of the American Philosophical Society, Members of the United States National Academy of Sciences, National Aviation Hall of Fame inductees, National Geographic Society, Officiers of the Légion d'honneur, People educated at the Royal High School, Edinburgh, People from Baddeck, Nova Scotia, People from Boston, Massachusetts, People from Brantford, People from Edinburgh, People from Washington, D.C., Scottish agnostics, Scottish businesspeople, Scottish emigrants to Canada, Scottish eugenicists, Scottish inventors, Scottish Science Hall of Fame, Scottish Unitarians, Smithsonian Institution people, Hall of Fame for Great Americans inductees, George Washington University trustee
131
+ Anatolia Anatolia, Ancient Greek geography, Geography of Turkey, Peninsulas of Asia, Physiographic provinces, Western Asia, World Digital Library related
132
+ Apple Inc. 1976 establishments in California, Apple Inc., Companies based in Cupertino, California, Companies established in 1976, Computer companies of the United States, Computer hardware companies, Display technology companies, Electronics companies of the United States, Home computer hardware companies, Mobile phone manufacturers, Multinational companies headquartered in the United States, Networking hardware companies, Portable audio player manufacturers, Publicly traded companies of the United States, Electronics companies, Retail companies of the United States, Software companies based in the San Francisco Bay Area, Steve Jobs, Warrants issued in Hong Kong Stock Exchange, Technology companies of the United States
133
+ Aberdeenshire Council areas of Scotland, Aberdeenshire
134
+ Aztlan Underground Native American rappers, American rappers of Mexican descent, Musical groups from Los Angeles, California, Rapcore groups, West Coast hip hop musicians
135
+ American Civil War American Civil War, Rebellions in the United States, Wars involving the United States, 1860s in the United States, Wars of independence
136
+ Andy Warhol Andy Warhol, 1928 births, 1987 deaths, 20th-century American writers, 20th-century artists, 20th-century painters, Album-cover and concert-poster artists, American cinematographers, American contemporary artists, American Eastern Catholics, American experimental filmmakers, American film directors, American film directors of Hungarian descent, American film producers, American painters, American portrait painters, American people of Hungarian descent, American people of Lemko descent, American people of Slovak descent, American people of Rusyn descent, American pop artists, American printmakers, American screenwriters, American shooting survivors, American socialites, Artists from New York, Artists from Pittsburgh, Pennsylvania, Burials in Pennsylvania, Carnegie Mellon University alumni, Censorship in the arts, Counterculture of the 1960s, Deaths from myocardial infarction, Deaths from surgical complications, Fashion illustrators, Film directors from Pennsylvania, Gay artists, Gay writers, LGBT artists from the United States, LGBT Roman Catholics, LGBT directors, LGBT people from New York, LGBT people from Pennsylvania, LGBT producers, LGBT writers from the United States, Photographers from New York, Pop artists, Postmodern artists, Ruthenian Catholics, Schenley High School alumni, The Velvet Underground, Warhola family, Writers from New York, Writers from Pittsburgh, Pennsylvania
137
+ Alp Arslan 1029 births, 1072 deaths, Seljuk rulers, Monarchs of Persia, Byzantine–Seljuq Wars, Murdered monarchs
138
+ American Film Institute 1967 establishments in the United States, American Film Institute, Organizations established in 1967
139
+ Akira Kurosawa 1910 births, 1998 deaths, People from Tokyo, Akira Kurosawa, Fellows of the American Academy of Arts and Sciences, Recipients of the Order of Culture, Recipients of the Praemium Imperiale, Recipients of the Order of Friendship of Peoples, Silver Bear for Best Director recipients, Academy Honorary Award recipients, Légion d'honneur recipients, Fukuoka Asian Culture Prize winners, Ramon Magsaysay Award winners, David di Donatello winners, Best Director BAFTA Award winners, César Award winners, People's Honour Award winners, Cardiovascular disease deaths in Japan, Deaths from stroke, Japanese film producers, Japanese film directors, Japanese screenwriters, Samurai film directors, Yakuza film directors, Asian film producers
140
+ Ancient Egypt Ancient Egypt, African civilizations, Former empires of Africa, States of Ancient Africa
141
+ Analog Brothers American hip hop groups, Ice-T
142
+ Motor neuron disease Motor neurone disease, Systemic atrophies primarily affecting the central nervous system, Rare diseases
143
+ Abjad Abjad writing systems, Arabic orthography
144
+ Abugida Abugida writing systems, Requests for audio pronunciation (English)
145
+ ABBA ABBA, Atlantic Records artists, Eurodisco groups, Eurovision Song Contest entrants of 1974, Eurovision Song Contest winners, Melodifestivalen contestants, Melodifestivalen winners, Musical groups disestablished in 1982, Musical groups established in 1972, Musical groups from Stockholm, Musical quartets, RCA Records artists, Rock and Roll Hall of Fame inductees, Swedish dance music groups, Swedish Eurovision Song Contest entrants, Swedish musical groups, Swedish pop music groups, Schlager groups, 1972 establishments in Sweden, Europop groups, Swedish-language singers
146
+ Allegiance Allegiance, Nationalism, Authoritarianism
147
+ Altenberg Place name disambiguation pages
148
+ MessagePad Apple Newton, Products introduced in 1993, Apple personal digital assistants
149
+ A. E. van Vogt 1912 births, 2000 deaths, 20th-century American novelists, American male novelists, American science fiction writers, American short story writers, Canadian male novelists, Canadian science fiction writers, Canadian short story writers, Canadian expatriate writers in the United States, Deaths from Alzheimer's disease, Prometheus Award winners, SFWA Grand Masters, Science Fiction Hall of Fame inductees, Writers from Manitoba
150
+ Anna Kournikova 1981 births, Living people, Australian Open (tennis) champions, Olympic tennis players of Russia, Sportspeople from Miami-Dade County, Florida, Russian emigrants to the United States, Russian female models, Russian female tennis players, Russian socialites, Sportspeople from Moscow, Tennis players at the 1996 Summer Olympics, Grand Slam (tennis) champions in women's doubles, Participants in American reality television series
151
+ Alfons Maria Jakob People from Aschaffenburg, German neurologists, German neuroscientists, German Roman Catholics, 1884 births, 1931 deaths
152
+ Agnosticism Agnosticism, Epistemological theories, Secularism, Philosophical movements, Philosophy of religion, Skepticism
153
+ Argon Argon, Chemical elements, Industrial gases, Noble gases
154
+ Arsenic Metalloids, Pnictogens, Toxicology, Chemical elements, Endocrine disruptors, Arsenic, IARC Group 1 carcinogens, Biology and pharmacology of chemical elements, Trigonal minerals, Teratogens, Fetotoxicants, Suspected testicular toxicants
155
+ Antimony Antimonide minerals, Antimony compounds, Chemical elements, Metalloids, Pnictogens, Antimony, Nuclear materials, Trigonal minerals
156
+ Actinium Chemical elements, Actinides, Actinium
157
+ Americium Actinides, Americium, Carcinogens, Chemical elements, Synthetic elements
158
+ Astatine Astatine, Chemical elements, Halogens, Metalloids, Articles containing Greek-language text
159
+ Atom Atoms, Concepts in physics, Chemistry
160
+ Arable land Agricultural land
161
+ Aluminium Aluminium minerals, Aluminium, Rocket fuels, Electrical conductors, Pyrotechnic fuels, Airship technology, Chemical elements, Post-transition metals, Reducing agents
162
+ Advanced Chemistry German hip hop groups
163
+ Anglican Communion 1867 establishments in England, Religious organizations established in 1867, Anglicanism, Chalcedonianism, International bodies of denominations
164
+ Arne Kaijser 1950 births, Living people, Swedish scholars and academics, Royal Institute of Technology academics, Members of the Royal Swedish Academy of Engineering Sciences, Historians of science, Historians of technology, Linköping University alumni
165
+ Archipelago Archipelagoes, Coastal and oceanic landforms, Islands
166
+ Author Writing occupations, Literary criticism
167
+ Andrey Markov 1856 births, 1922 deaths, 19th-century Russian mathematicians, 20th-century mathematicians, Russian atheists, Russian mathematicians, Former Eastern Orthodox Christians, Probability theorists, Saint Petersburg State University alumni, Full Members of the St Petersburg Academy of Sciences, Full Members of the Russian Academy of Sciences (1917–25), People from Ryazan, Russian statisticians
168
+ Angst Anxiety, Emotions, Existentialist concepts
169
+ Anxiety Anxiety, Emotions, Mental states in Csikszentmihalyi's flow model
170
+ A. A. Milne 1882 births, 1956 deaths, People from Hampstead, People from Kilburn, London, 20th-century British children's literature, 20th-century British novelists, Alumni of Trinity College, Cambridge, British Army personnel of World War I, Royal Warwickshire Fusiliers officers, English children's writers, English novelists, Members of the Detection Club, People educated at Westminster School, London, English poets, Winnie-the-Pooh, Deaths from stroke, English people of Scottish descent, Writers from London
171
+ Asociación Alumni Rugby clubs established in 1951, Argentine rugby union teams
172
+ Axiom Mathematical axioms, Mathematical terminology, Formal systems, Concepts in logic
173
+ Alpha Greek letters, Vowel letters
174
+ Alvin Toffler American male writers, American non-fiction writers, American science fiction writers, American technology writers, American futurologists, Transhumanists, Jewish American writers, People from Ridgefield, Connecticut, Radical centrist writers, Writers from Connecticut, 1928 births, Living people
175
+ The Amazing Spider-Man 1963 comic debuts, Comics by J. Michael Straczynski, Comics by John Byrne, Comics by Mark Waid, Comics by Stan Lee, Comics by Steve Ditko, Spider-Man titles
176
+ AM Two-letter disambiguation pages
177
+ Antigua and Barbuda Antigua and Barbuda, Countries in the Caribbean, Constitutional monarchies, English-speaking countries and territories, Former British colonies, Island countries, Liberal democracies, Member states of the Caribbean Community, Member states of the Commonwealth of Nations, Member states of the United Nations, States and territories established in 1981
178
+ Azincourt Communes of Pas-de-Calais
179
+ Albert Speer Albert Speer, 1905 births, 1981 deaths, People from the Grand Duchy of Baden, People from Mannheim, 20th-century German architects, German people convicted of crimes against humanity, German people of World War II, German prisoners and detainees, German writers, Nazi architects, Historians of Nazism, Nazi Germany ministers, Nazi leaders, Neoclassical architects, Officials of Nazi Germany, People convicted by the International Military Tribunal in Nuremberg, SS officers, Technical University of Berlin alumni, Technische Universität München alumni, Karlsruhe Institute of Technology alumni, Members of the Reichstag of Nazi Germany, Recipients of the Golden Party Badge, Recipients of the Knights Cross of the War Merit Cross, Recipients of the Honour Chevron for the Old Guard
180
+ Allioideae Allioideae, Plant subfamilies
181
+ Asteraceae Asteraceae, Asterales families, Flowers
182
+ Apiaceae Asterid families, Apiaceae
183
+ Axon Neurons, Neurophysiology, Neuroanatomy
184
+ Aramaic alphabet Aramaic alphabet, Scripts encoded in Unicode 5.2
185
+ American shot Cinematography
186
+ Acute disseminated encephalomyelitis Multiple sclerosis, Autoimmune diseases, Neurological disorders, Enterovirus-associated diseases, Measles
187
+ Ataxia Symptoms and signs: Nervous system, Cerebral palsy types, Stroke
188
+ Abdul Alhazred Characters in short stories, Fictional alchemists, Fictional Arab people, Fictional characters introduced in 1924, Fictional writers, Mortals in the Cthulhu Mythos
189
+ Ada Lovelace 1815 births, 1852 deaths, 19th-century English mathematicians, 19th-century women writers, Ada programming language, British computer scientists, British countesses, Burials in Nottinghamshire, Byron family, Cancer deaths in England, Computer designers, Daughters of barons, Deaths from uterine cancer, English computer programmers, English computer scientists, English people of Scottish descent, English scientists, English women poets, Lord Byron, Programming language designers, Women computer scientists, Women in engineering, Women in technology, Women mathematicians, Women of the Victorian era
190
+ August Derleth 1909 births, 1971 deaths, University of Wisconsin–Madison alumni, American Christians, American short story writers, American mystery writers, 20th-century American novelists, Cthulhu Mythos writers, Guggenheim Fellows, American horror writers, People from Sauk County, Wisconsin, Writers from Wisconsin, Deaths from myocardial infarction, Science fiction editors, Solar Pons, Anthologists
191
+ Alps Alps, Mountain ranges of Europe, Mountains of Europe, Geography of Central Europe, Geography of Western Europe, Physiographic provinces
192
+ Albert Camus 1913 births, 1960 deaths, 20th-century French novelists, 20th-century French philosophers, Albert Camus, Anarchist communists, Anarcho-pacifists, Anarcho-syndicalists, Anti-fascists, Atheist philosophers, Existentialists, French agnostics, French anarchists, French anti–death penalty activists, French atheists, French Communist Party members, French dramatists and playwrights, French essayists, French expatriates in Algeria, French humanists, French journalists, French Nobel laureates, French pacifists, French people of Spanish descent, French Resistance members, Légion d'honneur refusals, Modernist writers, Nobel laureates in Literature, People from Dréan, Pieds-Noirs, Road accident deaths in France, University of Algiers alumni
193
+ Agatha Christie Agatha Christie, 1890 births, 1976 deaths, 20th-century English writers, Dames Commander of the Order of the British Empire, Edgar Award winners, English crime fiction writers, English dramatists and playwrights, English mystery writers, English short story writers, British detective writers, Members of the Detection Club, English women dramatists and playwrights, Female wartime nurses, People from Cholsey, People from Sunningdale, People from Torquay, British women in World War I, Fellows of the Royal Society of Literature, Anthony Award winners, Women mystery writers, Booker authors' division, Burials in Oxfordshire, Writers of historical mysteries, World War I nurses, British women short story writers, British dramatists and playwrights, 20th-century British novelists, 20th-century women writers, English women novelists
194
+ The Plague 1947 novels, Novels by Albert Camus, Absurdist fiction, Novels set in Algeria, Plague (disease), Éditions Gallimard books
195
+ Applied ethics Applied ethics, Ethics
196
+ Absolute value Special functions
197
+ Analog signal Analog circuits, Electronic design, Television terminology, Video signal
198
+ Arecales Palms, Angiosperm orders
199
+ Hercule Poirot Hercule Poirot, Characters in British novels of the 20th century, Fictional Belgian people, Fictional characters introduced in 1920, Fictional criminologists, Fictional police officers, Fictional private investigators, Hercule Poirot characters, Series of books
200
+ Miss Marple Miss Marple, Novel series, Fictional amateur detectives, Fictional English people, Fictional characters introduced in 1926, Characters in British novels of the 20th century
201
+ April Months, April
202
+ August Months, August
203
+ Aaron Ancient Egyptian Jews, Torah people, Christian saints from the Old Testament, Moses, Kohanim, Aaron, Book of Exodus, Biblical figures in Islam
204
+ April 6 Days of the year, April
205
+ April 12 Days of the year, April
206
+ April 15 Days of the year, April
207
+
@@ -0,0 +1,48 @@
+ アンパサンド 記号
+ エスペラント エスペラント, 人工言語
+ 言語 言語, 言語学, 民族
+ 日本語 日本語, Japanese language, 国語
+ 地理学 地理学
+ 欧州連合 欧州連合
+ 国の一覧 一覧, 国
+ 漫画 漫画, 娯楽, Comics
+ 日本 日本, 島国, 君主国
+ フランス France, フランス, G8加盟国
+ パリ フランスの都市, ローマ都市, パリ, イル=ド=フランス
+ ヨーロッパ ヨーロッパ
+ ジミー・カーター アメリカ合衆国の大統領, ノーベル平和賞受賞者, ジョージア州知事, ジョージア州の人物, 1924年生
+ 生物 生物, 地球, 地球史
+ センタイ類 植物学, コケ植物
+ 社会学 社会学
+ 古代エジプト 古代エジプト, 考古学
+ エジプト エジプト, イスラム教国
+ 著作権の保護期間 著作権法
+ 東京 Tokyo, 東京都, 東京23区の地域, 関東地方, 日本の都市
+ 台東区 特別区, 台東区
+ 地理 地理, 教科
+ 生物学 Biology, 生物学, 自然科学, 理学
+ 社会 社会
+ こどもの文化 子供の遊び, 子供, 育児
+ 特撮 特撮, SF, テレビドラマ
+ 日常生活 生活, 文化, 人の行動
+ 情報工学 情報工学, 情報学, 計算科学
+ 形式言語 言語学, 形式言語, 構文解析 (プログラミング)
+ 文脈自由言語 形式言語, 構文解析 (プログラミング)
+ 正規言語 形式言語
+ 自然言語処理 言語学, 自然言語処理
+ 自然言語 言語の分類, 言語学
+ プログラミング言語 プログラミング言語, コンピュータ言語
+ 人工知能 情報工学, 人工知能, 心の哲学, ユーザインターフェイス (コンピュータ), SF
+ オーストリア オーストリア, 内陸国
+ GNU Free Documentation License ライセンス, 知的財産権, フリーソフトウェア財団
+ 社会学者の一覧 社会学者, 学者の人名一覧
+ オランダ オランダ, 君主国
+ ゴーダチーズ チーズ, オランダの食文化
+ バールーフ・デ・スピノザ オランダ史の人物, オランダの哲学者, ユダヤ教改革派, 破門, 17世紀の学者
+ 文脈自由文法 形式言語
+ フランス語 フランス語, フランスの言語, カナダの言語, スイスの言語, ベルギーの言語, レバノンの言語, モロッコの言語, コンゴ共和国の言語, コンゴ民主共和国の言語, チュニジアの言語, カメルーンの言語, マリ共和国の言語, セネガルの言語, トーゴの言語, ルワンダの言語, ブルンジの言語, ベナンの言語, コートジボワールの言語, インド・ヨーロッパ語族
+ イタリア語 イタリア語, イタリアの言語, インド・ヨーロッパ語族
+ スペイン語 スペイン語, スペインの言語, アルゼンチンの言語, メキシコの言語, ボリビアの言語, チリの言語, コロンビアの言語, パラグアイの言語, ウルグアイの言語, イタリック語派
+ 宗教学 宗教学, 人文科学, 宗教
+ 音楽 Music, 音楽
+
@@ -1,3 +1,3 @@
  module Wp2txt
- VERSION = "0.9.1"
+ VERSION = "0.9.4"
  end
data/lib/wp2txt.rb CHANGED
@@ -6,6 +6,7 @@ $: << File.join(File.dirname(__FILE__))
  require "nokogiri"
  require "parallel"
 
+ require 'etc'
  require 'pp'
  require "wp2txt/article"
  require "wp2txt/utils"
@@ -100,7 +101,7 @@ module Wp2txt
  if /.bz2$/ =~ @input_file
  unless NO_BZ2
  file = Bzip2::Reader.new File.open(@input_file, "r:UTF-8")
- @parent.msg("WP2TXT is spawming #{@num_threads} threads to process data \n", 0)
+ @parent.msg("WP2TXT is spawning #{@num_threads} threads to process data \n", 0)
  @parent.msg("Preparing ... This may take several minutes or more ", 0)
  @infile_size = file_size(file)
  @parent.msg("... Done.", 1)
@@ -112,7 +113,7 @@ module Wp2txt
  else
  file = IO.popen("bzip2 -c -d #{@input_file}")
  end
- @parent.msg("WP2TXT is spawming #{@num_threads} threads to process data \n", 0)
+ @parent.msg("WP2TXT is spawning #{@num_threads} threads to process data \n", 0)
  @parent.msg("Preparing ... This may take several minutes or more ", 0)
  @infile_size = file_size(file)
  @parent.msg("... Done.", 1)
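This release adds `require 'etc'` to lib/wp2txt.rb, but the lines that call into `Etc` fall outside the hunks shown. One plausible use, stated here only as an assumption, is capping the `num-threads` setting at the number of CPU cores reported by Ruby's standard `Etc.nprocessors`. A minimal sketch under that assumption; the helper `effective_thread_count` is hypothetical and not wp2txt's actual code:

```ruby
require "etc"

# Hypothetical helper (assumption, not part of wp2txt): returns how many
# worker threads to spawn, never more than the available CPU cores.
# A nil or non-positive request falls back to the core count.
def effective_thread_count(requested, cores = Etc.nprocessors)
  return cores if requested.nil? || requested <= 0

  [requested, cores].min
end

# Usage: a request for 99 threads is capped at the machine's core count.
threads = effective_thread_count(99)
puts "WP2TXT is spawning #{threads} threads to process data"
```

Passing `cores` as a parameter keeps the helper deterministic in tests, independent of the machine it runs on.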
data/wp2txt.gemspec CHANGED
@@ -25,5 +25,5 @@ Gem::Specification.new do |s|
  s.add_dependency "nokogiri"
  s.add_dependency "parallel"
  s.add_dependency "htmlentities"
- s.add_dependency "trollop"
+ s.add_dependency "optimist"
  end
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: wp2txt
  version: !ruby/object:Gem::Version
- version: 0.9.1
+ version: 0.9.4
  platform: ruby
  authors:
  - Yoichiro Hasebe
- autorequire:
+ autorequire:
  bindir: bin
  cert_chain: []
- date: 2018-01-10 00:00:00.000000000 Z
+ date: 2022-07-29 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: nokogiri
@@ -53,7 +53,7 @@ dependencies:
  - !ruby/object:Gem::Version
  version: '0'
  - !ruby/object:Gem::Dependency
- name: trollop
+ name: optimist
  requirement: !ruby/object:Gem::Requirement
  requirements:
  - - ">="
@@ -84,7 +84,9 @@ files:
  - bin/benchmark.rb
  - bin/wp2txt
  - data/output_samples/testdata_en.txt
+ - data/output_samples/testdata_en_categories.txt
  - data/output_samples/testdata_ja.txt
+ - data/output_samples/testdata_ja_categories.txt
  - data/testdata_en.bz2
  - data/testdata_ja.bz2
  - lib/wp2txt.rb
@@ -99,7 +101,7 @@ files:
  homepage: http://github.com/yohasebe/wp2txt
  licenses: []
  metadata: {}
- post_install_message:
+ post_install_message:
  rdoc_options: []
  require_paths:
  - lib
@@ -114,9 +116,8 @@ required_rubygems_version: !ruby/object:Gem::Requirement
  - !ruby/object:Gem::Version
  version: '0'
  requirements: []
- rubyforge_project: wp2txt
- rubygems_version: 2.6.13
- signing_key:
+ rubygems_version: 3.3.7
+ signing_key:
  specification_version: 4
  summary: Wikipedia dump to text converter
  test_files: