wikiscript 0.3.1 → 0.4.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
- SHA1:
- metadata.gz: aa972f9f2c4580091fb3dac93cf76f94f6054ba1
- data.tar.gz: 138954a7a0281032b68aae0f29b9d239c9f7f90a
+ SHA256:
+ metadata.gz: 9c6ea507fa295074d82e5d81eb88828c4473b35e5aee7f04b79cfbc774bb9e72
+ data.tar.gz: 56a5e69536495a90e0c1635799c8023ca57fdfbc554f818d40f86046972f13fb
  SHA512:
- metadata.gz: 74af5f6bd6b915a1887d9022763294f0e6e634415d1bbdf3b18ae9fc0f9a11b01812b904a42a4ed687a60cafbd50230afd8a804dab8bb6a8b28475f99437be2b
- data.tar.gz: 44f92623eb560825a834de570b8cfc604a92bfb907b38c2390b2d3a52cc74bcf8f229d21369c27bd73c0b99a9db9c65f5df3bf9b6d304e9876694135e7517ddc
+ metadata.gz: 5d0bc2f5ec90d5a3c3f6c0c72c145f33a4faf1acccb28a3487059c715a8c60b5a5ed43fdd54318fe30669b1f64601a48eab0bfbba1f30da494f0b89a526cf054
+ data.tar.gz: ad591ed36382240029ddf9cf3e120fb938d3a253b8766cf04e0875b222b2349ee7473a1b0c851ee337afcd501d8830cea1da39d30eb78e60f9aa539d33408a59
data/CHANGELOG.md CHANGED
@@ -1,3 +1,5 @@
+ ### 0.4.0
+
  ### 0.0.1 / 2014-07-03
  
  * Everything is new. First release.
data/Manifest.txt CHANGED
@@ -1,18 +1,11 @@
  CHANGELOG.md
- LICENSE.md
  Manifest.txt
- NOTES.md
  README.md
  Rakefile
  lib/wikiscript.rb
  lib/wikiscript/client.rb
+ lib/wikiscript/outline_reader.rb
  lib/wikiscript/page.rb
  lib/wikiscript/page_reader.rb
  lib/wikiscript/table_reader.rb
  lib/wikiscript/version.rb
- test/helper.rb
- test/test_link.rb
- test/test_page.rb
- test/test_page_de.rb
- test/test_page_reader.rb
- test/test_table_reader.rb
data/README.md CHANGED
@@ -12,16 +12,203 @@ Read-only access to wikipedia pages.
12
12
  Example - Get wikitext source (via `en.wikipedia.org/w/index.php?action=raw&title=<title>`):
13
13
 
14
14
 
15
+ ``` ruby
16
+ page = Wikiscript::Page.get( '2022_FIFA_World_Cup' ) # same as Wikiscript.get
17
+ page.text
15
18
  ```
16
- >> page = Wikiscript::Page.new( '2014_FIFA_World_Cup_squads' )
17
- >> page.text
18
19
 
19
- The [[2014 FIFA World Cup]] is an international [[association football|football]]
20
- tournament which is currently being held in Brazil from 12 June to 13 July 2014.
21
- The 32 national teams involved in the tournament were required to register
22
- a squad of 23 players, including three goalkeepers...
20
+ prints
21
+
22
+ ```
23
+ The '''2022 FIFA World Cup''' is scheduled to be the 22nd edition of the [[FIFA World Cup]],
24
+ the quadrennial international men's [[association football]] championship contested by the
25
+ [[List of men's national association football teams|national teams]] of the member associations of [[FIFA]].
26
+ It is scheduled to take place in [[Qatar]] in 2022. This will be the first World Cup ever to be held
27
+ in the [[Arab world]] and the first in a Muslim-majority country...
28
+ ```
29
+
30
+ Or build your own page from scratch (no download):
31
+
32
+ ``` ruby
33
+ page = Wikiscript::Page.new( <<TXT, title: '2022_FIFA_World_Cup' )
34
+ The '''2022 FIFA World Cup''' is scheduled to be the 22nd edition of the [[FIFA World Cup]],
35
+ the quadrennial international men's [[association football]] championship contested by the
36
+ [[List of men's national association football teams|national teams]] of the member associations of [[FIFA]].
37
+ It is scheduled to take place in [[Qatar]] in 2022. This will be the first World Cup ever to be held
38
+ in the [[Arab world]] and the first in a Muslim-majority country...
39
+ TXT
40
+ page.text
41
+ ```
42
+
43
+ prints
44
+
45
+ ```
46
+ The '''2022 FIFA World Cup''' is scheduled to be the 22nd edition of the [[FIFA World Cup]],
47
+ the quadrennial international men's [[association football]] championship contested by the
48
+ [[List of men's national association football teams|national teams]] of the member associations of [[FIFA]].
49
+ It is scheduled to take place in [[Qatar]] in 2022. This will be the first World Cup ever to be held
50
+ in the [[Arab world]] and the first in a Muslim-majority country...
51
+ ```
52
+
53
+
54
+ ### Tables
55
+
56
+ Parse wiki tables into an array. Example:
57
+
58
+ ``` ruby
59
+ table = Wikiscript.parse_table( <<TXT )
60
+ {|
61
+ |-
62
+ ! header1
63
+ ! header2
64
+ ! header3
65
+ |-
66
+ | row1cell1
67
+ | row1cell2
68
+ | row1cell3
69
+ |-
70
+ | row2cell1
71
+ | row2cell2
72
+ | row2cell3
73
+ |}
74
+ TXT
75
+
76
+ # -or-
77
+
78
+ table = Wikiscript.parse_table( <<TXT )
79
+ {|
80
+ ! header1 !! header2 !! header3
81
+ |-
82
+ | row1cell1 || row1cell2 || row1cell3
83
+ |-
84
+ | row2cell1 || row2cell2 || row2cell3
85
+ |}
86
+ TXT
87
+
88
+ # -or-
89
+
90
+ table = Wikiscript.parse_table( <<TXT )
91
+ {|
92
+ |-
93
+ !
94
+ header1
95
+ !
96
+ header2
97
+ !
98
+ header3
99
+ |-
100
+ |
101
+ row1cell1
102
+ |
103
+ row1cell2
104
+ |
105
+ row1cell3
106
+ |-
107
+ |
108
+ row2cell1
109
+ |
110
+ row2cell2
111
+ |
112
+ row2cell3
113
+ |}
114
+ TXT
115
+ ```
116
+
117
+ resulting in:
118
+
119
+ ``` ruby
120
+ pp table
121
+ #=> [["header1", "header2", "header3"],
122
+ # ["row1cell1", "row1cell2", "row1cell3"],
123
+ # ["row2cell1", "row2cell2", "row2cell3"]]
23
124
  ```
24
125
 
126
+ Note: `parse_table` will strip/remove (leading) style attributes (e.g. `attribute="value" |`) and (inline) bold and italic emphases (e.g. `''`) from the (cell) text. Example:
127
+
128
+ ``` ruby
129
+ table = Wikiscript.parse_table( <<TXT )
130
+ {|
131
+ |-
132
+ ! style="width:200px;"|Club
133
+ ! style="width:150px;"|City
134
+ |-
135
+ |[[Biu Chun Rangers]]||[[Sham Shui Po]]
136
+ |-
137
+ |bgcolor=#ffff44 |''[[Eastern Sports Club|Eastern]]''||[[Mong Kok]]
138
+ |-
139
+ |[[HKFC Soccer Section]]||[[Happy Valley, Hong Kong|Happy Valley]]
140
+ |}
141
+ TXT
142
+ ```
143
+
144
+ resulting in:
145
+
146
+ ``` ruby
147
+ pp table
148
+ #=> [["Club", "City"],
149
+ # ["[[Biu Chun Rangers]]", "[[Sham Shui Po]]"],
150
+ # ["[[Eastern Sports Club|Eastern]]", "[[Mong Kok]]"],
151
+ # ["[[HKFC Soccer Section]]", "[[Happy Valley, Hong Kong|Happy Valley]]"]]
152
+ ```
153
+
154
+ ### Links
155
+
156
+ Split links into two parts. Note: The alternate link title is optional. Example:
157
+
158
+ ``` ruby
159
+ link, title = Wikiscript.parse_link( '[[La Florida, Chile|La Florida]]' )
160
+ link #=> "La Florida, Chile"
161
+ title #=> "La Florida"
162
+
163
+ link, title = Wikiscript.parse_link( '[[ La Florida, Chile]]' )
164
+ link #=> "La Florida, Chile"
165
+ title #=> nil
166
+
167
+ link, title = Wikiscript.parse_link( 'La Florida' )
168
+ link #=> nil
169
+ title #=> nil
170
+ ```
171
+
172
+ ### Document Element Structure
173
+
174
+ Get the document's element structure.
175
+ Note: For now only section headings (`h1`, `h2`, `h3`, ...) and tables are supported.
176
+ Example:
177
+
178
+ ``` ruby
179
+ nodes = Wikiscript.parse( <<TXT )
180
+ =Heading 1=
181
+ ==Heading 2==
182
+ ===Heading 3===
183
+
184
+ {|
185
+ |-
186
+ ! header1
187
+ ! header2
188
+ ! header3
189
+ |-
190
+ | row1cell1
191
+ | row1cell2
192
+ | row1cell3
193
+ |-
194
+ | row2cell1
195
+ | row2cell2
196
+ | row2cell3
197
+ |}
198
+ TXT
199
+
200
+ pp nodes
201
+ #=> [[:h1, "Heading 1"],
202
+ # [:h2, "Heading 2"],
203
+ # [:h3, "Heading 3"],
204
+ # [:table, [["header1", "header2", "header3"],
205
+ # ["row1cell1", "row1cell2", "row1cell3"],
206
+ # ["row2cell1", "row2cell2", "row2cell3"]]]
207
+ ```
208
+
209
+
210
+ That's all for now. More functionality will get added over time.
211
+
25
212
 
26
213
 
27
214
  ## Install
data/Rakefile CHANGED
@@ -5,26 +5,26 @@ Hoe.spec 'wikiscript' do
5
5
 
6
6
  self.version = Wikiscript::VERSION
7
7
 
8
- self.summary = 'wikiscript - scripts for wikipedia (get wikitext for page etc.)'
8
+ self.summary = "wikiscript - scripts for wikipedia (get wikitext for page, parse tables 'n' links, etc.)"
9
9
  self.description = summary
10
10
 
11
- self.urls = ['https://github.com/wikiscript/wikiscript']
11
+ self.urls = { home: 'https://github.com/wikiscript/wikiscript' }
12
12
 
13
13
  self.author = 'Gerald Bauer'
14
- self.email = 'opensport@googlegroups.com'
14
+ self.email = 'gerald.bauer@gmail.com'
15
15
 
16
16
  # switch extension to .markdown for github formatting
17
17
  self.readme_file = 'README.md'
18
18
  self.history_file = 'CHANGELOG.md'
19
19
 
20
20
  self.extra_deps = [
21
+ ['cocos'],
21
22
  ['logutils' ],
22
- ['fetcher']
23
23
  ]
24
24
 
25
25
  self.licenses = ['Public Domain']
26
26
 
27
27
  self.spec_extras = {
28
- required_ruby_version: '>= 2.2.2'
28
+ required_ruby_version: '>= 3.1.0'
29
29
  }
30
30
  end
data/lib/wikiscript/client.rb CHANGED
@@ -1,19 +1,13 @@
1
- # encoding: utf-8
2
1
 
3
2
  module Wikiscript
4
3
 
5
4
  class Client
5
+ include Logging
6
6
 
7
- include LogUtils::Logging
8
-
9
- SITE_BASE = 'http://{lang}.wikipedia.org/w/index.php'
7
+ SITE_BASE = 'https://{lang}.wikipedia.org/w/index.php'
10
8
 
11
9
  ### API_BASE = 'http://en.wikipedia.org/w/api.php'
12
10
 
13
- def initialize( opts={} )
14
- @opts = opts
15
- @worker = Fetcher::Worker.new
16
- end
17
11
 
18
12
  ## change to: wikitext or raw why? why not? or to raw? why? why not?
19
13
  def text( title, lang: Wikiscript.lang )
@@ -24,7 +18,7 @@ module Wikiscript
24
18
  # becomes
25
19
  # http://en.wikipedia.org/w/index.php or
26
20
  # http://de.wikipedia.org/w/index.php etc
27
- base_url = SITE_BASE.gsub( "{lang}", lang )
21
+ base_url = SITE_BASE.sub( "{lang}", lang )
28
22
  params = { action: 'raw',
29
23
  title: title }
30
24
 
@@ -33,6 +27,10 @@ module Wikiscript
33
27
 
34
28
  private
35
29
  def build_query( h )
30
+
31
+ ## todo/fix - check what to use for params encode
32
+ ## e.g. escape_component or such?
33
+ ## fix add params upstream to weblclient - why? why not?
36
34
  h.map do |k,v|
37
35
  "#{CGI.escape(k.to_s)}=#{CGI.escape(v.to_s)}"
38
36
  end.join( '&' )
@@ -48,27 +46,12 @@ private
48
46
  ## fix: pass in uri (add to fetcher check for is_a? URI etc.)
49
47
  uri_string = "#{base_url}?#{query}"
50
48
 
51
- response = @worker.get_response( uri_string )
52
-
53
- if response.code == '200'
54
- t = response.body
55
- ###
56
- # NB: Net::HTTP will NOT set encoding UTF-8 etc.
57
- # will mostly be ASCII
58
- # - try to change encoding to UTF-8 ourselves
59
- logger.debug "t.encoding.name (before): #{t.encoding.name}"
60
- #####
61
- # NB: ASCII-8BIT == BINARY == Encoding Unknown; Raw Bytes Here
49
+ response = Webclient.get( uri_string )
62
50
 
63
- ## NB:
64
- # for now "hardcoded" to utf8 - what else can we do?
65
- # - note: force_encoding will NOT change the chars only change the assumed encoding w/o translation
66
- t = t.force_encoding( Encoding::UTF_8 )
67
- logger.debug "t.encoding.name (after): #{t.encoding.name}"
68
- ## pp t
69
- t
51
+ if response.status.ok?
52
+ response.text
70
53
  else
71
- logger.error "fetch HTTP - #{response.code} #{response.message}"
54
+ logger.error "HTTP ERROR - #{response.status.code} #{response.status.message}"
72
55
  exit 1 ### exit for now on error - why? why not?
73
56
  ## nil
74
57
  end
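Reviewer note: a minimal sketch of how the raw-wikitext URL appears to be assembled from `SITE_BASE` and `build_query` in the hunks above. `raw_url` is a hypothetical helper name used only for illustration (it is not part of the gem), and the HTTP request itself (`Webclient.get`) is left out on purpose.

``` ruby
require 'cgi'

SITE_BASE = 'https://{lang}.wikipedia.org/w/index.php'

## turn a hash of params into an escaped query string (as in Client#build_query)
def build_query( h )
  h.map do |k,v|
    "#{CGI.escape(k.to_s)}=#{CGI.escape(v.to_s)}"
  end.join( '&' )
end

## hypothetical helper (assumption): build the full action=raw url for a page title
def raw_url( title, lang: 'en' )
  base_url = SITE_BASE.sub( '{lang}', lang.to_s )   ## only one placeholder, so sub is enough
  query    = build_query( { action: 'raw', title: title } )
  "#{base_url}?#{query}"
end

puts raw_url( 'Sankt_Pölten' )
#=> https://en.wikipedia.org/w/index.php?action=raw&title=Sankt_P%C3%B6lten
```

The escaped title matches the `Sankt_P%C3%B6lten` request logged in the deleted `test/test_page.rb`. Whether `CGI.escape` is the right encoder here is exactly what the new todo comment in `build_query` leaves open.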
data/lib/wikiscript/outline_reader.rb ADDED
@@ -0,0 +1,97 @@
1
+
2
+ module Wikiscript
3
+
4
+ class OutlineReader
5
+
6
+ def self.read( path )
7
+ txt = File.open( path, 'r:utf-8' ) { |f| f.read }
8
+ parse( txt )
9
+ end
10
+
11
+ def self.parse( txt )
12
+ new( txt ).parse
13
+ end
14
+
15
+ def initialize( txt )
16
+ @txt = txt
17
+ end
18
+
19
+
20
+ HEADING_RE = %r{\A
21
+ (?<marker>={1,}) ## 1. leading ======
22
+ [ ]*
23
+ (?<text>[^=]+?) ## 2. text (note: for now no "inline" = allowed)
24
+ [ ]*
25
+ =* ## 3. (optional) trailing ====
26
+ \z}x
27
+
28
+ def parse
29
+ outline = [] ## outline / page structure
30
+
31
+ start_para = true ## start new para(graph) on new text line?
32
+
33
+ @txt.each_line do |line|
34
+
35
+ ##
36
+ ## (auto-)sanitize first
37
+ ## - &nbsp; => "vanilla" space
38
+ ## - 1–2 => 1-2 - "vanilla" dash
39
+ ## todo - move up into txt!!!
40
+ line = line.gsub( '&nbsp;', ' ' )
41
+ line = line.gsub( /[–]/, '-' )
42
+
43
+
44
+ line = line.strip ## todo/fix: keep leading and trailing spaces - why? why not?
45
+
46
+ if line.empty? ## todo/fix: keep blank line nodes?? and just remove comments and process headings?! - why? why not?
47
+ start_para = true
48
+ next
49
+ end
50
+
51
+ break if line == '__END__'
52
+
53
+ next if line.start_with?( '#' ) ## skip comments too
54
+ ## strip inline (until end-of-line) comments too
55
+ ## e.g Eupen | KAS Eupen ## [de]
56
+ ## => Eupen | KAS Eupen
57
+ ## e.g bq Bonaire, BOE # CONCACAF
58
+ ## => bq Bonaire, BOE
59
+ line = line.sub( /#.*/, '' ).strip
60
+
61
+ ## note: like in wikimedia markup (and markdown) allow optional trailing ==== too
62
+ if m=HEADING_RE.match( line )
63
+ start_para = true
64
+
65
+ heading_marker = m[:marker]
66
+ heading_level = heading_marker.length ## count number of = for heading level
67
+ heading = m[:text].strip
68
+
69
+ outline << [:"h#{heading_level}", heading]
70
+ elsif line == '----' ## make more generic/flexible - why? why not?
71
+ start_para = true
72
+ outline << [:hr]
73
+ ## The horizontal rule represents a paragraph-level thematic break. Do not use in article content,
74
+ ## as rules are used only after main sections, and this is automatic.
75
+ ## HTML equivalent: <hr /> (which can be indented,
76
+ ## whereas ---- always starts at the left margin.)
77
+ else ## assume it's a (plain/regular) text line
78
+ if start_para
79
+ outline << [:p, [line]]
80
+ start_para = false
81
+ else
82
+ node = outline[-1] ## get last entry
83
+ if node[0] == :p ## assert it's a p(aragraph) node!!!
84
+ node[1] << line ## add line to p(aragraph)
85
+ else
86
+ puts "!! ERROR - invalid outline state / format - expected p(aragraph) node; got:"
87
+ pp node
88
+ exit 1
89
+ end
90
+ end
91
+ end
92
+ end
93
+ outline
94
+ end # method parse
95
+ end # class OutlineReader
96
+
97
+ end # module Wikiscript
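Reviewer note: a small sketch that isolates the heading rule from `OutlineReader#parse` so it can be tried without the gem. The regex is copied from the hunk above; `heading_node` is a hypothetical helper name (an assumption for illustration), and none of the other line handling (comments, `----`, paragraphs) is reproduced.

``` ruby
HEADING_RE = %r{\A
                 (?<marker>={1,})   ## 1. leading ======
                 [ ]*
                 (?<text>[^=]+?)    ## 2. text (no "inline" = allowed)
                 [ ]*
                 =*                 ## 3. (optional) trailing ====
               \z}x

## hypothetical helper (assumption): return [:h<level>, text] or nil if not a heading
def heading_node( line )
  m = HEADING_RE.match( line.strip )
  return nil unless m

  level = m[:marker].length     ## number of leading = gives the heading level
  [:"h#{level}", m[:text].strip]
end

pp heading_node( '== Heading 2 ==' )   #=> [:h2, "Heading 2"]
pp heading_node( '=Heading 1' )        #=> [:h1, "Heading 1"]
pp heading_node( 'plain text' )        #=> nil
```

Only the leading markers decide the level. The trailing `=` run is optional and its length is not checked, so unbalanced markers such as `=Heading 1==` still yield `:h1`.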
data/lib/wikiscript/page.rb CHANGED
@@ -1,10 +1,9 @@
1
- # encoding: utf-8
2
1
 
3
2
  module Wikiscript
4
3
 
5
4
  class Page
6
5
 
7
- include LogUtils::Logging
6
+ include Logging
8
7
 
9
8
  attr_reader :title, :lang
10
9
 
@@ -16,7 +15,7 @@ module Wikiscript
16
15
  end
17
16
 
18
17
  def self.read( path )
19
- text = File.open( path, 'r:utf-8' ).read
18
+ text = File.open( path, 'r:utf-8' ) { |f| f.read }
20
19
  o = new( text, title: "File:#{path}" ) ## use auto-generated File:<path> title path - why? why not?
21
20
  o
22
21
  end
@@ -33,11 +32,23 @@ module Wikiscript
33
32
  end
34
33
 
35
34
  def text
36
- @text ||= get # cache text (from request)
35
+ @text ||= get # cache text (from request)
36
+ end
37
+
38
+ def nodes
39
+ @nodes ||= parse # cache nodes (from parse)
40
+ end
41
+
42
+ def each ## loop over all nodes / elements -note: nodes is a (flat) list (array) for now
43
+ nodes.each do |node|
44
+ yield( node )
45
+ end
37
46
  end
38
47
 
39
48
 
40
49
  def get ## "force" refresh text (get/fetch/download)
50
+ @nodes = nil ## note: reset cached parsed nodes too
51
+
41
52
  @text = Client.new.text( @title, lang: @lang )
42
53
  @text
43
54
  end
@@ -45,9 +56,9 @@ module Wikiscript
45
56
  alias_method :download, :get
46
57
 
47
58
 
48
-
49
59
  def parse ## todo/change: use/find a different name e.g. doc/elements/etc. - why? why not?
50
- PageReader.parse( text )
60
+ @nodes = PageReader.parse( text )
61
+ @nodes
51
62
  end
52
63
  end # class Page
53
64
  end # Wikiscript
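Reviewer note: the `nodes` / `each` / `get` changes above amount to memoizing the parsed nodes and dropping the cache whenever the text is replaced. A minimal standalone sketch of that pattern follows. `CachedDoc`, `parse_count` and the line-splitting "parser" are hypothetical stand-ins (assumptions for illustration, not the gem's `Page` or `PageReader`), and including `Enumerable` is an addition that the diff does not make.

``` ruby
class CachedDoc
  include Enumerable     ## assumption: not in the diff, gives to_a / map / etc. on top of each

  attr_reader :parse_count

  def initialize( text )
    @text        = text
    @nodes       = nil
    @parse_count = 0
  end

  def nodes
    @nodes ||= parse     ## cache nodes (from parse)
  end

  def each( &block )     ## loop over all nodes - a flat list (array)
    nodes.each( &block )
  end

  def text=( new_text )
    @nodes = nil         ## reset cached parsed nodes when the text changes
    @text  = new_text
  end

  private

  def parse              ## stand-in parser: one node per non-blank line
    @parse_count += 1
    @text.lines.map( &:strip ).reject( &:empty? )
  end
end

doc = CachedDoc.new( "a\n\nb\n" )
doc.nodes
doc.nodes
doc.parse_count   #=> 1  (second call served from the cache)
doc.text = "c\n"
doc.to_a          #=> ["c"]  (re-parsed after the reset)
```

In the diff the reset lives in `Page#get`, so a fresh download invalidates nodes parsed from the old text.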
data/lib/wikiscript/page_reader.rb CHANGED
@@ -1,11 +1,10 @@
1
- # encoding: utf-8
2
1
 
3
2
  module Wikiscript
4
3
 
5
4
  class PageReader
6
5
 
7
- def self.read( path ) ## use - rename to read_file or from_file etc. - why? why not?
8
- txt = File.open( path, 'r:utf-8' ).read
6
+ def self.read( path )
7
+ txt = File.open( path, 'r:utf-8' ) { |f| f.read }
9
8
  parse( txt )
10
9
  end
11
10
 
@@ -54,7 +53,7 @@ class PageReader
54
53
  table_txt << line << "\n"
55
54
  else
56
55
  ## note: skip unknown line types for now
57
-
56
+
58
57
  ## puts "** !!! ERROR !!! unknown line type in wiki page:"
59
58
  ## pp line
60
59
  ## exit 1
data/lib/wikiscript/table_reader.rb CHANGED
@@ -1,11 +1,10 @@
1
- # encoding: utf-8
2
1
 
3
2
  module Wikiscript
4
3
 
5
4
  class TableReader
6
5
 
7
- def self.read( path ) ## use - rename to read_file or from_file etc. - why? why not?
8
- txt = File.open( path, 'r:utf-8' ).read
6
+ def self.read( path )
7
+ txt = File.open( path, 'r:utf-8' ) { |f| f.read }
9
8
  parse( txt )
10
9
  end
11
10
 
data/lib/wikiscript/version.rb CHANGED
@@ -1,12 +1,12 @@
1
1
 
2
2
  module Wikiscript
3
- VERSION = '0.3.1'
3
+ VERSION = '0.4.0'
4
4
 
5
5
  def self.banner
6
- "wikiscript/#{VERSION} on Ruby #{RUBY_VERSION} (#{RUBY_RELEASE_DATE}) [#{RUBY_PLATFORM}]"
6
+ "wikiscript/#{VERSION} on Ruby #{RUBY_VERSION} (#{RUBY_RELEASE_DATE}) [#{RUBY_PLATFORM}] in (#{root})"
7
7
  end
8
8
 
9
9
  def self.root
10
- "#{File.expand_path( File.dirname(File.dirname(File.dirname(__FILE__))) )}"
10
+ File.expand_path( File.dirname(File.dirname(File.dirname(__FILE__))) )
11
11
  end
12
12
  end
data/lib/wikiscript.rb CHANGED
@@ -1,26 +1,25 @@
1
- # encoding: utf-8
2
-
3
- ## stdlibs
4
-
5
- require 'net/http'
6
- require 'uri'
7
- require 'cgi'
8
- require 'pp'
1
+ ## stdlibs via cocos
2
+ require 'cocos'
9
3
 
10
4
 
11
5
  ## 3rd party gems/libs
12
6
  ## require 'props'
13
7
 
14
8
  require 'logutils'
15
- require 'fetcher'
16
9
 
17
- # our own code
10
+ module Wikiscript
11
+ Logging = LogUtils::Logging
12
+ end
18
13
 
19
- require 'wikiscript/version' # let it always go first
20
- require 'wikiscript/client'
21
- require 'wikiscript/table_reader'
22
- require 'wikiscript/page_reader'
23
- require 'wikiscript/page'
14
+
15
+
16
+ # our own code
17
+ require_relative 'wikiscript/version' # let it always go first
18
+ require_relative 'wikiscript/client'
19
+ require_relative 'wikiscript/table_reader'
20
+ require_relative 'wikiscript/page_reader'
21
+ require_relative 'wikiscript/outline_reader'
22
+ require_relative 'wikiscript/page'
24
23
 
25
24
 
26
25
 
metadata CHANGED
@@ -1,17 +1,17 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: wikiscript
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.3.1
4
+ version: 0.4.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - Gerald Bauer
8
- autorequire:
8
+ autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2019-09-21 00:00:00.000000000 Z
11
+ date: 2024-09-01 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
- name: logutils
14
+ name: cocos
15
15
  requirement: !ruby/object:Gem::Requirement
16
16
  requirements:
17
17
  - - ">="
@@ -25,7 +25,7 @@ dependencies:
25
25
  - !ruby/object:Gem::Version
26
26
  version: '0'
27
27
  - !ruby/object:Gem::Dependency
28
- name: fetcher
28
+ name: logutils
29
29
  requirement: !ruby/object:Gem::Requirement
30
30
  requirements:
31
31
  - - ">="
@@ -42,64 +42,62 @@ dependencies:
42
42
  name: rdoc
43
43
  requirement: !ruby/object:Gem::Requirement
44
44
  requirements:
45
- - - "~>"
45
+ - - ">="
46
46
  - !ruby/object:Gem::Version
47
47
  version: '4.0'
48
+ - - "<"
49
+ - !ruby/object:Gem::Version
50
+ version: '7'
48
51
  type: :development
49
52
  prerelease: false
50
53
  version_requirements: !ruby/object:Gem::Requirement
51
54
  requirements:
52
- - - "~>"
55
+ - - ">="
53
56
  - !ruby/object:Gem::Version
54
57
  version: '4.0'
58
+ - - "<"
59
+ - !ruby/object:Gem::Version
60
+ version: '7'
55
61
  - !ruby/object:Gem::Dependency
56
62
  name: hoe
57
63
  requirement: !ruby/object:Gem::Requirement
58
64
  requirements:
59
65
  - - "~>"
60
66
  - !ruby/object:Gem::Version
61
- version: '3.16'
67
+ version: '4.1'
62
68
  type: :development
63
69
  prerelease: false
64
70
  version_requirements: !ruby/object:Gem::Requirement
65
71
  requirements:
66
72
  - - "~>"
67
73
  - !ruby/object:Gem::Version
68
- version: '3.16'
69
- description: wikiscript - scripts for wikipedia (get wikitext for page etc.)
70
- email: opensport@googlegroups.com
74
+ version: '4.1'
75
+ description: wikiscript - scripts for wikipedia (get wikitext for page, parse tables
76
+ 'n' links, etc.)
77
+ email: gerald.bauer@gmail.com
71
78
  executables: []
72
79
  extensions: []
73
80
  extra_rdoc_files:
74
81
  - CHANGELOG.md
75
- - LICENSE.md
76
82
  - Manifest.txt
77
- - NOTES.md
78
83
  - README.md
79
84
  files:
80
85
  - CHANGELOG.md
81
- - LICENSE.md
82
86
  - Manifest.txt
83
- - NOTES.md
84
87
  - README.md
85
88
  - Rakefile
86
89
  - lib/wikiscript.rb
87
90
  - lib/wikiscript/client.rb
91
+ - lib/wikiscript/outline_reader.rb
88
92
  - lib/wikiscript/page.rb
89
93
  - lib/wikiscript/page_reader.rb
90
94
  - lib/wikiscript/table_reader.rb
91
95
  - lib/wikiscript/version.rb
92
- - test/helper.rb
93
- - test/test_link.rb
94
- - test/test_page.rb
95
- - test/test_page_de.rb
96
- - test/test_page_reader.rb
97
- - test/test_table_reader.rb
98
96
  homepage: https://github.com/wikiscript/wikiscript
99
97
  licenses:
100
98
  - Public Domain
101
99
  metadata: {}
102
- post_install_message:
100
+ post_install_message:
103
101
  rdoc_options:
104
102
  - "--main"
105
103
  - README.md
@@ -109,16 +107,16 @@ required_ruby_version: !ruby/object:Gem::Requirement
109
107
  requirements:
110
108
  - - ">="
111
109
  - !ruby/object:Gem::Version
112
- version: 2.2.2
110
+ version: 3.1.0
113
111
  required_rubygems_version: !ruby/object:Gem::Requirement
114
112
  requirements:
115
113
  - - ">="
116
114
  - !ruby/object:Gem::Version
117
115
  version: '0'
118
116
  requirements: []
119
- rubyforge_project:
120
- rubygems_version: 2.5.2
121
- signing_key:
117
+ rubygems_version: 3.4.10
118
+ signing_key:
122
119
  specification_version: 4
123
- summary: wikiscript - scripts for wikipedia (get wikitext for page etc.)
120
+ summary: wikiscript - scripts for wikipedia (get wikitext for page, parse tables 'n'
121
+ links, etc.)
124
122
  test_files: []
data/LICENSE.md DELETED
@@ -1,116 +0,0 @@
1
- CC0 1.0 Universal
2
-
3
- Statement of Purpose
4
-
5
- The laws of most jurisdictions throughout the world automatically confer
6
- exclusive Copyright and Related Rights (defined below) upon the creator and
7
- subsequent owner(s) (each and all, an "owner") of an original work of
8
- authorship and/or a database (each, a "Work").
9
-
10
- Certain owners wish to permanently relinquish those rights to a Work for the
11
- purpose of contributing to a commons of creative, cultural and scientific
12
- works ("Commons") that the public can reliably and without fear of later
13
- claims of infringement build upon, modify, incorporate in other works, reuse
14
- and redistribute as freely as possible in any form whatsoever and for any
15
- purposes, including without limitation commercial purposes. These owners may
16
- contribute to the Commons to promote the ideal of a free culture and the
17
- further production of creative, cultural and scientific works, or to gain
18
- reputation or greater distribution for their Work in part through the use and
19
- efforts of others.
20
-
21
- For these and/or other purposes and motivations, and without any expectation
22
- of additional consideration or compensation, the person associating CC0 with a
23
- Work (the "Affirmer"), to the extent that he or she is an owner of Copyright
24
- and Related Rights in the Work, voluntarily elects to apply CC0 to the Work
25
- and publicly distribute the Work under its terms, with knowledge of his or her
26
- Copyright and Related Rights in the Work and the meaning and intended legal
27
- effect of CC0 on those rights.
28
-
29
- 1. Copyright and Related Rights. A Work made available under CC0 may be
30
- protected by copyright and related or neighboring rights ("Copyright and
31
- Related Rights"). Copyright and Related Rights include, but are not limited
32
- to, the following:
33
-
34
- i. the right to reproduce, adapt, distribute, perform, display, communicate,
35
- and translate a Work;
36
-
37
- ii. moral rights retained by the original author(s) and/or performer(s);
38
-
39
- iii. publicity and privacy rights pertaining to a person's image or likeness
40
- depicted in a Work;
41
-
42
- iv. rights protecting against unfair competition in regards to a Work,
43
- subject to the limitations in paragraph 4(a), below;
44
-
45
- v. rights protecting the extraction, dissemination, use and reuse of data in
46
- a Work;
47
-
48
- vi. database rights (such as those arising under Directive 96/9/EC of the
49
- European Parliament and of the Council of 11 March 1996 on the legal
50
- protection of databases, and under any national implementation thereof,
51
- including any amended or successor version of such directive); and
52
-
53
- vii. other similar, equivalent or corresponding rights throughout the world
54
- based on applicable law or treaty, and any national implementations thereof.
55
-
56
- 2. Waiver. To the greatest extent permitted by, but not in contravention of,
57
- applicable law, Affirmer hereby overtly, fully, permanently, irrevocably and
58
- unconditionally waives, abandons, and surrenders all of Affirmer's Copyright
59
- and Related Rights and associated claims and causes of action, whether now
60
- known or unknown (including existing as well as future claims and causes of
61
- action), in the Work (i) in all territories worldwide, (ii) for the maximum
62
- duration provided by applicable law or treaty (including future time
63
- extensions), (iii) in any current or future medium and for any number of
64
- copies, and (iv) for any purpose whatsoever, including without limitation
65
- commercial, advertising or promotional purposes (the "Waiver"). Affirmer makes
66
- the Waiver for the benefit of each member of the public at large and to the
67
- detriment of Affirmer's heirs and successors, fully intending that such Waiver
68
- shall not be subject to revocation, rescission, cancellation, termination, or
69
- any other legal or equitable action to disrupt the quiet enjoyment of the Work
70
- by the public as contemplated by Affirmer's express Statement of Purpose.
71
-
72
- 3. Public License Fallback. Should any part of the Waiver for any reason be
73
- judged legally invalid or ineffective under applicable law, then the Waiver
74
- shall be preserved to the maximum extent permitted taking into account
75
- Affirmer's express Statement of Purpose. In addition, to the extent the Waiver
76
- is so judged Affirmer hereby grants to each affected person a royalty-free,
77
- non transferable, non sublicensable, non exclusive, irrevocable and
78
- unconditional license to exercise Affirmer's Copyright and Related Rights in
79
- the Work (i) in all territories worldwide, (ii) for the maximum duration
80
- provided by applicable law or treaty (including future time extensions), (iii)
81
- in any current or future medium and for any number of copies, and (iv) for any
82
- purpose whatsoever, including without limitation commercial, advertising or
83
- promotional purposes (the "License"). The License shall be deemed effective as
84
- of the date CC0 was applied by Affirmer to the Work. Should any part of the
85
- License for any reason be judged legally invalid or ineffective under
86
- applicable law, such partial invalidity or ineffectiveness shall not
87
- invalidate the remainder of the License, and in such case Affirmer hereby
88
- affirms that he or she will not (i) exercise any of his or her remaining
89
- Copyright and Related Rights in the Work or (ii) assert any associated claims
90
- and causes of action with respect to the Work, in either case contrary to
91
- Affirmer's express Statement of Purpose.
92
-
93
- 4. Limitations and Disclaimers.
94
-
95
- a. No trademark or patent rights held by Affirmer are waived, abandoned,
96
- surrendered, licensed or otherwise affected by this document.
97
-
98
- b. Affirmer offers the Work as-is and makes no representations or warranties
99
- of any kind concerning the Work, express, implied, statutory or otherwise,
100
- including without limitation warranties of title, merchantability, fitness
101
- for a particular purpose, non infringement, or the absence of latent or
102
- other defects, accuracy, or the present or absence of errors, whether or not
103
- discoverable, all to the greatest extent permissible under applicable law.
104
-
105
- c. Affirmer disclaims responsibility for clearing rights of other persons
106
- that may apply to the Work or any use thereof, including without limitation
107
- any person's Copyright and Related Rights in the Work. Further, Affirmer
108
- disclaims responsibility for obtaining any necessary consents, permissions
109
- or other rights required for any use of the Work.
110
-
111
- d. Affirmer understands and acknowledges that Creative Commons is not a
112
- party to this document and has no duty or obligation with respect to this
113
- CC0 or use of the Work.
114
-
115
- For more information, please see
116
- <http://creativecommons.org/publicdomain/zero/1.0/>
data/NOTES.md DELETED
@@ -1,52 +0,0 @@
1
- # Notes
2
-
3
-
4
- ## Alternatives
5
-
6
- Wikipedia
7
-
8
- - [wikipedia-client](https://rubygems.org/gems/wikipedia-client) by Ken Pratt et al - ruby client for the Wikipedia API
9
- - <https://github.com/kenpratt/wikipedia-client>
10
- - <https://www.rubydoc.info/gems/wikipedia-client>
11
-
12
- <!-- break -->
13
-
14
- - [infoboxer](https://rubygems.org/gems/infoboxer) by Victor Shepelev et al - pure-Ruby Wikipedia (and generic MediaWiki) client and parser, targeting information extraction
15
- - <https://github.com/molybdenum-99/infoboxer>
16
- - <https://www.rubydoc.info/gems/infoboxer>
17
-
18
- <!-- break -->
19
-
20
- More
21
-
22
- - <https://github.com/molybdenum-99/reality>
23
- - https://github.com/molybdenum-99/mediawiktory
24
-
25
-
26
- Wikidata
27
-
28
- - [wikidata](https://rubygems.org/gems/wikidata) by Wil Gieseler
29
- - <https://github.com/wilg/wikidata>
30
- - <https://www.rubydoc.info/gems/wikidata>
31
-
32
- <!-- break -->
33
-
34
- - [wikidata-fetcher](https://rubygems.org/gems/wikidata-fetcher)
35
- - <https://github.com/everypolitician/wikidata-fetcher>
36
-
37
- <!-- break -->
38
-
39
- - [mediawiki_api-wikidata](https://rubygems.org/gems/mediawiki_api-wikidata)
40
- - <https://github.com/wmde/WikidataApiGem>
41
-
42
-
43
-
44
- **Python**
45
-
46
- - <https://pypi.org/project/wptools/> - Wikipedia tools (for Humans)
47
- - <https://github.com/siznax/wptools/>
48
-
49
-
50
- ## Wikipedia
51
-
52
- - Wikipedia API reference: <http://en.wikipedia.org/w/api.php>
data/test/helper.rb DELETED
@@ -1,8 +0,0 @@
1
- ## $:.unshift(File.dirname(__FILE__))
2
-
3
- ## minitest setup
4
- require 'minitest/autorun'
5
-
6
-
7
- ## our own code
8
- require 'wikiscript'
data/test/test_link.rb DELETED
@@ -1,31 +0,0 @@
1
- # encoding: utf-8
2
-
3
- ###
4
- # to run use
5
- # ruby -I ./lib -I ./test test/test_link.rb
6
-
7
-
8
- require 'helper'
9
-
10
-
11
- class TestLink < MiniTest::Test
12
-
13
- def test_unlink
14
- assert_equal 'Santiago (La Florida)', Wikiscript.unlink( '[[Santiago]] ([[La Florida, Chile|La Florida]])' )
15
- end
16
-
17
- def test_parse_link
18
- link, title = Wikiscript.parse_link( '[[La Florida, Chile|La Florida]]' )
19
- assert_equal 'La Florida, Chile', link
20
- assert_equal 'La Florida', title
21
-
22
- link, title = Wikiscript.parse_link( '[[ La Florida, Chile | La Florida ]]' )
23
- assert_equal 'La Florida, Chile', link
24
- assert_equal 'La Florida', title
25
-
26
- link, title = Wikiscript.parse_link( 'La Florida' )
27
- assert link == nil
28
- assert title == nil
29
- end
30
-
31
- end # class TestLink
data/test/test_page.rb DELETED
@@ -1,54 +0,0 @@
1
- # encoding: utf-8
2
-
3
- ###
4
- # to run use
5
- # ruby -I ./lib -I ./test test/test_page.rb
6
-
7
-
8
- require 'helper'
9
-
10
-
11
- class TestPage < MiniTest::Test
12
-
13
- def setup
14
- Wikiscript.lang = :en
15
- end
16
-
17
- def test_austria_en
18
- page = Wikiscript::Page.get( 'Austria' )
19
- # [debug] GET /w/index.php?action=raw&title=Austria uri=http://en.wikipedia.org/w/index.php?action=raw&title=Austria, redirect_limit=5
20
- # [debug] 301 TLS Redirect location=https://en.wikipedia.org/w/index.php?action=raw&title=Austria
21
- # [debug] GET /w/index.php?action=raw&title=Austria uri=https://en.wikipedia.org/w/index.php?action=raw&title=Austria, redirect_limit=4
22
- # [debug] 200 OK
23
-
24
- text = page.text
25
-
26
- ## print first 600 chars
27
- pp text[0..600]
28
-
29
- ## check for some snippets
30
- assert /{{Infobox country/ =~ text
31
- assert /common_name = Austria/ =~ text
32
- assert /capital = \[\[Vienna\]\]/ =~ text
33
- # assert /The origins of modern-day Austria date back to the time/ =~ text
34
- end
35
-
36
- def test_sankt_poelten_en
37
- page = Wikiscript::Page.get( 'Sankt_Pölten' )
38
- # [debug] GET /w/index.php?action=raw&title=Sankt_P%C3%B6lten uri=http://en.wikipedia.org/w/index.php?action=raw&title=Sankt_P%C3%B6lten, redirect_limit=5
39
- # [debug] 301 TLS Redirect location=https://en.wikipedia.org/w/index.php?action=raw&title=Sankt_P%C3%B6lten
40
- # [debug] GET /w/index.php?action=raw&title=Sankt_P%C3%B6lten uri=https://en.wikipedia.org/w/index.php?action=raw&title=Sankt_P%C3%B6lten, redirect_limit=4
41
- # [debug] 200 OK
42
-
43
- text = page.text
44
-
45
- ## print first 600 chars
46
- pp text[0..600]
47
-
48
- ## check for some snippets
49
- assert /{{Infobox settlement/ =~ text
50
- assert /name\s+=\s+Sankt Pölten/ =~ text
51
- # assert /'''Sankt Pölten''' \(''St. Pölten''\) is the capital city of/ =~ text
52
- end
53
-
54
- end # class TestPage
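The `[debug]` comments in the deleted `test/test_page.rb` show the request path used to fetch raw wikitext, with non-ASCII titles percent-encoded (`Sankt_Pölten` becomes `Sankt_P%C3%B6lten`). The sketch below only reproduces that URL construction with Ruby's standard `uri` library. The helper name `raw_page_url` and its `lang:` keyword are hypothetical, and it builds the `https` URL directly instead of following the 301 redirect seen in the log.

```ruby
require 'uri'

# Hypothetical helper, not part of the gem: builds the action=raw URL for a page title.
# lang selects the Wikipedia subdomain (e.g. 'en' or 'de').
def raw_page_url(title, lang: 'en')
  # encode_www_form percent-encodes UTF-8 bytes and keeps "_" and "." as-is
  query = URI.encode_www_form(action: 'raw', title: title)
  "https://#{lang}.wikipedia.org/w/index.php?#{query}"
end

puts raw_page_url('Sankt_Pölten')
puts raw_page_url('St._Pölten', lang: 'de')
```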
data/test/test_page_de.rb DELETED
@@ -1,38 +0,0 @@
1
- # encoding: utf-8
2
-
3
-
4
- ###
5
- # to run use
6
- # ruby -I ./lib -I ./test test/test_page_de.rb
7
-
8
-
9
- require 'helper'
10
-
11
-
12
- class TestPageDe < MiniTest::Test
13
-
14
- def setup
15
- Wikiscript.lang = :de
16
- end
17
-
18
- def test_st_poelten_de
19
- page = Wikiscript::Page.get( 'St._Pölten' )
20
- # [debug] GET /w/index.php?action=raw&title=St._P%C3%B6lten uri=http://de.wikipedia.org/w/index.php?action=raw&title=St._P%C3%B6lten, redirect_limit=5
21
- # [debug] 301 TLS Redirect location=https://de.wikipedia.org/w/index.php?action=raw&title=St._P%C3%B6lten
22
- # [debug] GET /w/index.php?action=raw&title=St._P%C3%B6lten uri=https://de.wikipedia.org/w/index.php?action=raw&title=St._P%C3%B6lten, redirect_limit=4
23
- # [debug] 200 OK
24
-
25
-
26
- text = page.text
27
-
28
- ## print first 600 chars
29
- pp text[0..600]
30
-
31
- ## check for some snippets
32
- assert /{{Infobox Gemeinde in Österreich/ =~ text
33
- assert /Name\s+=\s+St\. Pölten/ =~ text
34
- assert /'''St\. Pölten''' \(amtlicher Name,/ =~ text
35
- ## assert /Die Stadt liegt am Fluss \[\[Traisen \(Fluss\)\|Traisen\]\]/ =~ text
36
- end
37
-
38
- end # class TestPageDe
data/test/test_page_reader.rb DELETED
@@ -1,80 +0,0 @@
1
- # encoding: utf-8
2
-
3
- ###
4
- # to run use
5
- # ruby -I ./lib -I ./test test/test_page_reader.rb
6
-
7
-
8
- require 'helper'
9
-
10
-
11
- class TestPageReader < MiniTest::Test
12
-
13
- def test_basic
14
- el = Wikiscript.parse( <<TXT )
15
- =Heading 1==
16
- ==Heading 2==
17
- ===Heading 3===
18
-
19
- {|
20
- |-
21
- ! header1
22
- ! header2
23
- ! header3
24
- |-
25
- | row1cell1
26
- | row1cell2
27
- | row1cell3
28
- |-
29
- | row2cell1
30
- | row2cell2
31
- | row2cell3
32
- |}
33
- TXT
34
-
35
- pp el
36
-
37
- assert_equal 4, el.size
38
- assert_equal [:h1, 'Heading 1'], el[0]
39
- assert_equal [:h2, 'Heading 2'], el[1]
40
- assert_equal [:h3, 'Heading 3'], el[2]
41
- assert_equal [:table, [['header1', 'header2', 'header3'],
42
- ['row1cell1', 'row1cell2', 'row1cell3'],
43
- ['row2cell1', 'row2cell2', 'row2cell3']]], el[3]
44
- end
45
-
46
- def test_parse
47
- page = Wikiscript::Page.new( <<TXT )
48
- =Heading 1==
49
- ==Heading 2==
50
- ===Heading 3===
51
-
52
- {|
53
- |-
54
- ! header1
55
- ! header2
56
- ! header3
57
- |-
58
- | row1cell1
59
- | row1cell2
60
- | row1cell3
61
- |-
62
- | row2cell1
63
- | row2cell2
64
- | row2cell3
65
- |}
66
- TXT
67
-
68
- el = page.parse
69
- pp el
70
-
71
- assert_equal 4, el.size
72
- assert_equal [:h1, 'Heading 1'], el[0]
73
- assert_equal [:h2, 'Heading 2'], el[1]
74
- assert_equal [:h3, 'Heading 3'], el[2]
75
- assert_equal [:table, [['header1', 'header2', 'header3'],
76
- ['row1cell1', 'row1cell2', 'row1cell3'],
77
- ['row2cell1', 'row2cell2', 'row2cell3']]], el[3]
78
- end
79
-
80
- end # class TestPageReader
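The deleted page reader test expects heading lines to come back as `[:h1, 'Heading 1']`-style pairs, including for the unbalanced `=Heading 1==` line in the fixture. One way to get that result is to take the level from the opening `=` run only and to tolerate any number of closing markers. This is a hedged sketch: `parse_headings` is a hypothetical name, it handles headings only (no tables), and it makes no claim about how the gem's page reader works internally.

```ruby
# Hypothetical sketch, not the gem's page reader: extracts heading elements only.
def parse_headings(text)
  text.each_line.map do |raw|
    # level = number of leading "=" markers; closing markers are matched loosely
    m = /\A(=+)\s*(.+?)\s*=+\s*\z/.match(raw.strip)
    m ? [:"h#{m[1].length}", m[2]] : nil
  end.compact
end

p parse_headings("=Heading 1==\n==Heading 2==\n===Heading 3===\n")
```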
data/test/test_table_reader.rb DELETED
@@ -1,109 +0,0 @@
1
- # encoding: utf-8
2
-
3
- ###
4
- # to run use
5
- # ruby -I ./lib -I ./test test/test_table_reader.rb
6
-
7
-
8
- require 'helper'
9
-
10
- class TestTableReader < MiniTest::Test
11
-
12
- def test_basic
13
- table = Wikiscript.parse_table( <<TXT )
14
- {|
15
- |-
16
- ! header1
17
- ! header2
18
- ! header3
19
- |-
20
- | row1cell1
21
- | row1cell2
22
- | row1cell3
23
- |-
24
- | row2cell1
25
- | row2cell2
26
- | row2cell3
27
- |}
28
- TXT
29
-
30
- assert_equal 3, table.size ## three rows
31
- assert_equal ['header1', 'header2', 'header3'], table[0]
32
- assert_equal ['row1cell1', 'row1cell2', 'row1cell3'], table[1]
33
- assert_equal ['row2cell1', 'row2cell2', 'row2cell3'], table[2]
34
- end
35
-
36
- def test_basic_ii ## with optional (missing) row divider before headers
37
- table = Wikiscript.parse_table( <<TXT )
38
- {|
39
- ! header1 !! header2 !! header3
40
- |-
41
- | row1cell1 || row1cell2 || row1cell3
42
- |-
43
- | row2cell1 || row2cell2 || row2cell3
44
- |}
45
- TXT
46
-
47
- assert_equal 3, table.size ## three rows
48
- assert_equal ['header1', 'header2', 'header3'], table[0]
49
- assert_equal ['row1cell1', 'row1cell2', 'row1cell3'], table[1]
50
- assert_equal ['row2cell1', 'row2cell2', 'row2cell3'], table[2]
51
- end
52
-
53
- def test_basic_iii # with continuing header column lines
54
- table = Wikiscript.parse_table( <<TXT )
55
- {|
56
- |-
57
- !
58
- header1
59
- !
60
- header2
61
- !
62
- header3
63
- |-
64
- |
65
- row1cell1
66
- |
67
- row1cell2
68
- |
69
- row1cell3
70
- |-
71
- |
72
- row2cell1
73
- |
74
- row2cell2
75
- |
76
- row2cell3
77
- |}
78
- TXT
79
-
80
- assert_equal 3, table.size ## three rows
81
- assert_equal ['header1', 'header2', 'header3'], table[0]
82
- assert_equal ['row1cell1', 'row1cell2', 'row1cell3'], table[1]
83
- assert_equal ['row2cell1', 'row2cell2', 'row2cell3'], table[2]
84
- end
85
-
86
-
87
- def test_strip_attributes_and_emphases
88
- table = Wikiscript.parse_table( <<TXT )
89
- {|
90
- |-
91
- ! style="width:200px;"|Club
92
- ! style="width:150px;"|City
93
- |-
94
- |[[Biu Chun Rangers]]||[[Sham Shui Po]]
95
- |-
96
- |bgcolor=#ffff44 |''[[Eastern Sports Club|Eastern]]''||[[Mong Kok]]
97
- |-
98
- |[[HKFC Soccer Section]]||[[Happy Valley, Hong Kong|Happy Valley]]
99
- |}
100
- TXT
101
-
102
- assert_equal 4, table.size ## four rows
103
- assert_equal ['Club', 'City'], table[0]
104
- assert_equal ['[[Biu Chun Rangers]]', '[[Sham Shui Po]]'], table[1]
105
- assert_equal ['[[Eastern Sports Club|Eastern]]', '[[Mong Kok]]'], table[2]
106
- assert_equal ['[[HKFC Soccer Section]]', '[[Happy Valley, Hong Kong|Happy Valley]]'], table[3]
107
- end
108
-
109
- end # class TestTableReader
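The four deleted table reader tests together pin down a small set of rules: `|-` starts a row, the divider before the header row is optional, `!!` and `||` separate cells on one line, a bare `!` or `|` may be followed by the cell text on the next line, and attributes such as `style="..."|` and `''` emphasis markers are dropped. The sketch below implements just those rules under hypothetical names (`parse_table`, `clean_cell`). It is written from the test expectations, not from the gem's `table_reader.rb`, and it ignores captions, nested tables, and row attributes.

```ruby
# Hypothetical sketch, not the gem's Wikiscript.parse_table.

# Strips a leading attribute section (e.g. style="..."|) and ''emphasis'' quotes.
def clean_cell(cell)
  cell.sub(/\A[^\[\]|]*\|/, '').gsub(/'{2,}/, '').strip
end

# Returns an array of rows; each row is an array of cell strings.
def parse_table(text)
  rows = []
  row  = nil
  text.each_line do |raw|
    line = raw.strip
    case line
    when '', /\A\{\|/ then next      # blank line or table start
    when /\A\|\}/ then break         # table end
    when /\A\|-/                     # row divider: close the current row
      rows << row if row && !row.empty?
      row = []
    when /\A[!|]/                    # header (!) or data (|) cell line
      row ||= []                     # the divider before the first row is optional
      sep   = line.start_with?('!') ? /!!|\|\|/ : /\|\|/
      cells = line[1..-1].split(sep, -1)
      cells = [''] if cells.empty?   # bare marker: cell text follows on the next line
      cells.each { |cell| row << clean_cell(cell) }
    else                             # continuation of the previous cell
      next if row.nil? || row.empty?
      row[-1] = [row[-1], line].reject(&:empty?).join(' ')
    end
  end
  rows << row if row && !row.empty?
  rows
end
```

Wiki links inside cells are left untouched, matching the expectations in `test_strip_attributes_and_emphases`.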