wikiscript 0.3.1 → 0.3.2

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
-   metadata.gz: aa972f9f2c4580091fb3dac93cf76f94f6054ba1
-   data.tar.gz: 138954a7a0281032b68aae0f29b9d239c9f7f90a
+   metadata.gz: a34183c24ce2eac79cf72edcec39c562e6c74065
+   data.tar.gz: 56fff65e58fc1fbbbc2eeaa111d22d6ebce84f63
  SHA512:
-   metadata.gz: 74af5f6bd6b915a1887d9022763294f0e6e634415d1bbdf3b18ae9fc0f9a11b01812b904a42a4ed687a60cafbd50230afd8a804dab8bb6a8b28475f99437be2b
-   data.tar.gz: 44f92623eb560825a834de570b8cfc604a92bfb907b38c2390b2d3a52cc74bcf8f229d21369c27bd73c0b99a9db9c65f5df3bf9b6d304e9876694135e7517ddc
+   metadata.gz: 85fa8e16dffbdfdaf6b683dae530a692eb6c570a03345ddf2a013b248bbb51aaab9ce495850bdd564a163e35c37f0f80b3f6d47b5ee994e8effe02c413ac8ba8
+   data.tar.gz: ca8d9133a251f634db722575100d260c79859437fb393d9ac5b8181d0725f7d54a5290b4dfa0436a13aa73338c818adeea5c89c06b03b2f2e7360aa102f3e1c4
data/README.md CHANGED
@@ -12,16 +12,203 @@ Read-only access to wikikpedia pages.
  Example - Get wikitext source (via `en.wikipedia.org/w/index.php?action=raw&title=<title>`):


+ ``` ruby
+ page = Wikiscript::Page.get( '2022_FIFA_World_Cup' ) # same as Wikiscript.get
+ page.text
  ```
- >> page = Wikiscript::Page.new( '2014_FIFA_World_Cup_squads' )
- >> page.text

- The [[2014 FIFA World Cup]] is an international [[association football|football]]
- tournament which is currently being held in Brazil from 12 June to 13 July 2014.
- The 32 national teams involved in the tournament were required to register
- a squad of 23 players, including three goalkeepers...
+ prints
+
+ ```
+ The '''2022 FIFA World Cup''' is scheduled to be the 22nd edition of the [[FIFA World Cup]],
+ the quadrennial international men's [[association football]] championship contested by the
+ [[List of men's national association football teams|national teams]] of the member associations of [[FIFA]].
+ It is scheduled to take place in [[Qatar]] in 2022. This will be the first World Cup ever to be held
+ in the [[Arab world]] and the first in a Muslim-majority country...
+ ```
+
+ Or build your own page from scratch (no download):
+
+ ``` ruby
+ page = Wikiscript::Page.new( <<TXT, title: '2022_FIFA_World_Cup' )
+ The '''2022 FIFA World Cup''' is scheduled to be the 22nd edition of the [[FIFA World Cup]],
+ the quadrennial international men's [[association football]] championship contested by the
+ [[List of men's national association football teams|national teams]] of the member associations of [[FIFA]].
+ It is scheduled to take place in [[Qatar]] in 2022. This will be the first World Cup ever to be held
+ in the [[Arab world]] and the first in a Muslim-majority country...
+ TXT
+ page.text
+ ```
+
+ prints
+
+ ```
+ The '''2022 FIFA World Cup''' is scheduled to be the 22nd edition of the [[FIFA World Cup]],
+ the quadrennial international men's [[association football]] championship contested by the
+ [[List of men's national association football teams|national teams]] of the member associations of [[FIFA]].
+ It is scheduled to take place in [[Qatar]] in 2022. This will be the first World Cup ever to be held
+ in the [[Arab world]] and the first in a Muslim-majority country...
+ ```
+
+
+ ### Tables
+
+ Parse wiki tables into an array. Example:
+
+ ``` ruby
+ table = Wikiscript.parse_table( <<TXT )
+ {|
+ |-
+ ! header1
+ ! header2
+ ! header3
+ |-
+ | row1cell1
+ | row1cell2
+ | row1cell3
+ |-
+ | row2cell1
+ | row2cell2
+ | row2cell3
+ |}
+ TXT
+
+ # -or-
+
+ table = Wikiscript.parse_table( <<TXT )
+ {|
+ ! header1 !! header2 !! header3
+ |-
+ | row1cell1 || row1cell2 || row1cell3
+ |-
+ | row2cell1 || row2cell2 || row2cell3
+ |}
+ TXT
+
+ # -or-
+
+ table = Wikiscript.parse_table( <<TXT )
+ {|
+ |-
+ !
+ header1
+ !
+ header2
+ !
+ header3
+ |-
+ |
+ row1cell1
+ |
+ row1cell2
+ |
+ row1cell3
+ |-
+ |
+ row2cell1
+ |
+ row2cell2
+ |
+ row2cell3
+ |}
+ TXT
+ ```
+
+ resulting in:
+
+ ``` ruby
+ pp table
+ #=> [["header1", "header2", "header3"],
+ #    ["row1cell1", "row1cell2", "row1cell3"],
+ #    ["row2cell1", "row2cell2", "row2cell3"]]
  ```

+ Note: `parse_table` will strip/remove (leading) style attributes (e.g. `attribute="value" |`) and (inline) bold and italic emphases (e.g. `''`) from the (cell) text. Example:
+
+ ``` ruby
+ table = Wikiscript.parse_table( <<TXT )
+ {|
+ |-
+ ! style="width:200px;"|Club
+ ! style="width:150px;"|City
+ |-
+ |[[Biu Chun Rangers]]||[[Sham Shui Po]]
+ |-
+ |bgcolor=#ffff44 |''[[Eastern Sports Club|Eastern]]''||[[Mong Kok]]
+ |-
+ |[[HKFC Soccer Section]]||[[Happy Valley, Hong Kong|Happy Valley]]
+ |}
+ TXT
+ ```
+
+ resulting in:
+
+ ``` ruby
+ pp table
+ #=> [["Club", "City"],
+ #    ["[[Biu Chun Rangers]]", "[[Sham Shui Po]]"],
+ #    ["[[Eastern Sports Club|Eastern]]", "[[Mong Kok]]"],
+ #    ["[[HKFC Soccer Section]]", "[[Happy Valley, Hong Kong|Happy Valley]]"]]
+ ```
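
Reviewer sketch (not part of the diff): the clean-up described in the note above could be approximated with two regular expressions. The helper name `clean_cell` and both patterns are assumptions made for illustration; the gem's actual implementation is not shown in this diff and may differ.

```ruby
# Hypothetical helper, not wikiscript's API: tidy the text of one table cell.
def clean_cell(text)
  text = text.strip
  # Drop a leading attribute block such as `style="width:200px;"|`
  # or `bgcolor=#ffff44 |`. The block must contain "=" and must not
  # contain brackets, so a piped link like [[A|B]] is left alone.
  text = text.sub(/\A[^\[\]|]*=[^\[\]|]*\|\s*/, '')
  # Drop italic ('') and bold (''') markers; a single apostrophe stays.
  text = text.gsub(/'{2,}/, '')
  text.strip
end

p clean_cell('style="width:200px;"|Club')
#=> "Club"
p clean_cell("bgcolor=#ffff44 |''[[Eastern Sports Club|Eastern]]''")
#=> "[[Eastern Sports Club|Eastern]]"
p clean_cell('[[Happy Valley, Hong Kong|Happy Valley]]')
#=> "[[Happy Valley, Hong Kong|Happy Valley]]"
```

Requiring an `=` before the first pipe is what keeps piped links intact in this sketch.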
+
+ ### Links
+
+ Split links into two parts. Note: The alternate link title is optional. Example:
+
+ ``` ruby
+ link, title = Wikiscript.parse_link( '[[La Florida, Chile|La Florida]]' )
+ link #=> "La Florida, Chile"
+ title #=> "La Florida"
+
+ link, title = Wikiscript.parse_link( '[[ La Florida, Chile]]' )
+ link #=> "La Florida, Chile"
+ title #=> nil
+
+ link, title = Wikiscript.parse_link( 'La Florida' )
+ link #=> nil
+ title #=> nil
+ ```
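
Reviewer sketch (not part of the diff): one way to get the three results shown above is a single anchored regular expression with an optional title group. `LINK_RE` and `split_link` are hypothetical names, and this is only an assumption about how such a splitter might work, not the code behind `Wikiscript.parse_link`.

```ruby
# Hypothetical pattern: [[target]] or [[target|title]], nothing else.
LINK_RE = /\A\[\[([^\[\]|]+)(?:\|([^\[\]]+))?\]\]\z/

# Returns [link, title]; title is nil when absent, both are nil
# when the text is not a wiki link at all.
def split_link(text)
  m = LINK_RE.match(text.strip)
  return [nil, nil] unless m

  [m[1].strip, m[2]&.strip]
end

p split_link('[[La Florida, Chile|La Florida]]')
#=> ["La Florida, Chile", "La Florida"]
p split_link('[[ La Florida, Chile]]')
#=> ["La Florida, Chile", nil]
p split_link('La Florida')
#=> [nil, nil]
```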
171
+
172
+ ### Document Element Structure
173
+
174
+ Get the document's element structure.
175
+ Note: For now only section headings (`h1`, `h2`, `h3`, ...) and tables are supported.
176
+ Example:
177
+
178
+ ``` ruby
179
+ nodes = Wikiscript.parse( <<TXT )
180
+ =Heading 1==
181
+ ==Heading 2==
182
+ ===Heading 3===
183
+
184
+ {|
185
+ |-
186
+ ! header1
187
+ ! header2
188
+ ! header3
189
+ |-
190
+ | row1cell1
191
+ | row1cell2
192
+ | row1cell3
193
+ |-
194
+ | row2cell1
195
+ | row2cell2
196
+ | row2cell3
197
+ |}
198
+ TXT
199
+
200
+ pp nodes
201
+ #=> [[:h1, "Heading 1"],
202
+ # [:h2, "Heading 2"],
203
+ # [:h3, "Heading 3"],
204
+ # [:table, [["header1", "header2", "header3"],
205
+ # ["row1cell1", "row1cell2", "row1cell3"],
206
+ # ["row2cell1", "row2cell2", "row2cell3"]]]
207
+ ```
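
Reviewer sketch (not part of the diff): the example feeds the unbalanced line `=Heading 1==` and still expects `:h1`. One reading that reproduces this output is to take the heading level from the leading `=` signs only. That rule, and the name `heading_node`, are assumptions made for illustration; the reader's real logic is not visible in this diff.

```ruby
# Hypothetical helper: turn one wikitext line into a heading node,
# or nil when the line is not a heading.
def heading_node(line)
  m = /\A(=+)\s*(.*?)\s*=+\s*\z/.match(line)
  return nil unless m

  level = m[1].size           # count only the leading "=" signs
  [:"h#{level}", m[2]]
end

p heading_node('=Heading 1==')      #=> [:h1, "Heading 1"]
p heading_node('==Heading 2==')     #=> [:h2, "Heading 2"]
p heading_node('===Heading 3===')   #=> [:h3, "Heading 3"]
p heading_node('{|')                #=> nil
```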
+
+
+ That's all for now. More functionality will get added over time.
+


  ## Install
data/Rakefile CHANGED
@@ -5,7 +5,7 @@ Hoe.spec 'wikiscript' do

    self.version = Wikiscript::VERSION

-   self.summary = 'wikiscript - scripts for wikipedia (get wikitext for page etc.)'
+   self.summary = "wikiscript - scripts for wikipedia (get wikitext for page, parse tables 'n' links, etc.)"
    self.description = summary

    self.urls = ['https://github.com/wikiscript/wikiscript']
@@ -33,11 +33,23 @@ module Wikiscript
      end

      def text
-       @text ||= get # cache text (from request)
+       @text ||= get # cache text (from request)
+     end
+
+     def nodes
+       @nodes ||= parse # cache nodes (from parse)
+     end
+
+     def each ## loop over all nodes / elements -note: nodes is a (flat) list (array) for now
+       nodes.each do |node|
+         yield( node )
+       end
      end


      def get ## "force" refresh text (get/fetch/download)
+       @nodes = nil ## note: reset cached parsed nodes too
+
        @text = Client.new.text( @title, lang: @lang )
        @text
      end
@@ -45,9 +57,9 @@ module Wikiscript
      alias_method :download, :get


-
      def parse ## todo/change: use/find a different name e.g. doc/elements/etc. - why? why not?
-       PageReader.parse( text )
+       @nodes = PageReader.parse( text )
+       @nodes
      end
    end # class Page
  end # Wikiscript
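
Reviewer sketch (not part of the diff): the hunks above add a memoized `nodes` reader, an `each` that walks it, and a cache reset whenever the text is refreshed. The standalone class below illustrates that pattern without any network access. `CachedPage`, `parse_calls`, and the line-splitting `parse` are stand-ins invented for this sketch, not the gem's code.

```ruby
# Hypothetical stand-in for a page object with a memoized parse result.
class CachedPage
  include Enumerable          # each is defined below, so map/to_a/etc. work

  attr_reader :parse_calls

  def initialize(text)
    @text = text
    @parse_calls = 0
  end

  def nodes
    @nodes ||= parse          # parse once, then reuse the cached array
  end

  def each(&block)
    nodes.each(&block)
  end

  def text=(new_text)
    @nodes = nil              # new text invalidates the cached nodes
    @text = new_text
  end

  def parse                   # stand-in parser: one node per non-empty line
    @parse_calls += 1
    @nodes = @text.lines.map(&:chomp).reject(&:empty?)
  end
end

page = CachedPage.new("=Heading 1==\n\n{|\n")
page.each { |node| p node }   # first access triggers the parse
page.nodes                    # served from the cache
p page.parse_calls            #=> 1

page.text = "==Heading 2==\n" # reset, so the next access parses again
p page.nodes                  #=> ["==Heading 2=="]
p page.parse_calls            #=> 2
```

Including `Enumerable` is an extra choice in this sketch; the diff itself only defines `each`.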
@@ -1,6 +1,6 @@

  module Wikiscript
-   VERSION = '0.3.1'
+   VERSION = '0.3.2'

    def self.banner
      "wikiscript/#{VERSION} on Ruby #{RUBY_VERSION} (#{RUBY_RELEASE_DATE}) [#{RUBY_PLATFORM}]"
@@ -11,7 +11,7 @@ require 'helper'
  class TestPageReader < MiniTest::Test

    def test_basic
-     el = Wikiscript.parse( <<TXT )
+     nodes = Wikiscript.parse( <<TXT )
  =Heading 1==
  ==Heading 2==
  ===Heading 3===
@@ -32,15 +32,15 @@ class TestPageReader < MiniTest::Test
  |}
  TXT

-     pp el
+     pp nodes

-     assert_equal 4, el.size
-     assert_equal [:h1, 'Heading 1'], el[0]
-     assert_equal [:h2, 'Heading 2'], el[1]
-     assert_equal [:h3, 'Heading 3'], el[2]
+     assert_equal 4, nodes.size
+     assert_equal [:h1, 'Heading 1'], nodes[0]
+     assert_equal [:h2, 'Heading 2'], nodes[1]
+     assert_equal [:h3, 'Heading 3'], nodes[2]
      assert_equal [:table, [['header1', 'header2', 'header3'],
                             ['row1cell1', 'row1cell2', 'row1cell3'],
-                            ['row2cell1', 'row2cell2', 'row2cell3']]], el[3]
+                            ['row2cell1', 'row2cell2', 'row2cell3']]], nodes[3]
    end

    def test_parse
@@ -65,16 +65,53 @@ TXT
  |}
  TXT

-     el = page.parse
-     pp el
+     nodes = page.parse
+     pp nodes

-     assert_equal 4, el.size
-     assert_equal [:h1, 'Heading 1'], el[0]
-     assert_equal [:h2, 'Heading 2'], el[1]
-     assert_equal [:h3, 'Heading 3'], el[2]
+     assert_equal 4, nodes.size
+     assert_equal [:h1, 'Heading 1'], nodes[0]
+     assert_equal [:h2, 'Heading 2'], nodes[1]
+     assert_equal [:h3, 'Heading 3'], nodes[2]
      assert_equal [:table, [['header1', 'header2', 'header3'],
                             ['row1cell1', 'row1cell2', 'row1cell3'],
-                            ['row2cell1', 'row2cell2', 'row2cell3']]], el[3]
+                            ['row2cell1', 'row2cell2', 'row2cell3']]], nodes[3]
+   end
+
+   def test_each
+     page = Wikiscript::Page.new( <<TXT )
+ =Heading 1==
+ ==Heading 2==
+ ===Heading 3===
+
+ {|
+ |-
+ ! header1
+ ! header2
+ ! header3
+ |-
+ | row1cell1
+ | row1cell2
+ | row1cell3
+ |-
+ | row2cell1
+ | row2cell2
+ | row2cell3
+ |}
+ TXT
+
+     nodes = []
+     page.each do |node|
+       nodes << node
+     end
+     pp nodes
+
+     assert_equal 4, nodes.size
+     assert_equal [:h1, 'Heading 1'], nodes[0]
+     assert_equal [:h2, 'Heading 2'], nodes[1]
+     assert_equal [:h3, 'Heading 3'], nodes[2]
+     assert_equal [:table, [['header1', 'header2', 'header3'],
+                            ['row1cell1', 'row1cell2', 'row1cell3'],
+                            ['row2cell1', 'row2cell2', 'row2cell3']]], nodes[3]
    end

  end # class TestPageReader
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: wikiscript
  version: !ruby/object:Gem::Version
-   version: 0.3.1
+   version: 0.3.2
  platform: ruby
  authors:
  - Gerald Bauer
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2019-09-21 00:00:00.000000000 Z
+ date: 2019-09-22 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
    name: logutils
@@ -66,7 +66,8 @@ dependencies:
      - - "~>"
        - !ruby/object:Gem::Version
          version: '3.16'
- description: wikiscript - scripts for wikipedia (get wikitext for page etc.)
+ description: wikiscript - scripts for wikipedia (get wikitext for page, parse tables
+   'n' links, etc.)
  email: opensport@googlegroups.com
  executables: []
  extensions: []
@@ -120,5 +121,6 @@ rubyforge_project:
  rubygems_version: 2.5.2
  signing_key:
  specification_version: 4
- summary: wikiscript - scripts for wikipedia (get wikitext for page etc.)
+ summary: wikiscript - scripts for wikipedia (get wikitext for page, parse tables 'n'
+   links, etc.)
  test_files: []