wikiscript 0.3.1 → 0.3.2

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: aa972f9f2c4580091fb3dac93cf76f94f6054ba1
4
- data.tar.gz: 138954a7a0281032b68aae0f29b9d239c9f7f90a
3
+ metadata.gz: a34183c24ce2eac79cf72edcec39c562e6c74065
4
+ data.tar.gz: 56fff65e58fc1fbbbc2eeaa111d22d6ebce84f63
5
5
  SHA512:
6
- metadata.gz: 74af5f6bd6b915a1887d9022763294f0e6e634415d1bbdf3b18ae9fc0f9a11b01812b904a42a4ed687a60cafbd50230afd8a804dab8bb6a8b28475f99437be2b
7
- data.tar.gz: 44f92623eb560825a834de570b8cfc604a92bfb907b38c2390b2d3a52cc74bcf8f229d21369c27bd73c0b99a9db9c65f5df3bf9b6d304e9876694135e7517ddc
6
+ metadata.gz: 85fa8e16dffbdfdaf6b683dae530a692eb6c570a03345ddf2a013b248bbb51aaab9ce495850bdd564a163e35c37f0f80b3f6d47b5ee994e8effe02c413ac8ba8
7
+ data.tar.gz: ca8d9133a251f634db722575100d260c79859437fb393d9ac5b8181d0725f7d54a5290b4dfa0436a13aa73338c818adeea5c89c06b03b2f2e7360aa102f3e1c4
data/README.md CHANGED
@@ -12,16 +12,203 @@ Read-only access to wikipedia pages.
12
12
  Example - Get wikitext source (via `en.wikipedia.org/w/index.php?action=raw&title=<title>`):
13
13
 
14
14
 
15
+ ``` ruby
16
+ page = Wikiscript::Page.get( '2022_FIFA_World_Cup' ) # same as Wikiscript.get
17
+ page.text
15
18
  ```
16
- >> page = Wikiscript::Page.new( '2014_FIFA_World_Cup_squads' )
17
- >> page.text
18
19
 
19
- The [[2014 FIFA World Cup]] is an international [[association football|football]]
20
- tournament which is currently being held in Brazil from 12 June to 13 July 2014.
21
- The 32 national teams involved in the tournament were required to register
22
- a squad of 23 players, including three goalkeepers...
20
+ prints
21
+
22
+ ```
23
+ The '''2022 FIFA World Cup''' is scheduled to be the 22nd edition of the [[FIFA World Cup]],
24
+ the quadrennial international men's [[association football]] championship contested by the
25
+ [[List of men's national association football teams|national teams]] of the member associations of [[FIFA]].
26
+ It is scheduled to take place in [[Qatar]] in 2022. This will be the first World Cup ever to be held
27
+ in the [[Arab world]] and the first in a Muslim-majority country...
28
+ ```
29
+
30
+ Or build your own page from scratch (no download):
31
+
32
+ ``` ruby
33
+ page = Wikiscript::Page.new( <<TXT, title: '2022_FIFA_World_Cup' )
34
+ The '''2022 FIFA World Cup''' is scheduled to be the 22nd edition of the [[FIFA World Cup]],
35
+ the quadrennial international men's [[association football]] championship contested by the
36
+ [[List of men's national association football teams|national teams]] of the member associations of [[FIFA]].
37
+ It is scheduled to take place in [[Qatar]] in 2022. This will be the first World Cup ever to be held
38
+ in the [[Arab world]] and the first in a Muslim-majority country...
39
+ TXT
40
+ page.text
41
+ ```
42
+
43
+ prints
44
+
45
+ ```
46
+ The '''2022 FIFA World Cup''' is scheduled to be the 22nd edition of the [[FIFA World Cup]],
47
+ the quadrennial international men's [[association football]] championship contested by the
48
+ [[List of men's national association football teams|national teams]] of the member associations of [[FIFA]].
49
+ It is scheduled to take place in [[Qatar]] in 2022. This will be the first World Cup ever to be held
50
+ in the [[Arab world]] and the first in a Muslim-majority country...
51
+ ```
52
+
53
+
54
+ ### Tables
55
+
56
+ Parse wiki tables into an array. Example:
57
+
58
+ ``` ruby
59
+ table = Wikiscript.parse_table( <<TXT )
60
+ {|
61
+ |-
62
+ ! header1
63
+ ! header2
64
+ ! header3
65
+ |-
66
+ | row1cell1
67
+ | row1cell2
68
+ | row1cell3
69
+ |-
70
+ | row2cell1
71
+ | row2cell2
72
+ | row2cell3
73
+ |}
74
+ TXT
75
+
76
+ # -or-
77
+
78
+ table = Wikiscript.parse_table( <<TXT )
79
+ {|
80
+ ! header1 !! header2 !! header3
81
+ |-
82
+ | row1cell1 || row1cell2 || row1cell3
83
+ |-
84
+ | row2cell1 || row2cell2 || row2cell3
85
+ |}
86
+ TXT
87
+
88
+ # -or-
89
+
90
+ table = Wikiscript.parse_table( <<TXT )
91
+ {|
92
+ |-
93
+ !
94
+ header1
95
+ !
96
+ header2
97
+ !
98
+ header3
99
+ |-
100
+ |
101
+ row1cell1
102
+ |
103
+ row1cell2
104
+ |
105
+ row1cell3
106
+ |-
107
+ |
108
+ row2cell1
109
+ |
110
+ row2cell2
111
+ |
112
+ row2cell3
113
+ |}
114
+ TXT
115
+ ```
116
+
117
+ resulting in:
118
+
119
+ ``` ruby
120
+ pp table
121
+ #=> [["header1", "header2", "header3"],
122
+ # ["row1cell1", "row1cell2", "row1cell3"],
123
+ # ["row2cell1", "row2cell2", "row2cell3"]]
23
124
  ```
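The three notations above (one cell per line, cells on the same line separated by `!!` / `||`, and cell text on the line after a bare marker) can be normalized with a small line-based loop. The following is only a minimal sketch of that idea, not the gem's implementation; `parse_table_sketch` is a hypothetical name, and the sketch assumes plain cells without style attributes or nested tables.

```ruby
# Minimal sketch (NOT the wikiscript implementation): turn simple wiki table
# markup into an array of rows. The name parse_table_sketch is hypothetical.
def parse_table_sketch(text)
  rows = []
  row  = []
  text.each_line do |raw|
    line = raw.strip
    if line.empty? || line.start_with?('{|')
      next                                  # skip blank lines and the table opener
    elsif line.start_with?('|}', '|-')
      rows << row unless row.empty?         # a row separator or the table end closes the row
      row = []
    elsif line.start_with?('!', '|')
      sep   = line.start_with?('!') ? '!!' : '||'
      cells = line[1..-1].split(sep, -1).map(&:strip)
      cells = [''] if cells.empty?          # bare marker: the cell text follows on the next line
      row.concat(cells)
    else
      row << '' if row.empty?
      # any other line continues the text of the last cell
      row[-1] = [row[-1], line].reject(&:empty?).join(' ')
    end
  end
  rows
end

one_per_line = parse_table_sketch(<<TXT)
{|
|-
! header1
! header2
|-
| row1cell1
| row1cell2
|}
TXT

same_line = parse_table_sketch(<<TXT)
{|
! header1 !! header2
|-
| row1cell1 || row1cell2
|}
TXT

p one_per_line
p same_line
```

Both calls return the same two rows, which matches the README's point that the notations are interchangeable.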
24
125
 
126
+ Note: `parse_table` will strip/remove (leading) style attributes (e.g. `attribute="value" |`) and (inline) bold and italic emphases (e.g. `''`) from the (cell) text. Example:
127
+
128
+ ``` ruby
129
+ table = Wikiscript.parse_table( <<TXT )
130
+ {|
131
+ |-
132
+ ! style="width:200px;"|Club
133
+ ! style="width:150px;"|City
134
+ |-
135
+ |[[Biu Chun Rangers]]||[[Sham Shui Po]]
136
+ |-
137
+ |bgcolor=#ffff44 |''[[Eastern Sports Club|Eastern]]''||[[Mong Kok]]
138
+ |-
139
+ |[[HKFC Soccer Section]]||[[Happy Valley, Hong Kong|Happy Valley]]
140
+ |}
141
+ TXT
142
+ ```
143
+
144
+ resulting in:
145
+
146
+ ``` ruby
147
+ pp table
148
+ #=> [["Club", "City"],
149
+ # ["[[Biu Chun Rangers]]", "[[Sham Shui Po]]"],
150
+ # ["[[Eastern Sports Club|Eastern]]", "[[Mong Kok]]"],
151
+ # ["[[HKFC Soccer Section]]", "[[Happy Valley, Hong Kong|Happy Valley]]"]]
152
+ ```
153
+
154
+ ### Links
155
+
156
+ Split links into two parts. Note: The alternate link title is optional. Example:
157
+
158
+ ``` ruby
159
+ link, title = Wikiscript.parse_link( '[[La Florida, Chile|La Florida]]' )
160
+ link #=> "La Florida, Chile"
161
+ title #=> "La Florida"
162
+
163
+ link, title = Wikiscript.parse_link( '[[ La Florida, Chile]]' )
164
+ link #=> "La Florida, Chile"
165
+ title #=> nil
166
+
167
+ link, title = Wikiscript.parse_link( 'La Florida' )
168
+ link #=> nil
169
+ title #=> nil
170
+ ```
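The splitting rule shown above (target before the `|`, optional title after it, `nil` for anything that is not a `[[...]]` link) fits in a single regular expression. This is a hedged sketch under that reading of the examples; `parse_link_sketch` and `LINK_SKETCH_RE` are hypothetical names, not part of the gem.

```ruby
# Minimal sketch (NOT the wikiscript implementation) of splitting a wiki link
# into [link, title]. Both names below are hypothetical.
LINK_SKETCH_RE = /\A\[\[\s*([^\]|]+?)\s*(?:\|\s*([^\]]+?)\s*)?\]\]\z/

def parse_link_sketch(value)
  m = LINK_SKETCH_RE.match(value.strip)
  return [nil, nil] unless m   # not a [[...]] link at all
  [m[1], m[2]]                 # m[2] is nil when no alternate title is given
end

link, title = parse_link_sketch('[[La Florida, Chile|La Florida]]')
p link, title

link_only, no_title = parse_link_sketch('[[ La Florida, Chile]]')
p link_only, no_title

p parse_link_sketch('La Florida')
```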
171
+
172
+ ### Document Element Structure
173
+
174
+ Get the document's element structure.
175
+ Note: For now only section headings (`h1`, `h2`, `h3`, ...) and tables are supported.
176
+ Example:
177
+
178
+ ``` ruby
179
+ nodes = Wikiscript.parse( <<TXT )
180
+ =Heading 1==
181
+ ==Heading 2==
182
+ ===Heading 3===
183
+
184
+ {|
185
+ |-
186
+ ! header1
187
+ ! header2
188
+ ! header3
189
+ |-
190
+ | row1cell1
191
+ | row1cell2
192
+ | row1cell3
193
+ |-
194
+ | row2cell1
195
+ | row2cell2
196
+ | row2cell3
197
+ |}
198
+ TXT
199
+
200
+ pp nodes
201
+ #=> [[:h1, "Heading 1"],
202
+ # [:h2, "Heading 2"],
203
+ # [:h3, "Heading 3"],
204
+ # [:table, [["header1", "header2", "header3"],
205
+ # ["row1cell1", "row1cell2", "row1cell3"],
206
+ # ["row2cell1", "row2cell2", "row2cell3"]]]
207
+ ```
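Note that the first input line, `=Heading 1==`, has unbalanced markers and is still reported as `:h1`. One way to get that result is to take the level from the leading `=` characters only. The sketch below assumes exactly that rule; it is an illustration, not the gem's `PageReader`, and `parse_heading_sketch` is a hypothetical name.

```ruby
# Minimal sketch (NOT the wikiscript implementation): derive a heading node
# from one line, ASSUMING the level is the number of leading '=' characters.
def parse_heading_sketch(line)
  m = /\A(=+)\s*(.*?)\s*=*\z/.match(line.strip)
  return nil unless m                # not a heading line
  [:"h#{m[1].size}", m[2]]
end

p parse_heading_sketch('=Heading 1==')
p parse_heading_sketch('==Heading 2==')
p parse_heading_sketch('===Heading 3===')
p parse_heading_sketch('{|')
```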
208
+
209
+
210
+ That's all for now. More functionality will get added over time.
211
+
25
212
 
26
213
 
27
214
  ## Install
data/Rakefile CHANGED
@@ -5,7 +5,7 @@ Hoe.spec 'wikiscript' do
5
5
 
6
6
  self.version = Wikiscript::VERSION
7
7
 
8
- self.summary = 'wikiscript - scripts for wikipedia (get wikitext for page etc.)'
8
+ self.summary = "wikiscript - scripts for wikipedia (get wikitext for page, parse tables 'n' links, etc.)"
9
9
  self.description = summary
10
10
 
11
11
  self.urls = ['https://github.com/wikiscript/wikiscript']
@@ -33,11 +33,23 @@ module Wikiscript
33
33
  end
34
34
 
35
35
  def text
36
- @text ||= get # cache text (from request)
36
+ @text ||= get # cache text (from request)
37
+ end
38
+
39
+ def nodes
40
+ @nodes ||= parse # cache nodes (from parse)
41
+ end
42
+
43
+ def each ## loop over all nodes / elements -note: nodes is a (flat) list (array) for now
44
+ nodes.each do |node|
45
+ yield( node )
46
+ end
37
47
  end
38
48
 
39
49
 
40
50
  def get ## "force" refresh text (get/fetch/download)
51
+ @nodes = nil ## note: reset cached parsed nodes too
52
+
41
53
  @text = Client.new.text( @title, lang: @lang )
42
54
  @text
43
55
  end
@@ -45,9 +57,9 @@ module Wikiscript
45
57
  alias_method :download, :get
46
58
 
47
59
 
48
-
49
60
  def parse ## todo/change: use/find a different name e.g. doc/elements/etc. - why? why not?
50
- PageReader.parse( text )
61
+ @nodes = PageReader.parse( text )
62
+ @nodes
51
63
  end
52
64
  end # class Page
53
65
  end # Wikiscript
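The hunks above add a second memoized value (`@nodes`) next to `@text` and reset it whenever `get` refreshes the text. The effect can be observed in isolation with a stand-in class. This is a sketch only: `SketchPage`, its counters, and the block-based fetcher are hypothetical and replace `Client` and `PageReader`, so nothing is downloaded.

```ruby
# Minimal sketch of the caching pattern in the hunks above. SketchPage and its
# counters are hypothetical stand-ins; no network access or gem is involved.
class SketchPage
  attr_reader :fetch_count, :parse_count

  def initialize(&fetcher)
    @fetcher     = fetcher
    @fetch_count = 0
    @parse_count = 0
  end

  def text
    @text ||= get          # cache text (from request)
  end

  def nodes
    @nodes ||= parse       # cache nodes (from parse)
  end

  def each(&block)
    nodes.each(&block)     # nodes is a flat list
  end

  def get                  # "force" refresh
    @nodes = nil           # reset cached parsed nodes too
    @fetch_count += 1
    @text = @fetcher.call
  end

  def parse
    @parse_count += 1
    @nodes = text.lines.map(&:chomp)   # stand-in for PageReader.parse
  end
end

page = SketchPage.new { "=A=\n==B==\n" }
page.nodes
page.nodes                 # second call is served from the cache
page.each { |node| node }  # iterating reuses the cached nodes
counts_before = [page.fetch_count, page.parse_count]

page.get                   # refresh drops the cached nodes
page.nodes                 # so the next access parses again
counts_after = [page.fetch_count, page.parse_count]

p counts_before, counts_after
```

Without the `@nodes = nil` line in `get`, the last `page.nodes` call would return nodes parsed from the old text.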
@@ -1,6 +1,6 @@
1
1
 
2
2
  module Wikiscript
3
- VERSION = '0.3.1'
3
+ VERSION = '0.3.2'
4
4
 
5
5
  def self.banner
6
6
  "wikiscript/#{VERSION} on Ruby #{RUBY_VERSION} (#{RUBY_RELEASE_DATE}) [#{RUBY_PLATFORM}]"
@@ -11,7 +11,7 @@ require 'helper'
11
11
  class TestPageReader < MiniTest::Test
12
12
 
13
13
  def test_basic
14
- el = Wikiscript.parse( <<TXT )
14
+ nodes = Wikiscript.parse( <<TXT )
15
15
  =Heading 1==
16
16
  ==Heading 2==
17
17
  ===Heading 3===
@@ -32,15 +32,15 @@ class TestPageReader < MiniTest::Test
32
32
  |}
33
33
  TXT
34
34
 
35
- pp el
35
+ pp nodes
36
36
 
37
- assert_equal 4, el.size
38
- assert_equal [:h1, 'Heading 1'], el[0]
39
- assert_equal [:h2, 'Heading 2'], el[1]
40
- assert_equal [:h3, 'Heading 3'], el[2]
37
+ assert_equal 4, nodes.size
38
+ assert_equal [:h1, 'Heading 1'], nodes[0]
39
+ assert_equal [:h2, 'Heading 2'], nodes[1]
40
+ assert_equal [:h3, 'Heading 3'], nodes[2]
41
41
  assert_equal [:table, [['header1', 'header2', 'header3'],
42
42
  ['row1cell1', 'row1cell2', 'row1cell3'],
43
- ['row2cell1', 'row2cell2', 'row2cell3']]], el[3]
43
+ ['row2cell1', 'row2cell2', 'row2cell3']]], nodes[3]
44
44
  end
45
45
 
46
46
  def test_parse
@@ -65,16 +65,53 @@ TXT
65
65
  |}
66
66
  TXT
67
67
 
68
- el = page.parse
69
- pp el
68
+ nodes = page.parse
69
+ pp nodes
70
70
 
71
- assert_equal 4, el.size
72
- assert_equal [:h1, 'Heading 1'], el[0]
73
- assert_equal [:h2, 'Heading 2'], el[1]
74
- assert_equal [:h3, 'Heading 3'], el[2]
71
+ assert_equal 4, nodes.size
72
+ assert_equal [:h1, 'Heading 1'], nodes[0]
73
+ assert_equal [:h2, 'Heading 2'], nodes[1]
74
+ assert_equal [:h3, 'Heading 3'], nodes[2]
75
75
  assert_equal [:table, [['header1', 'header2', 'header3'],
76
76
  ['row1cell1', 'row1cell2', 'row1cell3'],
77
- ['row2cell1', 'row2cell2', 'row2cell3']]], el[3]
77
+ ['row2cell1', 'row2cell2', 'row2cell3']]], nodes[3]
78
+ end
79
+
80
+ def test_each
81
+ page = Wikiscript::Page.new( <<TXT )
82
+ =Heading 1==
83
+ ==Heading 2==
84
+ ===Heading 3===
85
+
86
+ {|
87
+ |-
88
+ ! header1
89
+ ! header2
90
+ ! header3
91
+ |-
92
+ | row1cell1
93
+ | row1cell2
94
+ | row1cell3
95
+ |-
96
+ | row2cell1
97
+ | row2cell2
98
+ | row2cell3
99
+ |}
100
+ TXT
101
+
102
+ nodes = []
103
+ page.each do |node|
104
+ nodes << node
105
+ end
106
+ pp nodes
107
+
108
+ assert_equal 4, nodes.size
109
+ assert_equal [:h1, 'Heading 1'], nodes[0]
110
+ assert_equal [:h2, 'Heading 2'], nodes[1]
111
+ assert_equal [:h3, 'Heading 3'], nodes[2]
112
+ assert_equal [:table, [['header1', 'header2', 'header3'],
113
+ ['row1cell1', 'row1cell2', 'row1cell3'],
114
+ ['row2cell1', 'row2cell2', 'row2cell3']]], nodes[3]
78
115
  end
79
116
 
80
117
  end # class TestPageReader
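Because the node list is flat, a consumer that wants to know which heading a table belongs to has to track the most recent heading while iterating. The sketch below works on a literal array in the shape asserted by the tests above, so it runs without the gem; the grouping logic and the variable names are illustrative assumptions, not part of wikiscript.

```ruby
# Minimal sketch: walk a flat node list (same shape as in the tests above) and
# attach each table to the heading that precedes it.
nodes = [
  [:h1, 'Heading 1'],
  [:h2, 'Heading 2'],
  [:h3, 'Heading 3'],
  [:table, [['header1',   'header2',   'header3'],
            ['row1cell1', 'row1cell2', 'row1cell3'],
            ['row2cell1', 'row2cell2', 'row2cell3']]]
]

heading = nil
tables  = []
nodes.each do |type, value|
  if type == :table
    header, *rows = value                           # first row holds the column names
    records = rows.map { |row| header.zip(row).to_h }
    tables << [heading, records]
  else
    heading = value                                 # remember the latest heading
  end
end

pp tables
```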
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: wikiscript
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.3.1
4
+ version: 0.3.2
5
5
  platform: ruby
6
6
  authors:
7
7
  - Gerald Bauer
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2019-09-21 00:00:00.000000000 Z
11
+ date: 2019-09-22 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: logutils
@@ -66,7 +66,8 @@ dependencies:
66
66
  - - "~>"
67
67
  - !ruby/object:Gem::Version
68
68
  version: '3.16'
69
- description: wikiscript - scripts for wikipedia (get wikitext for page etc.)
69
+ description: wikiscript - scripts for wikipedia (get wikitext for page, parse tables
70
+ 'n' links, etc.)
70
71
  email: opensport@googlegroups.com
71
72
  executables: []
72
73
  extensions: []
@@ -120,5 +121,6 @@ rubyforge_project:
120
121
  rubygems_version: 2.5.2
121
122
  signing_key:
122
123
  specification_version: 4
123
- summary: wikiscript - scripts for wikipedia (get wikitext for page etc.)
124
+ summary: wikiscript - scripts for wikipedia (get wikitext for page, parse tables 'n'
125
+ links, etc.)
124
126
  test_files: []