ae_easy-text 0.0.1

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+ metadata.gz: 239189344e783f67b085da7394e535aa693a4b067c62b8d0b16f733a0b19d4f7
4
+ data.tar.gz: ca144105f26e399116b05560ff870f6aa051a04696602f6f68f67f06b9e0bfda
5
+ SHA512:
6
+ metadata.gz: 0b7c4495eeb71e5dae3ad799d14f8a2d83989a949183ee3df2837191b4a4f3a10965ead38416ccda078da7b29fc083eb02fd53a24999f97473b02f77489d921c
7
+ data.tar.gz: 4f377b26bcfb0ef4cce7806d153fb97de115d0e0bc4beef5e43b55b0125e1936d6c4440d1131cbe9cb4d5fe28a1810c2820269f6c317511068c91aff42ad8126
data/.gitignore ADDED
@@ -0,0 +1,12 @@
1
+ /.byebug*
2
+ /.bundle/
3
+ /.yardoc
4
+ /_yardoc/
5
+ /coverage/
6
+ /pkg/
7
+ /spec/reports/
8
+ /tmp/
9
+ /certs/
10
+ /checksum/
11
+ /vendor/
12
+ /Gemfile.lock
data/.travis.yml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ sudo: false
3
+ language: ruby
4
+ cache: bundler
5
+ rvm:
6
+ - 2.4.2
7
+ before_install: gem install bundler -v 1.16.3
data/.yardopts ADDED
@@ -0,0 +1 @@
1
+ --no-private
@@ -0,0 +1,74 @@
1
+ # Contributor Covenant Code of Conduct
2
+
3
+ ## Our Pledge
4
+
5
+ In the interest of fostering an open and welcoming environment, we as
6
+ contributors and maintainers pledge to making participation in our project and
7
+ our community a harassment-free experience for everyone, regardless of age, body
8
+ size, disability, ethnicity, gender identity and expression, level of experience,
9
+ nationality, personal appearance, race, religion, or sexual identity and
10
+ orientation.
11
+
12
+ ## Our Standards
13
+
14
+ Examples of behavior that contributes to creating a positive environment
15
+ include:
16
+
17
+ * Using welcoming and inclusive language
18
+ * Being respectful of differing viewpoints and experiences
19
+ * Gracefully accepting constructive criticism
20
+ * Focusing on what is best for the community
21
+ * Showing empathy towards other community members
22
+
23
+ Examples of unacceptable behavior by participants include:
24
+
25
+ * The use of sexualized language or imagery and unwelcome sexual attention or
26
+ advances
27
+ * Trolling, insulting/derogatory comments, and personal or political attacks
28
+ * Public or private harassment
29
+ * Publishing others' private information, such as a physical or electronic
30
+ address, without explicit permission
31
+ * Other conduct which could reasonably be considered inappropriate in a
32
+ professional setting
33
+
34
+ ## Our Responsibilities
35
+
36
+ Project maintainers are responsible for clarifying the standards of acceptable
37
+ behavior and are expected to take appropriate and fair corrective action in
38
+ response to any instances of unacceptable behavior.
39
+
40
+ Project maintainers have the right and responsibility to remove, edit, or
41
+ reject comments, commits, code, wiki edits, issues, and other contributions
42
+ that are not aligned to this Code of Conduct, or to ban temporarily or
43
+ permanently any contributor for other behaviors that they deem inappropriate,
44
+ threatening, offensive, or harmful.
45
+
46
+ ## Scope
47
+
48
+ This Code of Conduct applies both within project spaces and in public spaces
49
+ when an individual is representing the project or its community. Examples of
50
+ representing a project or community include using an official project e-mail
51
+ address, posting via an official social media account, or acting as an appointed
52
+ representative at an online or offline event. Representation of a project may be
53
+ further defined and clarified by project maintainers.
54
+
55
+ ## Enforcement
56
+
57
+ Instances of abusive, harassing, or otherwise unacceptable behavior may be
58
+ reported by contacting the project team at parama@answersengine.com. All
59
+ complaints will be reviewed and investigated and will result in a response that
60
+ is deemed necessary and appropriate to the circumstances. The project team is
61
+ obligated to maintain confidentiality with regard to the reporter of an incident.
62
+ Further details of specific enforcement policies may be posted separately.
63
+
64
+ Project maintainers who do not follow or enforce the Code of Conduct in good
65
+ faith may face temporary or permanent repercussions as determined by other
66
+ members of the project's leadership.
67
+
68
+ ## Attribution
69
+
70
+ This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,
71
+ available at [http://contributor-covenant.org/version/1/4][version]
72
+
73
+ [homepage]: http://contributor-covenant.org
74
+ [version]: http://contributor-covenant.org/version/1/4/
data/Gemfile ADDED
@@ -0,0 +1,6 @@
1
+ source "https://rubygems.org"
2
+
3
+ git_source(:github) {|repo_name| "https://github.com/#{repo_name}" }
4
+
5
+ # Specify your gem's dependencies in answersengine.gemspec
6
+ gemspec
data/LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2019 AnswersEngine
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,16 @@
1
+ [![Documentation](http://img.shields.io/badge/docs-rdoc.info-blue.svg)](http://rubydoc.org/gems/ae_easy-text/frames)
2
+ [![Gem Version](https://badge.fury.io/rb/ae_easy-text.svg)](http://github.com/answersengine/ae_easy-text/releases)
3
+ [![License](http://img.shields.io/badge/license-MIT-yellowgreen.svg)](#license)
4
+
5
+ # AeEasy text module
6
+ ## Description
7
+
8
+ AeEasy text is part of the AeEasy gem collection. It provides multiple text parsing helpers to ease common text parsing use cases.
9
+
10
+ Install gem:
11
+ ```gem install 'ae_easy-text'```
12
+
13
+ Require gem:
14
+ ```require 'ae_easy-text'```
15
+
16
+ Documentation can be found [here](http://rubydoc.org/gems/ae_easy-text/frames).
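
The README stops at install and require, so here is a minimal sketch of what the simplest helpers appear to do, based on the method sources shown in the generated docs further down this diff (`CGI.escapeHTML`, `CGI.unescapeHTML` and `Digest::SHA1.hexdigest`). `TextSketch` and `digest` are hypothetical stand-ins so the snippet runs without the gem installed; they are not part of `ae_easy-text`, where the corresponding helper is documented as `AeEasy::Text.hash`.

```ruby
require 'cgi'
require 'digest'

# Hypothetical stand-in for AeEasy::Text; assumes the helpers are thin
# wrappers around Ruby's standard library, as the documented sources suggest.
module TextSketch
  # Escape HTML special characters (&, <, >, quotes) as entities.
  def self.encode_html(text)
    CGI.escapeHTML(text)
  end

  # Turn HTML entities back into plain characters.
  def self.decode_html(text)
    CGI.unescapeHTML(text)
  end

  # SHA1 hex digest of the object's string form.
  # Simplified: the documented helper also special-cases Hash input.
  def self.digest(object)
    Digest::SHA1.hexdigest(object.to_s)
  end
end

encoded = TextSketch.encode_html('<b>Tom & Jerry</b>')
puts encoded                          # &lt;b&gt;Tom &amp; Jerry&lt;/b&gt;
puts TextSketch.decode_html(encoded)  # <b>Tom & Jerry</b>
puts TextSketch.digest('abc')         # 40-character hex string
```

With the gem installed, the equivalent calls would presumably be `AeEasy::Text.encode_html`, `AeEasy::Text.decode_html` and `AeEasy::Text.hash`.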
data/Rakefile ADDED
@@ -0,0 +1,22 @@
1
+ require 'benchmark'
2
+ require 'bundler/gem_tasks'
3
+ require 'rake/testtask'
4
+
5
+ Rake::TestTask.new do |t|
6
+ t.libs = ['lib', 'test']
7
+ t.warning = false
8
+ t.verbose = false
9
+ t.test_files = FileList['./test/**/*_test.rb']
10
+ end
11
+
12
+ desc 'Benchmark another task execution | usage example: benchmark[my_task, param1, param2]'
13
+ task :benchmark, [:task] do |task, args|
14
+ task_name = args[:task]
15
+ if task_name.nil?
16
+ puts "Should select a task."
17
+ exit 1
18
+ end
19
+ puts Benchmark.measure{ Rake::Task[task_name].invoke *args.extras }
20
+ end
21
+
22
+ task default: :test
@@ -0,0 +1,49 @@
1
+
2
+ lib = File.expand_path("../lib", __FILE__)
3
+ $LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
4
+ require "ae_easy/text/version"
5
+
6
+ Gem::Specification.new do |spec|
7
+ spec.name = "ae_easy-text"
8
+ spec.version = AeEasy::Text::VERSION
9
+ spec.authors = ["Eduardo Rosales"]
10
+ spec.email = ["eduardo@datahen.com"]
11
+
12
+ spec.summary = %q{AnswersEngine Easy toolkit text module}
13
+ spec.description = %q{AnswersEngine Easy toolkit text module contains multiple text parsing helpers.}
14
+ spec.homepage = "https://answersengine.com"
15
+ spec.license = "MIT"
16
+
17
+ # spec.cert_chain = ['certs/ae_easy.pem']
18
+ # spec.signing_key = File.expand_path("~/.ssh/gems/gem-private_ae_easy.pem") if $0 =~ /gem\z/
19
+
20
+ # Prevent pushing this gem to RubyGems.org. To allow pushes either set the 'allowed_push_host'
21
+ # to allow pushing to a single host or delete this section to allow pushing to any host.
22
+ if spec.respond_to?(:metadata)
23
+ # spec.metadata["allowed_push_host"] = "TODO: Set to 'http://mygemserver.com'"
24
+
25
+ spec.metadata["homepage_uri"] = spec.homepage
26
+ spec.metadata["source_code_uri"] = "https://github.com/answersengine/ae_easy-text"
27
+ # spec.metadata["changelog_uri"] = "TODO: Put your gem's CHANGELOG.md URL here."
28
+ else
29
+ raise "RubyGems 2.0 or newer is required to protect against " \
30
+ "public gem pushes."
31
+ end
32
+
33
+ # Specify which files should be added to the gem when it is released.
34
+ # The `git ls-files -z` loads the files in the RubyGem that have been added into git.
35
+ spec.files = Dir.chdir(File.expand_path('..', __FILE__)) do
36
+ `git ls-files -z`.split("\x0").reject { |f| f.match(%r{^(test|spec|features)/}) }
37
+ end
38
+ spec.require_paths = ["lib"]
39
+ spec.required_ruby_version = '>= 2.2.2'
40
+
41
+ spec.add_dependency 'ae_easy-core', '>= 0'
42
+ spec.add_development_dependency 'bundler', '>= 1.16.3'
43
+ spec.add_development_dependency 'rake', '>= 10.0'
44
+ spec.add_development_dependency 'minitest', '>= 5.11'
45
+ spec.add_development_dependency 'simplecov', '>= 0.16.1'
46
+ spec.add_development_dependency 'simplecov-console', '>= 0.4.2'
47
+ spec.add_development_dependency 'timecop', '>= 0.9.1'
48
+ spec.add_development_dependency 'byebug', '>= 0'
49
+ end
data/doc/AeEasy.html ADDED
@@ -0,0 +1,117 @@
1
+ <!DOCTYPE html>
2
+ <html>
3
+ <head>
4
+ <meta charset="utf-8">
5
+ <meta name="viewport" content="width=device-width, initial-scale=1.0">
6
+ <title>
7
+ Module: AeEasy
8
+
9
+ &mdash; Documentation by YARD 0.9.18
10
+
11
+ </title>
12
+
13
+ <link rel="stylesheet" href="css/style.css" type="text/css" charset="utf-8" />
14
+
15
+ <link rel="stylesheet" href="css/common.css" type="text/css" charset="utf-8" />
16
+
17
+ <script type="text/javascript" charset="utf-8">
18
+ pathId = "AeEasy";
19
+ relpath = '';
20
+ </script>
21
+
22
+
23
+ <script type="text/javascript" charset="utf-8" src="js/jquery.js"></script>
24
+
25
+ <script type="text/javascript" charset="utf-8" src="js/app.js"></script>
26
+
27
+
28
+ </head>
29
+ <body>
30
+ <div class="nav_wrap">
31
+ <iframe id="nav" src="class_list.html?1"></iframe>
32
+ <div id="resizer"></div>
33
+ </div>
34
+
35
+ <div id="main" tabindex="-1">
36
+ <div id="header">
37
+ <div id="menu">
38
+
39
+ <a href="_index.html">Index (A)</a> &raquo;
40
+
41
+
42
+ <span class="title">AeEasy</span>
43
+
44
+ </div>
45
+
46
+ <div id="search">
47
+
48
+ <a class="full_list_link" id="class_list_link"
49
+ href="class_list.html">
50
+
51
+ <svg width="24" height="24">
52
+ <rect x="0" y="4" width="24" height="4" rx="1" ry="1"></rect>
53
+ <rect x="0" y="12" width="24" height="4" rx="1" ry="1"></rect>
54
+ <rect x="0" y="20" width="24" height="4" rx="1" ry="1"></rect>
55
+ </svg>
56
+ </a>
57
+
58
+ </div>
59
+ <div class="clear"></div>
60
+ </div>
61
+
62
+ <div id="content"><h1>Module: AeEasy
63
+
64
+
65
+
66
+ </h1>
67
+ <div class="box_info">
68
+
69
+
70
+
71
+
72
+
73
+
74
+
75
+
76
+
77
+
78
+
79
+ <dl>
80
+ <dt>Defined in:</dt>
81
+ <dd>lib/ae_easy/text.rb<span class="defines">,<br />
82
+ lib/ae_easy/text/version.rb</span>
83
+ </dd>
84
+ </dl>
85
+
86
+ </div>
87
+
88
+ <h2>Defined Under Namespace</h2>
89
+ <p class="children">
90
+
91
+
92
+ <strong class="modules">Modules:</strong> <span class='object_link'><a href="AeEasy/Text.html" title="AeEasy::Text (module)">Text</a></span>
93
+
94
+
95
+
96
+
97
+ </p>
98
+
99
+
100
+
101
+
102
+
103
+
104
+
105
+
106
+
107
+ </div>
108
+
109
+ <div id="footer">
110
+ Generated on Tue Feb 26 16:50:02 2019 by
111
+ <a href="http://yardoc.org" title="Yay! A Ruby Documentation Tool" target="_parent">yard</a>
112
+ 0.9.18 (ruby-2.5.3).
113
+ </div>
114
+
115
+ </div>
116
+ </body>
117
+ </html>
@@ -0,0 +1,2024 @@
1
+ <!DOCTYPE html>
2
+ <html>
3
+ <head>
4
+ <meta charset="utf-8">
5
+ <meta name="viewport" content="width=device-width, initial-scale=1.0">
6
+ <title>
7
+ Module: AeEasy::Text
8
+
9
+ &mdash; Documentation by YARD 0.9.18
10
+
11
+ </title>
12
+
13
+ <link rel="stylesheet" href="../css/style.css" type="text/css" charset="utf-8" />
14
+
15
+ <link rel="stylesheet" href="../css/common.css" type="text/css" charset="utf-8" />
16
+
17
+ <script type="text/javascript" charset="utf-8">
18
+ pathId = "AeEasy::Text";
19
+ relpath = '../';
20
+ </script>
21
+
22
+
23
+ <script type="text/javascript" charset="utf-8" src="../js/jquery.js"></script>
24
+
25
+ <script type="text/javascript" charset="utf-8" src="../js/app.js"></script>
26
+
27
+
28
+ </head>
29
+ <body>
30
+ <div class="nav_wrap">
31
+ <iframe id="nav" src="../class_list.html?1"></iframe>
32
+ <div id="resizer"></div>
33
+ </div>
34
+
35
+ <div id="main" tabindex="-1">
36
+ <div id="header">
37
+ <div id="menu">
38
+
39
+ <a href="../_index.html">Index (T)</a> &raquo;
40
+ <span class='title'><span class='object_link'><a href="../AeEasy.html" title="AeEasy (module)">AeEasy</a></span></span>
41
+ &raquo;
42
+ <span class="title">Text</span>
43
+
44
+ </div>
45
+
46
+ <div id="search">
47
+
48
+ <a class="full_list_link" id="class_list_link"
49
+ href="../class_list.html">
50
+
51
+ <svg width="24" height="24">
52
+ <rect x="0" y="4" width="24" height="4" rx="1" ry="1"></rect>
53
+ <rect x="0" y="12" width="24" height="4" rx="1" ry="1"></rect>
54
+ <rect x="0" y="20" width="24" height="4" rx="1" ry="1"></rect>
55
+ </svg>
56
+ </a>
57
+
58
+ </div>
59
+ <div class="clear"></div>
60
+ </div>
61
+
62
+ <div id="content"><h1>Module: AeEasy::Text
63
+
64
+
65
+
66
+ </h1>
67
+ <div class="box_info">
68
+
69
+
70
+
71
+
72
+
73
+
74
+
75
+
76
+
77
+
78
+
79
+ <dl>
80
+ <dt>Defined in:</dt>
81
+ <dd>lib/ae_easy/text.rb<span class="defines">,<br />
82
+ lib/ae_easy/text/version.rb</span>
83
+ </dd>
84
+ </dl>
85
+
86
+ </div>
87
+
88
+
89
+
90
+ <h2>
91
+ Constant Summary
92
+ <small><a href="#" class="constants_summary_toggle">collapse</a></small>
93
+ </h2>
94
+
95
+ <dl class="constants">
96
+
97
+ <dt id="VERSION-constant" class="">VERSION =
98
+ <div class="docstring">
99
+ <div class="discussion">
100
+
101
+ <p>Gem version</p>
102
+
103
+
104
+ </div>
105
+ </div>
106
+ <div class="tags">
107
+
108
+
109
+ </div>
110
+ </dt>
111
+ <dd><pre class="code"><span class='tstring'><span class='tstring_beg'>&quot;</span><span class='tstring_content'>0.0.1</span><span class='tstring_end'>&quot;</span></span></pre></dd>
112
+
113
+ </dl>
114
+
115
+
116
+
117
+
118
+
119
+
120
+
121
+
122
+
123
+ <h2>
124
+ Class Method Summary
125
+ <small><a href="#" class="summary_toggle">collapse</a></small>
126
+ </h2>
127
+
128
+ <ul class="summary">
129
+
130
+ <li class="public ">
131
+ <span class="summary_signature">
132
+
133
+ <a href="#decode_html-class_method" title="decode_html (class method)">.<strong>decode_html</strong>(text) &#x21d2; String </a>
134
+
135
+
136
+
137
+ </span>
138
+
139
+
140
+
141
+
142
+
143
+
144
+
145
+
146
+
147
+ <span class="summary_desc"><div class='inline'>
148
+ <p>Decode HTML entities from text.</p>
149
+ </div></span>
150
+
151
+ </li>
152
+
153
+
154
+ <li class="public ">
155
+ <span class="summary_signature">
156
+
157
+ <a href="#default_parser-class_method" title="default_parser (class method)">.<strong>default_parser</strong>(cell_element, data, key) &#x21d2; Object </a>
158
+
159
+
160
+
161
+ </span>
162
+
163
+
164
+
165
+
166
+
167
+
168
+
169
+
170
+
171
+ <span class="summary_desc"><div class='inline'>
172
+ <p>Default cell content parser used to parse cell element.</p>
173
+ </div></span>
174
+
175
+ </li>
176
+
177
+
178
+ <li class="public ">
179
+ <span class="summary_signature">
180
+
181
+ <a href="#encode_html-class_method" title="encode_html (class method)">.<strong>encode_html</strong>(text) &#x21d2; String </a>
182
+
183
+
184
+
185
+ </span>
186
+
187
+
188
+
189
+
190
+
191
+
192
+
193
+
194
+
195
+ <span class="summary_desc"><div class='inline'>
196
+ <p>Encode text for valid HTML entities.</p>
197
+ </div></span>
198
+
199
+ </li>
200
+
201
+
202
+ <li class="public ">
203
+ <span class="summary_signature">
204
+
205
+ <a href="#hash-class_method" title="hash (class method)">.<strong>hash</strong>(object) &#x21d2; String </a>
206
+
207
+
208
+
209
+ </span>
210
+
211
+
212
+
213
+
214
+
215
+
216
+
217
+
218
+
219
+ <span class="summary_desc"><div class='inline'>
220
+ <p>Create a hash from object.</p>
221
+ </div></span>
222
+
223
+ </li>
224
+
225
+
226
+ <li class="public ">
227
+ <span class="summary_signature">
228
+
229
+ <a href="#parse_content-class_method" title="parse_content (class method)">.<strong>parse_content</strong>(opts) {|data, row, header_map| ... } &#x21d2; Array&lt;Hash&gt;<sup>?</sup> </a>
230
+
231
+
232
+
233
+ </span>
234
+
235
+
236
+
237
+
238
+
239
+
240
+
241
+
242
+
243
+ <span class="summary_desc"><div class='inline'>
244
+ <p>Parse row data matching a selector using a header map to translate
245
+ between columns and friendly keys.</p>
246
+ </div></span>
247
+
248
+ </li>
249
+
250
+
251
+ <li class="public ">
252
+ <span class="summary_signature">
253
+
254
+ <a href="#parse_header_map-class_method" title="parse_header_map (class method)">.<strong>parse_header_map</strong>(opts = {}) &#x21d2; Hash{Symbol,String =&gt; Integer}<sup>?</sup> </a>
255
+
256
+
257
+
258
+ </span>
259
+
260
+
261
+
262
+
263
+
264
+
265
+
266
+
267
+
268
+ <span class="summary_desc"><div class='inline'>
269
+ <p>Parse header from selector and create a header map to match a column key
270
+ with column index.</p>
271
+ </div></span>
272
+
273
+ </li>
274
+
275
+
276
+ <li class="public ">
277
+ <span class="summary_signature">
278
+
279
+ <a href="#parse_table-class_method" title="parse_table (class method)">.<strong>parse_table</strong>(opts = {}) {|data, row, header_map| ... } &#x21d2; Hash{Symbol =&gt; Array,Hash,nil} </a>
280
+
281
+
282
+
283
+ </span>
284
+
285
+
286
+
287
+
288
+
289
+
290
+
291
+
292
+
293
+ <span class="summary_desc"><div class='inline'>
294
+ <p>Parse data from a horizontal table-like structure matching the given selectors and
295
+ using a header map to match columns.</p>
296
+ </div></span>
297
+
298
+ </li>
299
+
300
+
301
+ <li class="public ">
302
+ <span class="summary_signature">
303
+
304
+ <a href="#parse_vertical_table-class_method" title="parse_vertical_table (class method)">.<strong>parse_vertical_table</strong>(opts = {}) {|data, row, header_map| ... } &#x21d2; Hash{Symbol =&gt; Array,Hash,nil} </a>
305
+
306
+
307
+
308
+ </span>
309
+
310
+
311
+
312
+
313
+
314
+
315
+
316
+
317
+
318
+ <span class="summary_desc"><div class='inline'>
319
+ <p>Parse data from a vertical table-like structure matching the given selectors and
320
+ using a header map to match columns.</p>
321
+ </div></span>
322
+
323
+ </li>
324
+
325
+
326
+ <li class="public ">
327
+ <span class="summary_signature">
328
+
329
+ <a href="#strip-class_method" title="strip (class method)">.<strong>strip</strong>(raw_text) &#x21d2; String<sup>?</sup> </a>
330
+
331
+
332
+
333
+ </span>
334
+
335
+
336
+
337
+
338
+
339
+
340
+
341
+
342
+
343
+ <span class="summary_desc"><div class='inline'>
344
+ <p>Strip a value.</p>
345
+ </div></span>
346
+
347
+ </li>
348
+
349
+
350
+ <li class="public ">
351
+ <span class="summary_signature">
352
+
353
+ <a href="#translate_label_to_key-class_method" title="translate_label_to_key (class method)">.<strong>translate_label_to_key</strong>(element, label_map) &#x21d2; Symbol, String </a>
354
+
355
+
356
+
357
+ </span>
358
+
359
+
360
+
361
+
362
+
363
+
364
+
365
+
366
+
367
+ <span class="summary_desc"><div class='inline'>
368
+ <p>Extract column label and translate it into a friendly key.</p>
369
+ </div></span>
370
+
371
+ </li>
372
+
373
+
374
+ </ul>
375
+
376
+
377
+
378
+
379
+ <div id="class_method_details" class="method_details_list">
380
+ <h2>Class Method Details</h2>
381
+
382
+
383
+ <div class="method_details first">
384
+ <h3 class="signature first" id="decode_html-class_method">
385
+
386
+ .<strong>decode_html</strong>(text) &#x21d2; <tt>String</tt>
387
+
388
+
389
+
390
+
391
+
392
+ </h3><div class="docstring">
393
+ <div class="discussion">
394
+
395
+ <p>Decode HTML entities from text.</p>
396
+
397
+
398
+ </div>
399
+ </div>
400
+ <div class="tags">
401
+ <p class="tag_title">Parameters:</p>
402
+ <ul class="param">
403
+
404
+ <li>
405
+
406
+ <span class='name'>text</span>
407
+
408
+
409
+ <span class='type'>(<tt>String</tt>)</span>
410
+
411
+
412
+
413
+ &mdash;
414
+ <div class='inline'>
415
+ <p>Text to decode.</p>
416
+ </div>
417
+
418
+ </li>
419
+
420
+ </ul>
421
+
422
+ <p class="tag_title">Returns:</p>
423
+ <ul class="return">
424
+
425
+ <li>
426
+
427
+
428
+ <span class='type'>(<tt>String</tt>)</span>
429
+
430
+
431
+
432
+ </li>
433
+
434
+ </ul>
435
+
436
+ </div><table class="source_code">
437
+ <tr>
438
+ <td>
439
+ <pre class="lines">
440
+
441
+
442
+ 33
443
+ 34
444
+ 35</pre>
445
+ </td>
446
+ <td>
447
+ <pre class="code"><span class="info file"># File 'lib/ae_easy/text.rb', line 33</span>
448
+
449
+ <span class='kw'>def</span> <span class='kw'>self</span><span class='period'>.</span><span class='id identifier rubyid_decode_html'>decode_html</span> <span class='id identifier rubyid_text'>text</span>
450
+ <span class='const'>CGI</span><span class='period'>.</span><span class='id identifier rubyid_unescapeHTML'>unescapeHTML</span> <span class='id identifier rubyid_text'>text</span>
451
+ <span class='kw'>end</span></pre>
452
+ </td>
453
+ </tr>
454
+ </table>
455
+ </div>
456
+
457
+ <div class="method_details ">
458
+ <h3 class="signature " id="default_parser-class_method">
459
+
460
+ .<strong>default_parser</strong>(cell_element, data, key) &#x21d2; <tt>Object</tt>
461
+
462
+
463
+
464
+
465
+
466
+ </h3><div class="docstring">
467
+ <div class="discussion">
468
+
469
+ <p>Default cell content parser used to parse cell element.</p>
470
+
471
+
472
+ </div>
473
+ </div>
474
+ <div class="tags">
475
+ <p class="tag_title">Parameters:</p>
476
+ <ul class="param">
477
+
478
+ <li>
479
+
480
+ <span class='name'>cell_element</span>
481
+
482
+
483
+ <span class='type'>(<tt>Nokogiri::Element</tt>)</span>
484
+
485
+
486
+
487
+ &mdash;
488
+ <div class='inline'>
489
+ <p>Cell element to parse.</p>
490
+ </div>
491
+
492
+ </li>
493
+
494
+ <li>
495
+
496
+ <span class='name'>data</span>
497
+
498
+
499
+ <span class='type'>(<tt>Hash</tt>)</span>
500
+
501
+
502
+
503
+ &mdash;
504
+ <div class='inline'>
505
+ <p>Data hash to save parsed data into.</p>
506
+ </div>
507
+
508
+ </li>
509
+
510
+ <li>
511
+
512
+ <span class='name'>key</span>
513
+
514
+
515
+ <span class='type'>(<tt>String</tt>, <tt>Symbol</tt>)</span>
516
+
517
+
518
+
519
+ &mdash;
520
+ <div class='inline'>
521
+ <p>Header column key being parsed.</p>
522
+ </div>
523
+
524
+ </li>
525
+
526
+ </ul>
527
+
528
+
529
+ </div><table class="source_code">
530
+ <tr>
531
+ <td>
532
+ <pre class="lines">
533
+
534
+
535
+ 60
536
+ 61
537
+ 62
538
+ 63</pre>
539
+ </td>
540
+ <td>
541
+ <pre class="code"><span class="info file"># File 'lib/ae_easy/text.rb', line 60</span>
542
+
543
+ <span class='kw'>def</span> <span class='kw'>self</span><span class='period'>.</span><span class='id identifier rubyid_default_parser'>default_parser</span> <span class='id identifier rubyid_cell_element'>cell_element</span><span class='comma'>,</span> <span class='id identifier rubyid_data'>data</span><span class='comma'>,</span> <span class='id identifier rubyid_key'>key</span>
544
+ <span class='id identifier rubyid_cell_element'>cell_element</span><span class='op'>&amp;.</span><span class='id identifier rubyid_search'>search</span><span class='lparen'>(</span><span class='tstring'><span class='tstring_beg'>&#39;</span><span class='tstring_content'>//i</span><span class='tstring_end'>&#39;</span></span><span class='rparen'>)</span><span class='period'>.</span><span class='id identifier rubyid_remove'>remove</span>
545
+ <span class='id identifier rubyid_data'>data</span><span class='lbracket'>[</span><span class='id identifier rubyid_key'>key</span><span class='rbracket'>]</span> <span class='op'>=</span> <span class='id identifier rubyid_strip'>strip</span> <span class='id identifier rubyid_cell_element'>cell_element</span><span class='op'>&amp;.</span><span class='id identifier rubyid_text'>text</span>
546
+ <span class='kw'>end</span></pre>
547
+ </td>
548
+ </tr>
549
+ </table>
550
+ </div>
551
+
552
+ <div class="method_details ">
553
+ <h3 class="signature " id="encode_html-class_method">
554
+
555
+ .<strong>encode_html</strong>(text) &#x21d2; <tt>String</tt>
556
+
557
+
558
+
559
+
560
+
561
+ </h3><div class="docstring">
562
+ <div class="discussion">
563
+
564
+ <p>Encode text for valid HTML entities.</p>
565
+
566
+
567
+ </div>
568
+ </div>
569
+ <div class="tags">
570
+ <p class="tag_title">Parameters:</p>
571
+ <ul class="param">
572
+
573
+ <li>
574
+
575
+ <span class='name'>text</span>
576
+
577
+
578
+ <span class='type'>(<tt>String</tt>)</span>
579
+
580
+
581
+
582
+ &mdash;
583
+ <div class='inline'>
584
+ <p>Text to encode.</p>
585
+ </div>
586
+
587
+ </li>
588
+
589
+ </ul>
590
+
591
+ <p class="tag_title">Returns:</p>
592
+ <ul class="return">
593
+
594
+ <li>
595
+
596
+
597
+ <span class='type'>(<tt>String</tt>)</span>
598
+
599
+
600
+
601
+ </li>
602
+
603
+ </ul>
604
+
605
+ </div><table class="source_code">
606
+ <tr>
607
+ <td>
608
+ <pre class="lines">
609
+
610
+
611
+ 24
612
+ 25
613
+ 26</pre>
614
+ </td>
615
+ <td>
616
+ <pre class="code"><span class="info file"># File 'lib/ae_easy/text.rb', line 24</span>
617
+
618
+ <span class='kw'>def</span> <span class='kw'>self</span><span class='period'>.</span><span class='id identifier rubyid_encode_html'>encode_html</span> <span class='id identifier rubyid_text'>text</span>
619
+ <span class='const'>CGI</span><span class='period'>.</span><span class='id identifier rubyid_escapeHTML'>escapeHTML</span> <span class='id identifier rubyid_text'>text</span>
620
+ <span class='kw'>end</span></pre>
621
+ </td>
622
+ </tr>
623
+ </table>
624
+ </div>
625
+
626
+ <div class="method_details ">
627
+ <h3 class="signature " id="hash-class_method">
628
+
629
+ .<strong>hash</strong>(object) &#x21d2; <tt>String</tt>
630
+
631
+
632
+
633
+
634
+
635
+ </h3><div class="docstring">
636
+ <div class="discussion">
637
+
638
+ <p>Create a hash from object.</p>
639
+
640
+
641
+ </div>
642
+ </div>
643
+ <div class="tags">
644
+ <p class="tag_title">Parameters:</p>
645
+ <ul class="param">
646
+
647
+ <li>
648
+
649
+ <span class='name'>object</span>
650
+
651
+
652
+ <span class='type'>(<tt>String</tt>, <tt>Hash</tt>, <tt>Object</tt>)</span>
653
+
654
+
655
+
656
+ &mdash;
657
+ <div class='inline'>
658
+ <p>Object to create hash from.</p>
659
+ </div>
660
+
661
+ </li>
662
+
663
+ </ul>
664
+
665
+ <p class="tag_title">Returns:</p>
666
+ <ul class="return">
667
+
668
+ <li>
669
+
670
+
671
+ <span class='type'>(<tt>String</tt>)</span>
672
+
673
+
674
+
675
+ </li>
676
+
677
+ </ul>
678
+
679
+ </div><table class="source_code">
680
+ <tr>
681
+ <td>
682
+ <pre class="lines">
683
+
684
+
685
+ 14
686
+ 15
687
+ 16
688
+ 17</pre>
689
+ </td>
690
+ <td>
691
+ <pre class="code"><span class="info file"># File 'lib/ae_easy/text.rb', line 14</span>
692
+
693
+ <span class='kw'>def</span> <span class='kw'>self</span><span class='period'>.</span><span class='id identifier rubyid_hash'>hash</span> <span class='id identifier rubyid_object'>object</span>
694
+ <span class='id identifier rubyid_object'>object</span> <span class='op'>=</span> <span class='id identifier rubyid_object'>object</span><span class='period'>.</span><span class='id identifier rubyid_hash'>hash</span> <span class='kw'>if</span> <span class='id identifier rubyid_object'>object</span><span class='period'>.</span><span class='id identifier rubyid_is_a?'>is_a?</span> <span class='const'>Hash</span>
695
+ <span class='const'>Digest</span><span class='op'>::</span><span class='const'>SHA1</span><span class='period'>.</span><span class='id identifier rubyid_hexdigest'>hexdigest</span> <span class='id identifier rubyid_object'>object</span><span class='period'>.</span><span class='id identifier rubyid_to_s'>to_s</span>
696
+ <span class='kw'>end</span></pre>
697
+ </td>
698
+ </tr>
699
+ </table>
700
+ </div>
701
+
702
+ <div class="method_details ">
703
+ <h3 class="signature " id="parse_content-class_method">
704
+
705
+ .<strong>parse_content</strong>(opts) {|data, row, header_map| ... } &#x21d2; <tt>Array&lt;Hash&gt;</tt><sup>?</sup>
706
+
707
+
708
+
709
+
710
+
711
+ </h3><div class="docstring">
712
+ <div class="discussion">
713
+
714
+ <p>Parse row data matching a selector using a header map to translate
715
+ between columns and friendly keys.</p>
716
+
717
+
718
+
719
+
720
+ </div>
721
+ </div>
722
+ <div class="tags">
723
+ <p class="tag_title">Parameters:</p>
724
+ <ul class="param">
725
+
726
+ <li>
727
+
728
+ <span class='name'>opts</span>
729
+
730
+
731
+ <span class='type'>(<tt>Hash</tt>)</span>
732
+
733
+
734
+
735
+ &mdash;
736
+ <div class='inline'>
737
+ <p>({}) Configuration options.</p>
738
+ </div>
739
+
740
+ </li>
741
+
742
+ </ul>
743
+
744
+
745
+
746
+
747
+ <p class="tag_title">Options Hash (<tt>opts</tt>):</p>
748
+ <ul class="option">
749
+
750
+ <li>
751
+ <span class="name">:html</span>
752
+ <span class="type">(<tt>Nokogiri::Element</tt>)</span>
753
+ <span class="default">
754
+
755
+ </span>
756
+
757
+ &mdash; <div class='inline'>
758
+ <p>Container element to search into.</p>
759
+ </div>
760
+
761
+ </li>
762
+
763
+ <li>
764
+ <span class="name">:selector</span>
765
+ <span class="type">(<tt>String</tt>)</span>
766
+ <span class="default">
767
+
768
+ </span>
769
+
770
+ &mdash; <div class='inline'>
771
+ <p>CSS selector to match content cells.</p>
772
+ </div>
773
+
774
+ </li>
775
+
776
+ <li>
777
+ <span class="name">:first_row_header</span>
778
+ <span class="type">(<tt>Boolean</tt>)</span>
779
+ <span class="default">
780
+
781
+ &mdash; default:
782
+ <tt>false</tt>
783
+
784
+ </span>
785
+
786
+ &mdash; <div class='inline'>
787
+ <p>If true then first matching element will be assumed to be header and
788
+ ignored.</p>
789
+ </div>
790
+
791
+ </li>
792
+
793
+ <li>
794
+ <span class="name">:header_map</span>
795
+ <span class="type">(<tt>Hash{Symbol,String =&gt; Integer}</tt>)</span>
796
+ <span class="default">
797
+
798
+ </span>
799
+
800
+ &mdash; <div class='inline'>
801
+ <p>Header key vs index dictionary.</p>
802
+ </div>
803
+
804
+ </li>
805
+
806
+ <li>
807
+ <span class="name">:column_parsers</span>
808
+ <span class="type">(<tt>Hash{Symbol,String =&gt; lambda,proc}</tt>)</span>
809
+ <span class="default">
810
+
811
+ &mdash; default:
812
+ <tt>{}</tt>
813
+
814
+ </span>
815
+
816
+ &mdash; <div class='inline'>
817
+ <p>Custom column parsers for advanced data extraction.</p>
818
+ </div>
819
+
820
+ </li>
821
+
822
+ </ul>
823
+
824
+
825
+ <p class="tag_title">Yield Parameters:</p>
826
+ <ul class="yieldparam">
827
+
828
+ <li>
829
+
830
+ <span class='name'>data</span>
831
+
832
+
833
+ <span class='type'>(<tt>Hash{Symbol,String =&gt; Object}</tt>)</span>
834
+
835
+
836
+
837
+ &mdash;
838
+ <div class='inline'>
839
+ <p>Parsed row data.</p>
840
+ </div>
841
+
842
+ </li>
843
+
844
+ <li>
845
+
846
+ <span class='name'>row</span>
847
+
848
+
849
+ <span class='type'>(<tt>Array</tt>)</span>
850
+
851
+
852
+
853
+ &mdash;
854
+ <div class='inline'>
855
+ <p>Raw row data.</p>
856
+ </div>
857
+
858
+ </li>
859
+
860
+ <li>
861
+
862
+ <span class='name'>header_map</span>
863
+
864
+
865
+ <span class='type'>(<tt>Hash{Symbol,String =&gt; Integer}</tt>)</span>
866
+
867
+
868
+
869
+ &mdash;
870
+ <div class='inline'>
871
+ <p>Header map used.</p>
872
+ </div>
873
+
874
+ </li>
875
+
876
+ </ul>
877
+ <p class="tag_title">Yield Returns:</p>
878
+ <ul class="yieldreturn">
879
+
880
+ <li>
881
+
882
+
883
+ <span class='type'>(<tt>Boolean</tt>)</span>
884
+
885
+
886
+
887
+ &mdash;
888
+ <div class='inline'>
889
+ <p><tt>true</tt> when valid, else <tt>false</tt>.</p>
890
+ </div>
891
+
892
+ </li>
893
+
894
+ </ul>
895
+ <p class="tag_title">Returns:</p>
896
+ <ul class="return">
897
+
898
+ <li>
899
+
900
+
901
+ <span class='type'>(<tt>Array&lt;Hash&gt;</tt>, <tt>nil</tt>)</span>
902
+
903
+
904
+
905
+ &mdash;
906
+ <div class='inline'>
907
+ <p>Parsed rows data.</p>
908
+ </div>
909
+
910
+ </li>
911
+
912
+ </ul>
913
+
914
+ </div><table class="source_code">
915
+ <tr>
916
+ <td>
917
+ <pre class="lines">
918
+
919
+
920
+ 84
921
+ 85
922
+ 86
923
+ 87
924
+ 88
925
+ 89
926
+ 90
927
+ 91
928
+ 92
929
+ 93
930
+ 94
931
+ 95
932
+ 96
933
+ 97
934
+ 98
935
+ 99
936
+ 100
937
+ 101
938
+ 102
939
+ 103
940
+ 104
941
+ 105
942
+ 106
943
+ 107
944
+ 108
945
+ 109
946
+ 110
947
+ 111
948
+ 112
949
+ 113
950
+ 114
951
+ 115
952
+ 116
953
+ 117
954
+ 118
955
+ 119
956
+ 120
957
+ 121
958
+ 122</pre>
959
+ </td>
960
+ <td>
961
+ <pre class="code"><span class="info file"># File 'lib/ae_easy/text.rb', line 84</span>
962
+
963
+ <span class='kw'>def</span> <span class='kw'>self</span><span class='period'>.</span><span class='id identifier rubyid_parse_content'>parse_content</span> <span class='id identifier rubyid_opts'>opts</span><span class='comma'>,</span> <span class='op'>&amp;</span><span class='id identifier rubyid_filter'>filter</span>
964
+ <span class='id identifier rubyid_opts'>opts</span> <span class='op'>=</span> <span class='lbrace'>{</span>
965
+ <span class='label'>html:</span> <span class='kw'>nil</span><span class='comma'>,</span>
966
+ <span class='label'>selector:</span> <span class='kw'>nil</span><span class='comma'>,</span>
967
+ <span class='label'>first_row_header:</span> <span class='kw'>false</span><span class='comma'>,</span>
968
+ <span class='label'>header_map:</span> <span class='lbrace'>{</span><span class='rbrace'>}</span><span class='comma'>,</span>
969
+ <span class='label'>column_parsers:</span> <span class='lbrace'>{</span><span class='rbrace'>}</span>
970
+ <span class='rbrace'>}</span><span class='period'>.</span><span class='id identifier rubyid_merge'>merge</span> <span class='id identifier rubyid_opts'>opts</span>
971
+
972
+ <span class='comment'># Setup config
973
+ </span> <span class='id identifier rubyid_data'>data</span> <span class='op'>=</span> <span class='lbracket'>[</span><span class='rbracket'>]</span>
974
+ <span class='id identifier rubyid_row_data'>row_data</span> <span class='op'>=</span> <span class='id identifier rubyid_child_element'>child_element</span> <span class='op'>=</span> <span class='kw'>nil</span>
975
+ <span class='id identifier rubyid_first'>first</span> <span class='op'>=</span> <span class='id identifier rubyid_first_row_header'>first_row_header</span> <span class='op'>=</span> <span class='id identifier rubyid_opts'>opts</span><span class='lbracket'>[</span><span class='symbol'>:first_row_header</span><span class='rbracket'>]</span>
976
+ <span class='id identifier rubyid_header_map'>header_map</span> <span class='op'>=</span> <span class='id identifier rubyid_opts'>opts</span><span class='lbracket'>[</span><span class='symbol'>:header_map</span><span class='rbracket'>]</span>
977
+ <span class='id identifier rubyid_column_parsers'>column_parsers</span> <span class='op'>=</span> <span class='id identifier rubyid_opts'>opts</span><span class='lbracket'>[</span><span class='symbol'>:column_parsers</span><span class='rbracket'>]</span>
978
+
979
+ <span class='comment'># Get and parse rows
980
+ </span> <span class='id identifier rubyid_html_rows'>html_rows</span> <span class='op'>=</span> <span class='id identifier rubyid_opts'>opts</span><span class='lbracket'>[</span><span class='symbol'>:html</span><span class='rbracket'>]</span><span class='period'>.</span><span class='id identifier rubyid_css'>css</span><span class='lparen'>(</span><span class='id identifier rubyid_opts'>opts</span><span class='lbracket'>[</span><span class='symbol'>:selector</span><span class='rbracket'>]</span><span class='rparen'>)</span>
981
+ <span class='id identifier rubyid_html_rows'>html_rows</span><span class='period'>.</span><span class='id identifier rubyid_each'>each</span> <span class='kw'>do</span> <span class='op'>|</span><span class='id identifier rubyid_row'>row</span><span class='op'>|</span>
982
+ <span class='comment'># First row header validation
983
+ </span> <span class='kw'>if</span> <span class='id identifier rubyid_first'>first</span> <span class='op'>&amp;&amp;</span> <span class='id identifier rubyid_first_row_header'>first_row_header</span>
984
+ <span class='id identifier rubyid_first'>first</span> <span class='op'>=</span> <span class='kw'>false</span>
985
+ <span class='kw'>next</span>
986
+ <span class='kw'>end</span>
987
+
988
+ <span class='comment'># Extract content data
989
+ </span> <span class='id identifier rubyid_row_data'>row_data</span> <span class='op'>=</span> <span class='lbrace'>{</span><span class='rbrace'>}</span>
990
+ <span class='id identifier rubyid_header_map'>header_map</span><span class='period'>.</span><span class='id identifier rubyid_each'>each</span> <span class='kw'>do</span> <span class='op'>|</span><span class='id identifier rubyid_key'>key</span><span class='comma'>,</span> <span class='id identifier rubyid_index'>index</span><span class='op'>|</span>
991
+ <span class='comment'># Parse column html with default or custom parser
992
+ </span> <span class='id identifier rubyid_child_element'>child_element</span> <span class='op'>=</span> <span class='id identifier rubyid_row'>row</span><span class='period'>.</span><span class='id identifier rubyid_children'>children</span><span class='lbracket'>[</span><span class='id identifier rubyid_index'>index</span><span class='rbracket'>]</span>
993
+ <span class='id identifier rubyid_column_parsers'>column_parsers</span><span class='lbracket'>[</span><span class='id identifier rubyid_key'>key</span><span class='rbracket'>]</span><span class='period'>.</span><span class='id identifier rubyid_nil?'>nil?</span> <span class='op'>?</span>
994
+ <span class='id identifier rubyid_default_parser'>default_parser</span><span class='lparen'>(</span><span class='id identifier rubyid_child_element'>child_element</span><span class='comma'>,</span> <span class='id identifier rubyid_row_data'>row_data</span><span class='comma'>,</span> <span class='id identifier rubyid_key'>key</span><span class='rparen'>)</span> <span class='op'>:</span>
995
+ <span class='id identifier rubyid_column_parsers'>column_parsers</span><span class='lbracket'>[</span><span class='id identifier rubyid_key'>key</span><span class='rbracket'>]</span><span class='period'>.</span><span class='id identifier rubyid_call'>call</span><span class='lparen'>(</span><span class='id identifier rubyid_child_element'>child_element</span><span class='comma'>,</span> <span class='id identifier rubyid_row_data'>row_data</span><span class='comma'>,</span> <span class='id identifier rubyid_key'>key</span><span class='rparen'>)</span>
996
+ <span class='kw'>end</span>
997
+ <span class='kw'>next</span> <span class='kw'>unless</span> <span class='id identifier rubyid_filter'>filter</span><span class='period'>.</span><span class='id identifier rubyid_nil?'>nil?</span> <span class='op'>||</span> <span class='id identifier rubyid_filter'>filter</span><span class='period'>.</span><span class='id identifier rubyid_call'>call</span><span class='lparen'>(</span><span class='id identifier rubyid_row_data'>row_data</span><span class='comma'>,</span> <span class='id identifier rubyid_row'>row</span><span class='comma'>,</span> <span class='id identifier rubyid_header_map'>header_map</span><span class='rparen'>)</span>
998
+ <span class='id identifier rubyid_data'>data</span> <span class='op'>&lt;&lt;</span> <span class='id identifier rubyid_row_data'>row_data</span>
999
+ <span class='kw'>end</span>
1000
+ <span class='id identifier rubyid_data'>data</span>
1001
+ <span class='kw'>end</span></pre>
1002
+ </td>
1003
+ </tr>
1004
+ </table>
1005
+ </div>
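The row loop in `parse_content` above can be illustrated without Nokogiri. The following is a minimal sketch, not the gem's implementation: `parse_rows` is a hypothetical name, rows are plain arrays of strings instead of HTML nodes, and the default behaviour (storing the stripped cell text) is an assumption, since `default_parser` is not shown in this section.

```ruby
# Sketch only: rows are plain arrays of strings instead of Nokogiri nodes.
# `parse_rows` is a hypothetical name, not part of the gem.
def parse_rows(rows, header_map:, first_row_header: false, column_parsers: {}, &filter)
  data = []
  rows.each_with_index do |row, i|
    # Skip the first matching row when it is the header.
    next if first_row_header && i.zero?

    row_data = {}
    header_map.each do |key, index|
      cell = row[index]
      parser = column_parsers[key]
      if parser
        # Custom parsers receive the cell, the row hash being built, and the key.
        parser.call(cell, row_data, key)
      else
        # Assumed default: store the stripped cell text.
        row_data[key] = cell.to_s.strip
      end
    end

    # Keep the row unless a filter block rejects it.
    next unless filter.nil? || filter.call(row_data, row, header_map)
    data << row_data
  end
  data
end

rows = [
  ['Name', 'Price'],
  [' Apple ', '1.50'],
  ['Pear', '0']
]
parsers = { price: ->(cell, row_data, key) { row_data[key] = cell.to_f } }
result = parse_rows(rows, header_map: { name: 0, price: 1 },
                          first_row_header: true,
                          column_parsers: parsers) do |row_data, _row, _map|
  row_data[:price] > 0
end
```

The custom parser writes into `row_data` itself, mirroring the three-argument call made in the source listing, and the block plays the role of the `filter` yield.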
1006
+
1007
+ <div class="method_details ">
1008
+ <h3 class="signature " id="parse_header_map-class_method">
1009
+
1010
+ .<strong>parse_header_map</strong>(opts = {}) &#x21d2; <tt>Hash{Symbol,String =&gt; Integer}</tt><sup>?</sup>
1011
+
1012
+
1013
+
1014
+
1015
+
1016
+ </h3><div class="docstring">
1017
+ <div class="discussion">
1018
+
1019
+ <p>Parse header from selector and create a header map to match a column key
1020
+ with column index.</p>
1021
+
1022
+
1023
+
1024
+
1025
+ </div>
1026
+ </div>
1027
+ <div class="tags">
1028
+ <p class="tag_title">Parameters:</p>
1029
+ <ul class="param">
1030
+
1031
+ <li>
1032
+
1033
+ <span class='name'>opts</span>
1034
+
1035
+
1036
+ <span class='type'>(<tt>Hash</tt>)</span>
1037
+
1038
+
1039
+ <em class="default">(defaults to: <tt>{}</tt>)</em>
1040
+
1041
+
1042
+ &mdash;
1043
+ <div class='inline'>
1044
+ <p>Configuration options.</p>
1045
+ </div>
1046
+
1047
+ </li>
1048
+
1049
+ </ul>
1050
+
1051
+
1052
+
1053
+
1054
+ <p class="tag_title">Options Hash (<tt>opts</tt>):</p>
1055
+ <ul class="option">
1056
+
1057
+ <li>
1058
+ <span class="name">:html</span>
1059
+ <span class="type">(<tt>Nokogiri::XML::Element</tt>)</span>
1060
+ <span class="default">
1061
+
1062
+ </span>
1063
+
1064
+ &mdash; <div class='inline'>
1065
+ <p>Container element to search within.</p>
1066
+ </div>
1067
+
1068
+ </li>
1069
+
1070
+ <li>
1071
+ <span class="name">:selector</span>
1072
+ <span class="type">(<tt>String</tt>)</span>
1073
+ <span class="default">
1074
+
1075
+ </span>
1076
+
1077
+ &mdash; <div class='inline'>
1078
+ <p>CSS selector to match header cells.</p>
1079
+ </div>
1080
+
1081
+ </li>
1082
+
1083
+ <li>
1084
+ <span class="name">:column_key_label_map</span>
1085
+ <span class="type">(<tt>Hash{Symbol,String =&gt; Regexp,String}</tt>)</span>
1086
+ <span class="default">
1087
+
1088
+ </span>
1089
+
1090
+ &mdash; <div class='inline'>
1091
+ <p>Key vs. label dictionary.</p>
1092
+ </div>
1093
+
1094
+ </li>
1095
+
1096
+ <li>
1097
+ <span class="name">:first_row_header</span>
1098
+ <span class="type">(<tt>Boolean</tt>)</span>
1099
+ <span class="default">
1100
+
1101
+ &mdash; default:
1102
+ <tt>false</tt>
1103
+
1104
+ </span>
1105
+
1106
+ &mdash; <div class='inline'>
1107
+ <p>If true, the first row matching the selector will be used as the header for
1108
+ parsing.</p>
1109
+ </div>
1110
+
1111
+ </li>
1112
+
1113
+ </ul>
1114
+
1115
+
1116
+ <p class="tag_title">Returns:</p>
1117
+ <ul class="return">
1118
+
1119
+ <li>
1120
+
1121
+
1122
+ <span class='type'>(<tt>Hash{Symbol,String =&gt; Integer}</tt>, <tt>nil</tt>)</span>
1123
+
1124
+
1125
+
1126
+ &mdash;
1127
+ <div class='inline'>
1128
+ <p>Key vs. column index map.</p>
1129
+ </div>
1130
+
1131
+ </li>
1132
+
1133
+ </ul>
1134
+
1135
+ </div><table class="source_code">
1136
+ <tr>
1137
+ <td>
1138
+ <pre class="lines">
1139
+
1140
+
1141
+ 152
1142
+ 153
1143
+ 154
1144
+ 155
1145
+ 156
1146
+ 157
1147
+ 158
1148
+ 159
1149
+ 160
1150
+ 161
1151
+ 162
1152
+ 163
1153
+ 164
1154
+ 165
1155
+ 166
1156
+ 167
1157
+ 168
1158
+ 169
1159
+ 170
1160
+ 171
1161
+ 172
1162
+ 173
1163
+ 174
1164
+ 175
1165
+ 176
1166
+ 177
1167
+ 178
1168
+ 179
1169
+ 180</pre>
1170
+ </td>
1171
+ <td>
1172
+ <pre class="code"><span class="info file"># File 'lib/ae_easy/text.rb', line 152</span>
1173
+
1174
+ <span class='kw'>def</span> <span class='kw'>self</span><span class='period'>.</span><span class='id identifier rubyid_parse_header_map'>parse_header_map</span> <span class='id identifier rubyid_opts'>opts</span> <span class='op'>=</span> <span class='lbrace'>{</span><span class='rbrace'>}</span>
1175
+ <span class='id identifier rubyid_opts'>opts</span> <span class='op'>=</span> <span class='lbrace'>{</span>
1176
+ <span class='label'>html:</span> <span class='kw'>nil</span><span class='comma'>,</span>
1177
+ <span class='label'>selector:</span> <span class='kw'>nil</span><span class='comma'>,</span>
1178
+ <span class='label'>column_key_label_map:</span> <span class='lbrace'>{</span><span class='rbrace'>}</span><span class='comma'>,</span>
1179
+ <span class='label'>first_row_header:</span> <span class='kw'>false</span>
1180
+ <span class='rbrace'>}</span><span class='period'>.</span><span class='id identifier rubyid_merge'>merge</span> <span class='id identifier rubyid_opts'>opts</span>
1181
+
1182
+ <span class='comment'># Setup config
1183
+ </span> <span class='id identifier rubyid_dictionary'>dictionary</span> <span class='op'>=</span> <span class='id identifier rubyid_opts'>opts</span><span class='lbracket'>[</span><span class='symbol'>:column_key_label_map</span><span class='rbracket'>]</span>
1184
+ <span class='id identifier rubyid_data'>data</span> <span class='op'>=</span> <span class='lbracket'>[</span><span class='rbracket'>]</span>
1185
+ <span class='id identifier rubyid_column_map'>column_map</span> <span class='op'>=</span> <span class='kw'>nil</span>
1186
+
1187
+ <span class='comment'># Extract and parse header rows
1188
+ </span> <span class='id identifier rubyid_html_rows'>html_rows</span> <span class='op'>=</span> <span class='id identifier rubyid_opts'>opts</span><span class='lbracket'>[</span><span class='symbol'>:html</span><span class='rbracket'>]</span><span class='period'>.</span><span class='id identifier rubyid_css'>css</span><span class='lparen'>(</span><span class='id identifier rubyid_opts'>opts</span><span class='lbracket'>[</span><span class='symbol'>:selector</span><span class='rbracket'>]</span><span class='rparen'>)</span> <span class='kw'>rescue</span> <span class='kw'>nil</span>
1189
+ <span class='kw'>return</span> <span class='kw'>nil</span> <span class='kw'>if</span> <span class='id identifier rubyid_html_rows'>html_rows</span><span class='period'>.</span><span class='id identifier rubyid_nil?'>nil?</span>
1190
+ <span class='id identifier rubyid_html_rows'>html_rows</span> <span class='op'>=</span> <span class='lbracket'>[</span><span class='id identifier rubyid_html_rows'>html_rows</span><span class='period'>.</span><span class='id identifier rubyid_first'>first</span><span class='rbracket'>]</span> <span class='kw'>if</span> <span class='id identifier rubyid_opts'>opts</span><span class='lbracket'>[</span><span class='symbol'>:first_row_header</span><span class='rbracket'>]</span>
1191
+ <span class='id identifier rubyid_html_rows'>html_rows</span><span class='period'>.</span><span class='id identifier rubyid_each'>each</span> <span class='kw'>do</span> <span class='op'>|</span><span class='id identifier rubyid_row'>row</span><span class='op'>|</span>
1192
+ <span class='id identifier rubyid_column_map'>column_map</span> <span class='op'>=</span> <span class='lbrace'>{</span><span class='rbrace'>}</span>
1193
+ <span class='id identifier rubyid_row'>row</span><span class='period'>.</span><span class='id identifier rubyid_children'>children</span><span class='period'>.</span><span class='id identifier rubyid_each_with_index'>each_with_index</span> <span class='kw'>do</span> <span class='op'>|</span><span class='id identifier rubyid_col'>col</span><span class='comma'>,</span> <span class='id identifier rubyid_index'>index</span><span class='op'>|</span>
1194
+ <span class='comment'># Parse and map column header
1195
+ </span> <span class='id identifier rubyid_column_key'>column_key</span> <span class='op'>=</span> <span class='id identifier rubyid_translate_label_to_key'>translate_label_to_key</span> <span class='id identifier rubyid_col'>col</span><span class='comma'>,</span> <span class='id identifier rubyid_dictionary'>dictionary</span>
1196
+ <span class='kw'>next</span> <span class='kw'>if</span> <span class='id identifier rubyid_column_key'>column_key</span><span class='period'>.</span><span class='id identifier rubyid_nil?'>nil?</span>
1197
+ <span class='id identifier rubyid_column_map'>column_map</span><span class='lbracket'>[</span><span class='id identifier rubyid_column_key'>column_key</span><span class='rbracket'>]</span> <span class='op'>=</span> <span class='id identifier rubyid_index'>index</span>
1198
+ <span class='kw'>end</span>
1199
+ <span class='id identifier rubyid_data'>data</span> <span class='op'>&lt;&lt;</span> <span class='id identifier rubyid_column_map'>column_map</span>
1200
+ <span class='kw'>end</span>
1201
+ <span class='id identifier rubyid_data'>data</span><span class='op'>&amp;.</span><span class='id identifier rubyid_first'>first</span>
1202
+ <span class='kw'>end</span></pre>
1203
+ </td>
1204
+ </tr>
1205
+ </table>
1206
+ </div>
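The key-to-index mapping built by `parse_header_map` above can be sketched on plain strings. This is an illustration under assumptions: `build_header_map` is a hypothetical name, and the matching rule (a Regexp must match the label, a String must equal it) is a guess, because the gem's `translate_label_to_key` helper is not shown in this section.

```ruby
# Sketch only: `build_header_map` and its matching rule are assumptions;
# the gem's `translate_label_to_key` helper is not shown in this section.
def build_header_map(labels, column_key_label_map)
  column_map = {}
  labels.each_with_index do |label, index|
    text = label.to_s.strip
    # Find the first key whose matcher (Regexp or String) fits the label.
    key, = column_key_label_map.find do |_key, matcher|
      matcher.is_a?(Regexp) ? matcher.match?(text) : matcher == text
    end
    # Unrecognised columns are skipped, as in the source listing.
    next if key.nil?

    column_map[key] = index
  end
  column_map
end

header_map = build_header_map(
  ['Product name', 'Notes', 'Unit price'],
  { name: /name/i, price: 'Unit price' }
)
```

Columns with no matching key (here `Notes`) are left out of the map, so later lookups only touch the columns the caller asked for.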
1207
+
1208
+ <div class="method_details ">
1209
+ <h3 class="signature " id="parse_table-class_method">
1210
+
1211
+ .<strong>parse_table</strong>(opts = {}) {|data, row, header_map| ... } &#x21d2; <tt>Hash{Symbol =&gt; Array,Hash,nil}</tt>
1212
+
1213
+
1214
+
1215
+
1216
+
1217
+ </h3><div class="docstring">
1218
+ <div class="discussion">
1219
+
1220
+ <p>Parse data from a horizontal table-like structure matching selectors and
1221
+ using a header map to match columns.</p>
1222
+
1223
+
1224
+
1225
+
1226
+ </div>
1227
+ </div>
1228
+ <div class="tags">
1229
+ <p class="tag_title">Parameters:</p>
1230
+ <ul class="param">
1231
+
1232
+ <li>
1233
+
1234
+ <span class='name'>opts</span>
1235
+
1236
+
1237
+ <span class='type'>(<tt>Hash</tt>)</span>
1238
+
1239
+
1240
+ <em class="default">(defaults to: <tt>{}</tt>)</em>
1241
+
1242
+
1243
+ &mdash;
1244
+ <div class='inline'>
1245
+ <p>Configuration options.</p>
1246
+ </div>
1247
+
1248
+ </li>
1249
+
1250
+ </ul>
1251
+
1252
+
1253
+
1254
+
1255
+ <p class="tag_title">Options Hash (<tt>opts</tt>):</p>
1256
+ <ul class="option">
1257
+
1258
+ <li>
1259
+ <span class="name">:html</span>
1260
+ <span class="type">(<tt>Nokogiri::XML::Element</tt>)</span>
1261
+ <span class="default">
1262
+
1263
+ </span>
1264
+
1265
+ &mdash; <div class='inline'>
1266
+ <p>Container element to search within.</p>
1267
+ </div>
1268
+
1269
+ </li>
1270
+
1271
+ <li>
1272
+ <span class="name">:header_selector</span>
1273
+ <span class="type">(<tt>String</tt>)</span>
1274
+ <span class="default">
1275
+
1276
+ </span>
1277
+
1278
+ &mdash; <div class='inline'>
1279
+ <p>Header column elements selector.</p>
1280
+ </div>
1281
+
1282
+ </li>
1283
+
1284
+ <li>
1285
+ <span class="name">:header_key_label_map</span>
1286
+ <span class="type">(<tt>Hash{Symbol,String =&gt; Regexp,String}</tt>)</span>
1287
+ <span class="default">
1288
+
1289
+ </span>
1290
+
1291
+ &mdash; <div class='inline'>
1292
+ <p>Header key vs. label dictionary to match column indexes.</p>
1293
+ </div>
1294
+
1295
+ </li>
1296
+
1297
+ <li>
1298
+ <span class="name">:content_selector</span>
1299
+ <span class="type">(<tt>String</tt>)</span>
1300
+ <span class="default">
1301
+
1302
+ </span>
1303
+
1304
+ &mdash; <div class='inline'>
1305
+ <p>Content row elements selector.</p>
1306
+ </div>
1307
+
1308
+ </li>
1309
+
1310
+ <li>
1311
+ <span class="name">:first_row_header</span>
1312
+ <span class="type">(<tt>Boolean</tt>)</span>
1313
+ <span class="default">
1314
+
1315
+ &mdash; default:
1316
+ <tt>false</tt>
1317
+
1318
+ </span>
1319
+
1320
+ &mdash; <div class='inline'>
1321
+ <p>If true, the first row matching the selector will be used as the header for
1322
+ parsing.</p>
1323
+ </div>
1324
+
1325
+ </li>
1326
+
1327
+ <li>
1328
+ <span class="name">:column_parsers</span>
1329
+ <span class="type">(<tt>Hash{Symbol,String =&gt; lambda,proc}</tt>)</span>
1330
+ <span class="default">
1331
+
1332
+ &mdash; default:
1333
+ <tt>{}</tt>
1334
+
1335
+ </span>
1336
+
1337
+ &mdash; <div class='inline'>
1338
+ <p>Custom column parsers for advanced data extraction.</p>
1339
+ </div>
1340
+
1341
+ </li>
1342
+
1343
+ </ul>
1344
+
1345
+
1346
+ <p class="tag_title">Yield Parameters:</p>
1347
+ <ul class="yieldparam">
1348
+
1349
+ <li>
1350
+
1351
+ <span class='name'>data</span>
1352
+
1353
+
1354
+ <span class='type'>(<tt>Hash{Symbol,String =&gt; Object}</tt>)</span>
1355
+
1356
+
1357
+
1358
+ &mdash;
1359
+ <div class='inline'>
1360
+ <p>Parsed content row data.</p>
1361
+ </div>
1362
+
1363
+ </li>
1364
+
1365
+ <li>
1366
+
1367
+ <span class='name'>row</span>
1368
+
1369
+
1370
+ <span class='type'>(<tt>Array</tt>)</span>
1371
+
1372
+
1373
+
1374
+ &mdash;
1375
+ <div class='inline'>
1376
+ <p>Raw content row data.</p>
1377
+ </div>
1378
+
1379
+ </li>
1380
+
1381
+ <li>
1382
+
1383
+ <span class='name'>header_map</span>
1384
+
1385
+
1386
+ <span class='type'>(<tt>Hash{Symbol,String =&gt; Integer}</tt>)</span>
1387
+
1388
+
1389
+
1390
+ &mdash;
1391
+ <div class='inline'>
1392
+ <p>Header map used.</p>
1393
+ </div>
1394
+
1395
+ </li>
1396
+
1397
+ </ul>
1398
+ <p class="tag_title">Yield Returns:</p>
1399
+ <ul class="yieldreturn">
1400
+
1401
+ <li>
1402
+
1403
+
1404
+ <span class='type'>(<tt>Boolean</tt>)</span>
1405
+
1406
+
1407
+
1408
+ &mdash;
1409
+ <div class='inline'>
1410
+ <p><tt>true</tt> when valid, else <tt>false</tt>.</p>
1411
+ </div>
1412
+
1413
+ </li>
1414
+
1415
+ </ul>
1416
+ <p class="tag_title">Returns:</p>
1417
+ <ul class="return">
1418
+
1419
+ <li>
1420
+
1421
+
1422
+ <span class='type'>(<tt>Hash{Symbol =&gt; Array,Hash,nil}</tt>)</span>
1423
+
1424
+
1425
+
1426
+ &mdash;
1427
+ <div class='inline'>
1428
+ <p>Hash data is as follows:</p>
1429
+ <ul><li>
1430
+ <p><tt>[Hash] :header_map</tt> Header map used.</p>
1431
+ </li><li>
1432
+ <p><tt>[Array&lt;Hash&gt;,nil] :data</tt> Parsed rows data.</p>
1433
+ </li></ul>
1434
+ </div>
1435
+
1436
+ </li>
1437
+
1438
+ </ul>
1439
+
1440
+ </div><table class="source_code">
1441
+ <tr>
1442
+ <td>
1443
+ <pre class="lines">
1444
+
1445
+
1446
+ 204
1447
+ 205
1448
+ 206
1449
+ 207
1450
+ 208
1451
+ 209
1452
+ 210
1453
+ 211
1454
+ 212
1455
+ 213
1456
+ 214
1457
+ 215
1458
+ 216
1459
+ 217
1460
+ 218
1461
+ 219
1462
+ 220
1463
+ 221
1464
+ 222
1465
+ 223
1466
+ 224
1467
+ 225
1468
+ 226</pre>
1469
+ </td>
1470
+ <td>
1471
+ <pre class="code"><span class="info file"># File 'lib/ae_easy/text.rb', line 204</span>
1472
+
1473
+ <span class='kw'>def</span> <span class='kw'>self</span><span class='period'>.</span><span class='id identifier rubyid_parse_table'>parse_table</span> <span class='id identifier rubyid_opts'>opts</span> <span class='op'>=</span> <span class='lbrace'>{</span><span class='rbrace'>}</span><span class='comma'>,</span> <span class='op'>&amp;</span><span class='id identifier rubyid_filter'>filter</span>
1474
+ <span class='id identifier rubyid_opts'>opts</span> <span class='op'>=</span> <span class='lbrace'>{</span>
1475
+ <span class='label'>html:</span> <span class='kw'>nil</span><span class='comma'>,</span>
1476
+ <span class='label'>header_selector:</span> <span class='kw'>nil</span><span class='comma'>,</span>
1477
+ <span class='label'>header_key_label_map:</span> <span class='lbrace'>{</span><span class='rbrace'>}</span><span class='comma'>,</span>
1478
+ <span class='label'>content_selector:</span> <span class='kw'>nil</span><span class='comma'>,</span>
1479
+ <span class='label'>first_row_header:</span> <span class='kw'>false</span><span class='comma'>,</span>
1480
+ <span class='label'>column_parsers:</span> <span class='lbrace'>{</span><span class='rbrace'>}</span>
1481
+ <span class='rbrace'>}</span><span class='period'>.</span><span class='id identifier rubyid_merge'>merge</span> <span class='id identifier rubyid_opts'>opts</span>
1482
+ <span class='kw'>return</span> <span class='kw'>nil</span> <span class='kw'>if</span> <span class='id identifier rubyid_opts'>opts</span><span class='lbracket'>[</span><span class='symbol'>:html</span><span class='rbracket'>]</span><span class='period'>.</span><span class='id identifier rubyid_nil?'>nil?</span>
1483
+ <span class='id identifier rubyid_header_map'>header_map</span> <span class='op'>=</span> <span class='kw'>self</span><span class='period'>.</span><span class='id identifier rubyid_parse_header_map'>parse_header_map</span> <span class='label'>html:</span> <span class='id identifier rubyid_opts'>opts</span><span class='lbracket'>[</span><span class='symbol'>:html</span><span class='rbracket'>]</span><span class='comma'>,</span>
1484
+ <span class='label'>selector:</span> <span class='id identifier rubyid_opts'>opts</span><span class='lbracket'>[</span><span class='symbol'>:header_selector</span><span class='rbracket'>]</span><span class='comma'>,</span>
1485
+ <span class='label'>column_key_label_map:</span> <span class='id identifier rubyid_opts'>opts</span><span class='lbracket'>[</span><span class='symbol'>:header_key_label_map</span><span class='rbracket'>]</span><span class='comma'>,</span>
1486
+ <span class='label'>first_row_header:</span> <span class='id identifier rubyid_opts'>opts</span><span class='lbracket'>[</span><span class='symbol'>:first_row_header</span><span class='rbracket'>]</span>
1487
+ <span class='kw'>return</span> <span class='kw'>nil</span> <span class='kw'>if</span> <span class='id identifier rubyid_header_map'>header_map</span><span class='period'>.</span><span class='id identifier rubyid_nil?'>nil?</span>
1488
+ <span class='id identifier rubyid_data'>data</span> <span class='op'>=</span> <span class='kw'>self</span><span class='period'>.</span><span class='id identifier rubyid_parse_content'>parse_content</span> <span class='label'>html:</span> <span class='id identifier rubyid_opts'>opts</span><span class='lbracket'>[</span><span class='symbol'>:html</span><span class='rbracket'>]</span><span class='comma'>,</span>
1489
+ <span class='label'>selector:</span> <span class='id identifier rubyid_opts'>opts</span><span class='lbracket'>[</span><span class='symbol'>:content_selector</span><span class='rbracket'>]</span><span class='comma'>,</span>
1490
+ <span class='label'>header_map:</span> <span class='id identifier rubyid_header_map'>header_map</span><span class='comma'>,</span>
1491
+ <span class='label'>first_row_header:</span> <span class='id identifier rubyid_opts'>opts</span><span class='lbracket'>[</span><span class='symbol'>:first_row_header</span><span class='rbracket'>]</span><span class='comma'>,</span>
1492
+ <span class='label'>column_parsers:</span> <span class='id identifier rubyid_opts'>opts</span><span class='lbracket'>[</span><span class='symbol'>:column_parsers</span><span class='rbracket'>]</span><span class='comma'>,</span>
1493
+ <span class='op'>&amp;</span><span class='id identifier rubyid_filter'>filter</span>
1494
+ <span class='lbrace'>{</span><span class='label'>header_map:</span> <span class='id identifier rubyid_header_map'>header_map</span><span class='comma'>,</span> <span class='label'>data:</span> <span class='id identifier rubyid_data'>data</span><span class='rbrace'>}</span>
1495
+ <span class='kw'>end</span></pre>
1496
+ </td>
1497
+ </tr>
1498
+ </table>
1499
+ </div>
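`parse_table` above works in two steps: it first resolves a header map, then reads the content rows through that map and returns both. The sketch below shows the same two-step flow on arrays of strings. `parse_grid` is a hypothetical stand-in, not the gem's API, and it always treats the first row as the header.

```ruby
# Sketch only: `parse_grid` is a hypothetical stand-in that works on arrays
# of strings rather than Nokogiri elements.
def parse_grid(rows, key_label_map)
  return nil if rows.nil? || rows.empty?

  # Step 1: map keys to column indexes using the first row as header.
  # `===` matches a Regexp against the label or compares two Strings.
  header_map = {}
  rows.first.each_with_index do |label, index|
    key, = key_label_map.find { |_k, matcher| matcher === label }
    header_map[key] = index unless key.nil?
  end

  # Step 2: read the remaining rows through the header map.
  data = rows.drop(1).map do |row|
    header_map.each_with_object({}) { |(key, index), h| h[key] = row[index] }
  end
  # An optional block can reject rows, like the filter yield above.
  data = data.select { |row_data| yield(row_data) } if block_given?

  { header_map: header_map, data: data }
end

table = parse_grid(
  [['City', 'Population'], ['Lima', '10'], ['Quito', '2']],
  { city: 'City', population: /pop/i }
) { |row_data| row_data[:population].to_i > 5 }
```

Returning the header map alongside the data lets the caller check which columns were actually recognised before trusting the rows.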
1500
+
1501
+ <div class="method_details ">
1502
+ <h3 class="signature " id="parse_vertical_table-class_method">
1503
+
1504
+ .<strong>parse_vertical_table</strong>(opts = {}) {|data, row, header_map| ... } &#x21d2; <tt>Hash{Symbol =&gt; Array,Hash,nil}</tt>
1505
+
1506
+
1507
+
1508
+
1509
+
1510
+ </h3><div class="docstring">
1511
+ <div class="discussion">
1512
+
1513
+ <p>Parse data from a vertical table-like structure matching selectors and
1514
+ using a header map to match columns.</p>
1515
+
1516
+
1517
+
1518
+
1519
+ </div>
1520
+ </div>
1521
+ <div class="tags">
1522
+ <p class="tag_title">Parameters:</p>
1523
+ <ul class="param">
1524
+
1525
+ <li>
1526
+
1527
+ <span class='name'>opts</span>
1528
+
1529
+
1530
+ <span class='type'>(<tt>Hash</tt>)</span>
1531
+
1532
+
1533
+ <em class="default">(defaults to: <tt>{}</tt>)</em>
1534
+
1535
+
1536
+ &mdash;
1537
+ <div class='inline'>
1538
+ <p>Configuration options.</p>
1539
+ </div>
1540
+
1541
+ </li>
1542
+
1543
+ </ul>
1544
+
1545
+
1546
+
1547
+
1548
+ <p class="tag_title">Options Hash (<tt>opts</tt>):</p>
1549
+ <ul class="option">
1550
+
1551
+ <li>
1552
+ <span class="name">:html</span>
1553
+ <span class="type">(<tt>Nokogiri::XML::Element</tt>)</span>
1554
+ <span class="default">
1555
+
1556
+ </span>
1557
+
1558
+ &mdash; <div class='inline'>
1559
+ <p>Container element to search within.</p>
1560
+ </div>
1561
+
1562
+ </li>
1563
+
1564
+ <li>
1565
+ <span class="name">:row_selector</span>
1566
+ <span class="type">(<tt>String</tt>)</span>
1567
+ <span class="default">
1568
+
1569
+ </span>
1570
+
1571
+ &mdash; <div class='inline'>
1572
+ <p>Selector for vertical row-like elements.</p>
1573
+ </div>
1574
+
1575
+ </li>
1576
+
1577
+ <li>
1578
+ <span class="name">:header_selector</span>
1579
+ <span class="type">(<tt>String</tt>)</span>
1580
+ <span class="default">
1581
+
1582
+ </span>
1583
+
1584
+ &mdash; <div class='inline'>
1585
+ <p>Header column elements selector.</p>
1586
+ </div>
1587
+
1588
+ </li>
1589
+
1590
+ <li>
1591
+ <span class="name">:header_key_label_map</span>
1592
+ <span class="type">(<tt>Hash{Symbol,String =&gt; Regexp,String}</tt>)</span>
1593
+ <span class="default">
1594
+
1595
+ </span>
1596
+
1597
+ &mdash; <div class='inline'>
1598
+ <p>Header key vs. label dictionary to match column indexes.</p>
1599
+ </div>
1600
+
1601
+ </li>
1602
+
1603
+ <li>
1604
+ <span class="name">:content_selector</span>
1605
+ <span class="type">(<tt>String</tt>)</span>
1606
+ <span class="default">
1607
+
1608
+ </span>
1609
+
1610
+ &mdash; <div class='inline'>
1611
+ <p>Content row elements selector.</p>
1612
+ </div>
1613
+
1614
+ </li>
1615
+
1616
+ <li>
1617
+ <span class="name">:column_parsers</span>
1618
+ <span class="type">(<tt>Hash{Symbol,String =&gt; lambda,proc}</tt>)</span>
1619
+ <span class="default">
1620
+
1621
+ &mdash; default:
1622
+ <tt>{}</tt>
1623
+
1624
+ </span>
1625
+
1626
+ &mdash; <div class='inline'>
1627
+ <p>Custom column parsers for advanced data extraction.</p>
1628
+ </div>
1629
+
1630
+ </li>
1631
+
1632
+ </ul>
1633
+
1634
+
1635
+ <p class="tag_title">Yield Parameters:</p>
1636
+ <ul class="yieldparam">
1637
+
1638
+ <li>
1639
+
1640
+ <span class='name'>data</span>
1641
+
1642
+
1643
+ <span class='type'>(<tt>Hash{Symbol,String =&gt; Object}</tt>)</span>
1644
+
1645
+
1646
+
1647
+ &mdash;
1648
+ <div class='inline'>
1649
+ <p>Parsed content row data.</p>
1650
+ </div>
1651
+
1652
+ </li>
1653
+
1654
+ <li>
1655
+
1656
+ <span class='name'>row</span>
1657
+
1658
+
1659
+ <span class='type'>(<tt>Array</tt>)</span>
1660
+
1661
+
1662
+
1663
+ &mdash;
1664
+ <div class='inline'>
1665
+ <p>Raw content row data.</p>
1666
+ </div>
1667
+
1668
+ </li>
1669
+
1670
+ <li>
1671
+
1672
+ <span class='name'>header_map</span>
1673
+
1674
+
1675
+ <span class='type'>(<tt>Hash{Symbol,String =&gt; Integer}</tt>)</span>
1676
+
1677
+
1678
+
1679
+ &mdash;
1680
+ <div class='inline'>
1681
+ <p>Header map used.</p>
1682
+ </div>
1683
+
1684
+ </li>
1685
+
1686
+ </ul>
1687
+ <p class="tag_title">Yield Returns:</p>
1688
+ <ul class="yieldreturn">
1689
+
1690
+ <li>
1691
+
1692
+
1693
+ <span class='type'>(<tt>Boolean</tt>)</span>
1694
+
1695
+
1696
+
1697
+ &mdash;
1698
+ <div class='inline'>
1699
+ <p><tt>true</tt> when valid, else <tt>false</tt>.</p>
1700
+ </div>
1701
+
1702
+ </li>
1703
+
1704
+ </ul>
1705
+ <p class="tag_title">Returns:</p>
1706
+ <ul class="return">
1707
+
1708
+ <li>
1709
+
1710
+
1711
+ <span class='type'>(<tt>Hash{Symbol =&gt; Array,Hash,nil}</tt>)</span>
1712
+
1713
+
1714
+
1715
+ &mdash;
1716
+ <div class='inline'>
1717
+ <p>Hash data is as follows:</p>
1718
+ <ul><li>
1719
+ <p><code>[Hash] :header_map</code> Header map used.</p>
1720
+ </li><li>
1721
+ <p><code>[Array&lt;Hash&gt;,nil] :data</code> Parsed rows data.</p>
1722
+ </li></ul>
1723
+ </div>
1724
+
1725
+ </li>
1726
+
1727
+ </ul>
1728
+
1729
+ </div><table class="source_code">
1730
+ <tr>
1731
+ <td>
1732
+ <pre class="lines">
1733
+
1734
+
1735
+ 249
1736
+ 250
1737
+ 251
1738
+ 252
1739
+ 253
1740
+ 254
1741
+ 255
1742
+ 256
1743
+ 257
1744
+ 258
1745
+ 259
1746
+ 260
1747
+ 261
1748
+ 262
1749
+ 263
1750
+ 264
1751
+ 265
1752
+ 266
1753
+ 267
1754
+ 268
1755
+ 269
1756
+ 270
1757
+ 271
1758
+ 272
1759
+ 273
1760
+ 274
1761
+ 275
1762
+ 276
1763
+ 277
1764
+ 278
1765
+ 279
1766
+ 280
1767
+ 281</pre>
1768
+ </td>
1769
+ <td>
1770
+ <pre class="code"><span class="info file"># File 'lib/ae_easy/text.rb', line 249</span>
1771
+
1772
+ <span class='kw'>def</span> <span class='kw'>self</span><span class='period'>.</span><span class='id identifier rubyid_parse_vertical_table'>parse_vertical_table</span> <span class='id identifier rubyid_opts'>opts</span> <span class='op'>=</span> <span class='lbrace'>{</span><span class='rbrace'>}</span><span class='comma'>,</span> <span class='op'>&amp;</span><span class='id identifier rubyid_filter'>filter</span>
1773
+ <span class='id identifier rubyid_opts'>opts</span> <span class='op'>=</span> <span class='lbrace'>{</span>
1774
+ <span class='label'>html:</span> <span class='kw'>nil</span><span class='comma'>,</span>
1775
+ <span class='label'>row_selector:</span> <span class='kw'>nil</span><span class='comma'>,</span>
1776
+ <span class='label'>header_selector:</span> <span class='kw'>nil</span><span class='comma'>,</span>
1777
+ <span class='label'>header_key_label_map:</span> <span class='lbrace'>{</span><span class='rbrace'>}</span><span class='comma'>,</span>
1778
+ <span class='label'>content_selector:</span> <span class='kw'>nil</span><span class='comma'>,</span>
1779
+ <span class='label'>column_parsers:</span> <span class='lbrace'>{</span><span class='rbrace'>}</span>
1780
+ <span class='rbrace'>}</span><span class='period'>.</span><span class='id identifier rubyid_merge'>merge</span> <span class='id identifier rubyid_opts'>opts</span>
1781
+ <span class='kw'>return</span> <span class='kw'>nil</span> <span class='kw'>if</span> <span class='id identifier rubyid_opts'>opts</span><span class='lbracket'>[</span><span class='symbol'>:html</span><span class='rbracket'>]</span><span class='period'>.</span><span class='id identifier rubyid_nil?'>nil?</span>
1782
+
1783
+ <span class='comment'># Setup config
1784
+ </span> <span class='id identifier rubyid_data'>data</span> <span class='op'>=</span> <span class='lbrace'>{</span><span class='rbrace'>}</span>
1785
+ <span class='id identifier rubyid_dictionary'>dictionary</span> <span class='op'>=</span> <span class='id identifier rubyid_opts'>opts</span><span class='lbracket'>[</span><span class='symbol'>:header_key_label_map</span><span class='rbracket'>]</span>
1786
+ <span class='id identifier rubyid_column_parsers'>column_parsers</span> <span class='op'>=</span> <span class='id identifier rubyid_opts'>opts</span><span class='lbracket'>[</span><span class='symbol'>:column_parsers</span><span class='rbracket'>]</span>
1787
+
1788
+ <span class='comment'># Extract headers and content
1789
+ </span> <span class='id identifier rubyid_html_rows'>html_rows</span> <span class='op'>=</span> <span class='id identifier rubyid_opts'>opts</span><span class='lbracket'>[</span><span class='symbol'>:html</span><span class='rbracket'>]</span><span class='period'>.</span><span class='id identifier rubyid_css'>css</span><span class='lparen'>(</span><span class='id identifier rubyid_opts'>opts</span><span class='lbracket'>[</span><span class='symbol'>:row_selector</span><span class='rbracket'>]</span><span class='rparen'>)</span> <span class='kw'>rescue</span> <span class='kw'>nil</span>
1790
+ <span class='kw'>return</span> <span class='kw'>nil</span> <span class='kw'>if</span> <span class='id identifier rubyid_html_rows'>html_rows</span><span class='period'>.</span><span class='id identifier rubyid_nil?'>nil?</span>
1791
+ <span class='id identifier rubyid_html_rows'>html_rows</span><span class='period'>.</span><span class='id identifier rubyid_each'>each</span> <span class='kw'>do</span> <span class='op'>|</span><span class='id identifier rubyid_row'>row</span><span class='op'>|</span>
1792
+ <span class='comment'># Parse and map column header
1793
+ </span> <span class='id identifier rubyid_header_element'>header_element</span> <span class='op'>=</span> <span class='id identifier rubyid_row'>row</span><span class='period'>.</span><span class='id identifier rubyid_css'>css</span><span class='lparen'>(</span><span class='id identifier rubyid_opts'>opts</span><span class='lbracket'>[</span><span class='symbol'>:header_selector</span><span class='rbracket'>]</span><span class='rparen'>)</span>
1794
+ <span class='id identifier rubyid_key'>key</span> <span class='op'>=</span> <span class='id identifier rubyid_translate_label_to_key'>translate_label_to_key</span> <span class='id identifier rubyid_header_element'>header_element</span><span class='comma'>,</span> <span class='id identifier rubyid_dictionary'>dictionary</span>
1795
+ <span class='kw'>next</span> <span class='kw'>if</span> <span class='id identifier rubyid_key'>key</span><span class='period'>.</span><span class='id identifier rubyid_nil?'>nil?</span> <span class='op'>||</span> <span class='id identifier rubyid_key'>key</span> <span class='op'>==</span> <span class='tstring'><span class='tstring_beg'>&#39;</span><span class='tstring_end'>&#39;</span></span>
1796
+
1797
+ <span class='comment'># Parse column html with default or custom parser
1798
+ </span> <span class='id identifier rubyid_content_element'>content_element</span> <span class='op'>=</span> <span class='id identifier rubyid_row'>row</span><span class='period'>.</span><span class='id identifier rubyid_css'>css</span><span class='lparen'>(</span><span class='id identifier rubyid_opts'>opts</span><span class='lbracket'>[</span><span class='symbol'>:content_selector</span><span class='rbracket'>]</span><span class='rparen'>)</span>
1799
+ <span class='id identifier rubyid_column_parsers'>column_parsers</span><span class='lbracket'>[</span><span class='id identifier rubyid_key'>key</span><span class='rbracket'>]</span><span class='period'>.</span><span class='id identifier rubyid_nil?'>nil?</span> <span class='op'>?</span>
1800
+ <span class='id identifier rubyid_default_parser'>default_parser</span><span class='lparen'>(</span><span class='id identifier rubyid_content_element'>content_element</span><span class='comma'>,</span> <span class='id identifier rubyid_data'>data</span><span class='comma'>,</span> <span class='id identifier rubyid_key'>key</span><span class='rparen'>)</span> <span class='op'>:</span>
1801
+ <span class='id identifier rubyid_column_parsers'>column_parsers</span><span class='lbracket'>[</span><span class='id identifier rubyid_key'>key</span><span class='rbracket'>]</span><span class='period'>.</span><span class='id identifier rubyid_call'>call</span><span class='lparen'>(</span><span class='id identifier rubyid_content_element'>content_element</span><span class='comma'>,</span> <span class='id identifier rubyid_data'>data</span><span class='comma'>,</span> <span class='id identifier rubyid_key'>key</span><span class='rparen'>)</span>
1802
+ <span class='kw'>end</span>
1803
+ <span class='id identifier rubyid_data'>data</span>
1804
+ <span class='kw'>end</span></pre>
1805
+ </td>
1806
+ </tr>
1807
+ </table>
1808
+ </div>
1809
+
1810
+ <div class="method_details ">
1811
+ <h3 class="signature " id="strip-class_method">
1812
+
1813
+ .<strong>strip</strong>(raw_text) &#x21d2; <tt>String</tt><sup>?</sup>
1814
+
1815
+
1816
+
1817
+
1818
+
1819
+ </h3><div class="docstring">
1820
+ <div class="discussion">
1821
+
1822
+ <p>Strip a value.</p>
1823
+
1824
+
1825
+ </div>
1826
+ </div>
1827
+ <div class="tags">
1828
+ <p class="tag_title">Parameters:</p>
1829
+ <ul class="param">
1830
+
1831
+ <li>
1832
+
1833
+ <span class='name'>raw_text</span>
1834
+
1835
+
1836
+ <span class='type'>(<tt>String</tt>, <tt>Object</tt>, <tt>nil</tt>)</span>
1837
+
1838
+
1839
+
1840
+ &mdash;
1841
+ <div class='inline'>
1842
+ <p>Text to strip.</p>
1843
+ </div>
1844
+
1845
+ </li>
1846
+
1847
+ </ul>
1848
+
1849
+ <p class="tag_title">Returns:</p>
1850
+ <ul class="return">
1851
+
1852
+ <li>
1853
+
1854
+
1855
+ <span class='type'>(<tt>String</tt>, <tt>nil</tt>)</span>
1856
+
1857
+
1858
+
1859
+ &mdash;
1860
+ <div class='inline'>
1861
+ <p><code>nil</code> when <code>raw_text</code> is <code>nil</code>, else <code>String</code>.</p>
1862
+ </div>
1863
+
1864
+ </li>
1865
+
1866
+ </ul>
1867
+
1868
+ </div><table class="source_code">
1869
+ <tr>
1870
+ <td>
1871
+ <pre class="lines">
1872
+
1873
+
1874
+ 42
1875
+ 43
1876
+ 44
1877
+ 45
1878
+ 46
1879
+ 47
1880
+ 48
1881
+ 49
1882
+ 50
1883
+ 51
1884
+ 52
1885
+ 53</pre>
1886
+ </td>
1887
+ <td>
1888
+ <pre class="code"><span class="info file"># File 'lib/ae_easy/text.rb', line 42</span>
1889
+
1890
+ <span class='kw'>def</span> <span class='kw'>self</span><span class='period'>.</span><span class='id identifier rubyid_strip'>strip</span> <span class='id identifier rubyid_raw_text'>raw_text</span>
1891
+ <span class='kw'>return</span> <span class='kw'>nil</span> <span class='kw'>if</span> <span class='id identifier rubyid_raw_text'>raw_text</span><span class='period'>.</span><span class='id identifier rubyid_nil?'>nil?</span>
1892
+ <span class='id identifier rubyid_raw_text'>raw_text</span> <span class='op'>=</span> <span class='id identifier rubyid_raw_text'>raw_text</span><span class='period'>.</span><span class='id identifier rubyid_to_s'>to_s</span> <span class='kw'>unless</span> <span class='id identifier rubyid_raw_text'>raw_text</span><span class='period'>.</span><span class='id identifier rubyid_is_a?'>is_a?</span> <span class='const'>String</span>
1893
+ <span class='id identifier rubyid_regex'>regex</span> <span class='op'>=</span> <span class='tstring'><span class='regexp_beg'>/</span><span class='tstring_content'>(\s|\u3000|\u00a0)+</span><span class='regexp_end'>/</span></span>
1894
+ <span class='id identifier rubyid_good_encoding'>good_encoding</span> <span class='op'>=</span> <span class='lparen'>(</span><span class='id identifier rubyid_raw_text'>raw_text</span> <span class='op'>=~</span> <span class='tstring'><span class='regexp_beg'>/</span><span class='tstring_content'>\u3000</span><span class='regexp_end'>/</span></span> <span class='op'>||</span> <span class='kw'>true</span><span class='rparen'>)</span> <span class='kw'>rescue</span> <span class='kw'>false</span>
1895
+ <span class='kw'>unless</span> <span class='id identifier rubyid_good_encoding'>good_encoding</span>
1896
+ <span class='id identifier rubyid_raw_text'>raw_text</span> <span class='op'>=</span> <span class='id identifier rubyid_raw_text'>raw_text</span><span class='period'>.</span><span class='id identifier rubyid_force_encoding'>force_encoding</span><span class='lparen'>(</span><span class='gvar'>$APP_CONFIG</span><span class='lbracket'>[</span><span class='symbol'>:encoding</span><span class='rbracket'>]</span><span class='rparen'>)</span><span class='period'>.</span><span class='id identifier rubyid_encode'>encode</span><span class='lparen'>(</span><span class='tstring'><span class='tstring_beg'>&#39;</span><span class='tstring_content'>UTF-8</span><span class='tstring_end'>&#39;</span></span><span class='rparen'>)</span>
1897
+ <span class='id identifier rubyid_regex'>regex</span> <span class='op'>=</span> <span class='tstring'><span class='regexp_beg'>/</span><span class='tstring_content'>(\s|\u3000|\u00a0|\u00c2\u00a0)+</span><span class='regexp_end'>/</span></span>
1898
+ <span class='kw'>end</span>
1899
+ <span class='id identifier rubyid_text'>text</span> <span class='op'>=</span> <span class='id identifier rubyid_raw_text'>raw_text</span><span class='op'>&amp;.</span><span class='id identifier rubyid_gsub'>gsub</span><span class='lparen'>(</span><span class='id identifier rubyid_regex'>regex</span><span class='comma'>,</span> <span class='tstring'><span class='tstring_beg'>&#39;</span><span class='tstring_content'> </span><span class='tstring_end'>&#39;</span></span><span class='rparen'>)</span><span class='op'>&amp;.</span><span class='id identifier rubyid_strip'>strip</span>
1900
+ <span class='id identifier rubyid_text'>text</span><span class='period'>.</span><span class='id identifier rubyid_nil?'>nil?</span> <span class='op'>?</span> <span class='kw'>nil</span> <span class='op'>:</span> <span class='id identifier rubyid_decode_html'>decode_html</span><span class='lparen'>(</span><span class='id identifier rubyid_text'>text</span><span class='rparen'>)</span>
1901
+ <span class='kw'>end</span></pre>
1902
+ </td>
1903
+ </tr>
1904
+ </table>
1905
+ </div>
1906
+
1907
+ <div class="method_details ">
1908
+ <h3 class="signature " id="translate_label_to_key-class_method">
1909
+
1910
+ .<strong>translate_label_to_key</strong>(element, label_map) &#x21d2; <tt>Symbol</tt>, <tt>String</tt>
1911
+
1912
+
1913
+
1914
+
1915
+
1916
+ </h3><div class="docstring">
1917
+ <div class="discussion">
1918
+
1919
+ <p>Extract column label and translate it into a friendly key.</p>
1920
+
1921
+
1922
+ </div>
1923
+ </div>
1924
+ <div class="tags">
1925
+ <p class="tag_title">Parameters:</p>
1926
+ <ul class="param">
1927
+
1928
+ <li>
1929
+
1930
+ <span class='name'>element</span>
1931
+
1932
+
1933
+ <span class='type'>(<tt>Nokogiri::Element</tt>)</span>
1934
+
1935
+
1936
+
1937
+ &mdash;
1938
+ <div class='inline'>
1939
+ <p>HTML element to parse.</p>
1940
+ </div>
1941
+
1942
+ </li>
1943
+
1944
+ <li>
1945
+
1946
+ <span class='name'>label_map</span>
1947
+
1948
+
1949
+ <span class='type'>(<tt>Hash{Symbol,String =&gt; Regexp,String}</tt>)</span>
1950
+
1951
+
1952
+
1953
+ &mdash;
1954
+ <div class='inline'>
1955
+ <p>Label dictionary for translation into key.</p>
1956
+ </div>
1957
+
1958
+ </li>
1959
+
1960
+ </ul>
1961
+
1962
+ <p class="tag_title">Returns:</p>
1963
+ <ul class="return">
1964
+
1965
+ <li>
1966
+
1967
+
1968
+ <span class='type'>(<tt>Symbol</tt>, <tt>String</tt>)</span>
1969
+
1970
+
1971
+
1972
+ &mdash;
1973
+ <div class='inline'>
1974
+ <p>Translated key.</p>
1975
+ </div>
1976
+
1977
+ </li>
1978
+
1979
+ </ul>
1980
+
1981
+ </div><table class="source_code">
1982
+ <tr>
1983
+ <td>
1984
+ <pre class="lines">
1985
+
1986
+
1987
+ 131
1988
+ 132
1989
+ 133
1990
+ 134
1991
+ 135
1992
+ 136
1993
+ 137
1994
+ 138</pre>
1995
+ </td>
1996
+ <td>
1997
+ <pre class="code"><span class="info file"># File 'lib/ae_easy/text.rb', line 131</span>
1998
+
1999
+ <span class='kw'>def</span> <span class='kw'>self</span><span class='period'>.</span><span class='id identifier rubyid_translate_label_to_key'>translate_label_to_key</span> <span class='id identifier rubyid_element'>element</span><span class='comma'>,</span> <span class='id identifier rubyid_label_map'>label_map</span>
2000
+ <span class='id identifier rubyid_element'>element</span><span class='op'>&amp;.</span><span class='id identifier rubyid_search'>search</span><span class='lparen'>(</span><span class='tstring'><span class='tstring_beg'>&#39;</span><span class='tstring_content'>//i</span><span class='tstring_end'>&#39;</span></span><span class='rparen'>)</span><span class='period'>.</span><span class='id identifier rubyid_remove'>remove</span>
2001
+ <span class='id identifier rubyid_text'>text</span> <span class='op'>=</span> <span class='id identifier rubyid_strip'>strip</span> <span class='id identifier rubyid_element'>element</span><span class='op'>&amp;.</span><span class='id identifier rubyid_text'>text</span>
2002
+ <span class='id identifier rubyid_key'>key</span> <span class='op'>=</span> <span class='id identifier rubyid_label_map'>label_map</span><span class='period'>.</span><span class='id identifier rubyid_find'>find</span> <span class='kw'>do</span> <span class='op'>|</span><span class='id identifier rubyid_k'>k</span><span class='comma'>,</span><span class='id identifier rubyid_v'>v</span><span class='op'>|</span>
2003
+ <span class='id identifier rubyid_v'>v</span><span class='period'>.</span><span class='id identifier rubyid_is_a?'>is_a?</span><span class='lparen'>(</span><span class='const'>Regexp</span><span class='rparen'>)</span> <span class='op'>?</span> <span class='lparen'>(</span><span class='id identifier rubyid_text'>text</span> <span class='op'>=~</span> <span class='id identifier rubyid_v'>v</span><span class='rparen'>)</span> <span class='op'>:</span> <span class='lparen'>(</span><span class='id identifier rubyid_text'>text</span> <span class='op'>==</span> <span class='id identifier rubyid_v'>v</span><span class='rparen'>)</span>
2004
+ <span class='kw'>end</span><span class='op'>&amp;.</span><span class='id identifier rubyid_first'>first</span>
2005
+ <span class='id identifier rubyid_key'>key</span>
2006
+ <span class='kw'>end</span></pre>
2007
+ </td>
2008
+ </tr>
2009
+ </table>
2010
+ </div>
2011
+
2012
+ </div>
2013
+
2014
+ </div>
2015
+
2016
+ <div id="footer">
2017
+ Generated on Tue Feb 26 16:50:03 2019 by
2018
+ <a href="http://yardoc.org" title="Yay! A Ruby Documentation Tool" target="_parent">yard</a>
2019
+ 0.9.18 (ruby-2.5.3).
2020
+ </div>
2021
+
2022
+ </div>
2023
+ </body>
2024
+ </html>