ae_easy-text 0.0.1
- checksums.yaml +7 -0
- data/.gitignore +12 -0
- data/.travis.yml +7 -0
- data/.yardopts +1 -0
- data/CODE_OF_CONDUCT.md +74 -0
- data/Gemfile +6 -0
- data/LICENSE +21 -0
- data/README.md +16 -0
- data/Rakefile +22 -0
- data/ae_easy-text.gemspec +49 -0
- data/doc/AeEasy.html +117 -0
- data/doc/AeEasy/Text.html +2024 -0
- data/doc/_index.html +122 -0
- data/doc/class_list.html +51 -0
- data/doc/css/common.css +1 -0
- data/doc/css/full_list.css +58 -0
- data/doc/css/style.css +496 -0
- data/doc/file.README.html +91 -0
- data/doc/file_list.html +56 -0
- data/doc/frames.html +17 -0
- data/doc/index.html +91 -0
- data/doc/js/app.js +292 -0
- data/doc/js/full_list.js +216 -0
- data/doc/js/jquery.js +4 -0
- data/doc/method_list.html +131 -0
- data/doc/top-level-namespace.html +110 -0
- data/lib/ae_easy/text.rb +283 -0
- data/lib/ae_easy/text/version.rb +6 -0
- metadata +186 -0
checksums.yaml
ADDED
@@ -0,0 +1,7 @@
|
|
1
|
+
---
|
2
|
+
SHA256:
|
3
|
+
metadata.gz: 239189344e783f67b085da7394e535aa693a4b067c62b8d0b16f733a0b19d4f7
|
4
|
+
data.tar.gz: ca144105f26e399116b05560ff870f6aa051a04696602f6f68f67f06b9e0bfda
|
5
|
+
SHA512:
|
6
|
+
metadata.gz: 0b7c4495eeb71e5dae3ad799d14f8a2d83989a949183ee3df2837191b4a4f3a10965ead38416ccda078da7b29fc083eb02fd53a24999f97473b02f77489d921c
|
7
|
+
data.tar.gz: 4f377b26bcfb0ef4cce7806d153fb97de115d0e0bc4beef5e43b55b0125e1936d6c4440d1131cbe9cb4d5fe28a1810c2820269f6c317511068c91aff42ad8126
|
data/.gitignore
ADDED
data/.travis.yml
ADDED
data/.yardopts
ADDED
@@ -0,0 +1 @@
|
|
1
|
+
--no-private
|
data/CODE_OF_CONDUCT.md
ADDED
@@ -0,0 +1,74 @@
|
|
1
|
+
# Contributor Covenant Code of Conduct
|
2
|
+
|
3
|
+
## Our Pledge
|
4
|
+
|
5
|
+
In the interest of fostering an open and welcoming environment, we as
|
6
|
+
contributors and maintainers pledge to making participation in our project and
|
7
|
+
our community a harassment-free experience for everyone, regardless of age, body
|
8
|
+
size, disability, ethnicity, gender identity and expression, level of experience,
|
9
|
+
nationality, personal appearance, race, religion, or sexual identity and
|
10
|
+
orientation.
|
11
|
+
|
12
|
+
## Our Standards
|
13
|
+
|
14
|
+
Examples of behavior that contributes to creating a positive environment
|
15
|
+
include:
|
16
|
+
|
17
|
+
* Using welcoming and inclusive language
|
18
|
+
* Being respectful of differing viewpoints and experiences
|
19
|
+
* Gracefully accepting constructive criticism
|
20
|
+
* Focusing on what is best for the community
|
21
|
+
* Showing empathy towards other community members
|
22
|
+
|
23
|
+
Examples of unacceptable behavior by participants include:
|
24
|
+
|
25
|
+
* The use of sexualized language or imagery and unwelcome sexual attention or
|
26
|
+
advances
|
27
|
+
* Trolling, insulting/derogatory comments, and personal or political attacks
|
28
|
+
* Public or private harassment
|
29
|
+
* Publishing others' private information, such as a physical or electronic
|
30
|
+
address, without explicit permission
|
31
|
+
* Other conduct which could reasonably be considered inappropriate in a
|
32
|
+
professional setting
|
33
|
+
|
34
|
+
## Our Responsibilities
|
35
|
+
|
36
|
+
Project maintainers are responsible for clarifying the standards of acceptable
|
37
|
+
behavior and are expected to take appropriate and fair corrective action in
|
38
|
+
response to any instances of unacceptable behavior.
|
39
|
+
|
40
|
+
Project maintainers have the right and responsibility to remove, edit, or
|
41
|
+
reject comments, commits, code, wiki edits, issues, and other contributions
|
42
|
+
that are not aligned to this Code of Conduct, or to ban temporarily or
|
43
|
+
permanently any contributor for other behaviors that they deem inappropriate,
|
44
|
+
threatening, offensive, or harmful.
|
45
|
+
|
46
|
+
## Scope
|
47
|
+
|
48
|
+
This Code of Conduct applies both within project spaces and in public spaces
|
49
|
+
when an individual is representing the project or its community. Examples of
|
50
|
+
representing a project or community include using an official project e-mail
|
51
|
+
address, posting via an official social media account, or acting as an appointed
|
52
|
+
representative at an online or offline event. Representation of a project may be
|
53
|
+
further defined and clarified by project maintainers.
|
54
|
+
|
55
|
+
## Enforcement
|
56
|
+
|
57
|
+
Instances of abusive, harassing, or otherwise unacceptable behavior may be
|
58
|
+
reported by contacting the project team at parama@answersengine.com. All
|
59
|
+
complaints will be reviewed and investigated and will result in a response that
|
60
|
+
is deemed necessary and appropriate to the circumstances. The project team is
|
61
|
+
obligated to maintain confidentiality with regard to the reporter of an incident.
|
62
|
+
Further details of specific enforcement policies may be posted separately.
|
63
|
+
|
64
|
+
Project maintainers who do not follow or enforce the Code of Conduct in good
|
65
|
+
faith may face temporary or permanent repercussions as determined by other
|
66
|
+
members of the project's leadership.
|
67
|
+
|
68
|
+
## Attribution
|
69
|
+
|
70
|
+
This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,
|
71
|
+
available at [http://contributor-covenant.org/version/1/4][version]
|
72
|
+
|
73
|
+
[homepage]: http://contributor-covenant.org
|
74
|
+
[version]: http://contributor-covenant.org/version/1/4/
|
data/Gemfile
ADDED
data/LICENSE
ADDED
@@ -0,0 +1,21 @@
|
|
1
|
+
MIT License
|
2
|
+
|
3
|
+
Copyright (c) 2019 AnswersEngine
|
4
|
+
|
5
|
+
Permission is hereby granted, free of charge, to any person obtaining a copy
|
6
|
+
of this software and associated documentation files (the "Software"), to deal
|
7
|
+
in the Software without restriction, including without limitation the rights
|
8
|
+
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
9
|
+
copies of the Software, and to permit persons to whom the Software is
|
10
|
+
furnished to do so, subject to the following conditions:
|
11
|
+
|
12
|
+
The above copyright notice and this permission notice shall be included in all
|
13
|
+
copies or substantial portions of the Software.
|
14
|
+
|
15
|
+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
16
|
+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
17
|
+
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
18
|
+
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
19
|
+
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
20
|
+
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
21
|
+
SOFTWARE.
|
data/README.md
ADDED
@@ -0,0 +1,16 @@
|
|
1
|
+
[](http://rubydoc.org/gems/ae_easy-text/frames)
|
2
|
+
[](http://github.com/answersengine/ae_easy-text/releases)
|
3
|
+
[](#license)
|
4
|
+
|
5
|
+
# AeEasy text module
|
6
|
+
## Description
|
7
|
+
|
8
|
+
AeEasy text is part of the AeEasy gem collection. It provides text parsing helpers for common text parsing use cases.
|
9
|
+
|
10
|
+
Install gem:
|
11
|
+
```gem install 'ae_easy-text'```
|
12
|
+
|
13
|
+
Require gem:
|
14
|
+
```require 'ae_easy/text'```
|
15
|
+
|
16
|
+
Documentation can be found [here](http://rubydoc.org/gems/ae_easy-text/frames).
|
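For projects managed with Bundler, the install and require steps in the README can be combined in a Gemfile entry. This is a minimal sketch, not taken from the package: it assumes the gem is fetched from rubygems.org and that the require path is `ae_easy/text`, which is inferred from the packaged file `lib/ae_easy/text.rb` in the file list.

```ruby
# Gemfile (hypothetical consumer project)
source 'https://rubygems.org'

# `require:` tells Bundler which file to load for this gem; the path is an
# assumption based on the packaged lib/ae_easy/text.rb.
gem 'ae_easy-text', require: 'ae_easy/text'
```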
data/Rakefile
ADDED
@@ -0,0 +1,22 @@
|
|
1
|
+
require 'benchmark'
|
2
|
+
require 'bundler/gem_tasks'
|
3
|
+
require 'rake/testtask'
|
4
|
+
|
5
|
+
Rake::TestTask.new do |t|
|
6
|
+
t.libs = ['lib', 'test']
|
7
|
+
t.warning = false
|
8
|
+
t.verbose = false
|
9
|
+
t.test_files = FileList['./test/**/*_test.rb']
|
10
|
+
end
|
11
|
+
|
12
|
+
desc 'Benchmark another task execution | usage example: benchmark[my_task, param1, param2]'
|
13
|
+
task :benchmark, [:task] do |task, args|
|
14
|
+
task_name = args[:task]
|
15
|
+
if task_name.nil?
|
16
|
+
puts "Should select a task."
|
17
|
+
exit 1
|
18
|
+
end
|
19
|
+
puts Benchmark.measure{ Rake::Task[task_name].invoke *args.extras }
|
20
|
+
end
|
21
|
+
|
22
|
+
task default: :test
|
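The `benchmark` task above wraps another task's invocation in `Benchmark.measure` and prints the resulting timing. The sketch below shows the same call shape outside Rake, using only the standard library. `sample_work` and `extras` are hypothetical stand-ins for the invoked task and for `args.extras`.

```ruby
require 'benchmark'

# Hypothetical stand-in for the Rake task being timed.
def sample_work(count)
  (1..count).reduce(0) { |sum, n| sum + n * n }
end

# Plays the role of args.extras: extra arguments forwarded with a splat.
extras = [10_000]

# Benchmark.measure runs the block once and returns a Benchmark::Tms object.
timing = Benchmark.measure { sample_work(*extras) }

# Printing a Tms shows user, system, total and (real) elapsed seconds.
puts timing
```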
data/ae_easy-text.gemspec
ADDED
@@ -0,0 +1,49 @@
|
|
1
|
+
|
2
|
+
lib = File.expand_path("../lib", __FILE__)
|
3
|
+
$LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
|
4
|
+
require "ae_easy/text/version"
|
5
|
+
|
6
|
+
Gem::Specification.new do |spec|
|
7
|
+
spec.name = "ae_easy-text"
|
8
|
+
spec.version = AeEasy::Text::VERSION
|
9
|
+
spec.authors = ["Eduardo Rosales"]
|
10
|
+
spec.email = ["eduardo@datahen.com"]
|
11
|
+
|
12
|
+
spec.summary = %q{AnswersEngine Easy toolkit text module}
|
13
|
+
spec.description = %q{AnswersEngine Easy toolkit text module contains multiple text parsing helpers.}
|
14
|
+
spec.homepage = "https://answersengine.com"
|
15
|
+
spec.license = "MIT"
|
16
|
+
|
17
|
+
# spec.cert_chain = ['certs/ae_easy.pem']
|
18
|
+
# spec.signing_key = File.expand_path("~/.ssh/gems/gem-private_ae_easy.pem") if $0 =~ /gem\z/
|
19
|
+
|
20
|
+
# Prevent pushing this gem to RubyGems.org. To allow pushes either set the 'allowed_push_host'
|
21
|
+
# to allow pushing to a single host or delete this section to allow pushing to any host.
|
22
|
+
if spec.respond_to?(:metadata)
|
23
|
+
# spec.metadata["allowed_push_host"] = "TODO: Set to 'http://mygemserver.com'"
|
24
|
+
|
25
|
+
spec.metadata["homepage_uri"] = spec.homepage
|
26
|
+
spec.metadata["source_code_uri"] = "https://github.com/answersengine/ae_easy-text"
|
27
|
+
# spec.metadata["changelog_uri"] = "TODO: Put your gem's CHANGELOG.md URL here."
|
28
|
+
else
|
29
|
+
raise "RubyGems 2.0 or newer is required to protect against " \
|
30
|
+
"public gem pushes."
|
31
|
+
end
|
32
|
+
|
33
|
+
# Specify which files should be added to the gem when it is released.
|
34
|
+
# The `git ls-files -z` loads the files in the RubyGem that have been added into git.
|
35
|
+
spec.files = Dir.chdir(File.expand_path('..', __FILE__)) do
|
36
|
+
`git ls-files -z`.split("\x0").reject { |f| f.match(%r{^(test|spec|features)/}) }
|
37
|
+
end
|
38
|
+
spec.require_paths = ["lib"]
|
39
|
+
spec.required_ruby_version = '>= 2.2.2'
|
40
|
+
|
41
|
+
spec.add_dependency 'ae_easy-core', '>= 0'
|
42
|
+
spec.add_development_dependency 'bundler', '>= 1.16.3'
|
43
|
+
spec.add_development_dependency 'rake', '>= 10.0'
|
44
|
+
spec.add_development_dependency 'minitest', '>= 5.11'
|
45
|
+
spec.add_development_dependency 'simplecov', '>= 0.16.1'
|
46
|
+
spec.add_development_dependency 'simplecov-console', '>= 0.4.2'
|
47
|
+
spec.add_development_dependency 'timecop', '>= 0.9.1'
|
48
|
+
spec.add_development_dependency 'byebug', '>= 0'
|
49
|
+
end
|
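The `spec.files` assignment in the gemspec splits the NUL-separated output of `git ls-files -z` and rejects paths under `test/`, `spec/`, or `features/`. The sketch below applies the same split-and-reject chain to a hypothetical listing so it can run without a git checkout. `SAMPLE_LISTING` and `packaged_files` are illustrative names, not part of the gem.

```ruby
# Hypothetical NUL-separated listing, standing in for `git ls-files -z` output.
SAMPLE_LISTING = %w[
  README.md
  lib/ae_easy/text.rb
  test/text_test.rb
  spec/text_spec.rb
].join("\x0") + "\x0"

# Same chain as the gemspec: split on NUL, then drop test-related directories.
def packaged_files(listing)
  listing.split("\x0").reject { |f| f.match(%r{^(test|spec|features)/}) }
end

p packaged_files(SAMPLE_LISTING)
```

Only paths that start with one of the three directory names are dropped, which would explain why the generated `doc/` files appear in this release's file list.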
data/doc/AeEasy.html
ADDED
@@ -0,0 +1,117 @@
|
|
1
|
+
<!DOCTYPE html>
|
2
|
+
<html>
|
3
|
+
<head>
|
4
|
+
<meta charset="utf-8">
|
5
|
+
<meta name="viewport" content="width=device-width, initial-scale=1.0">
|
6
|
+
<title>
|
7
|
+
Module: AeEasy
|
8
|
+
|
9
|
+
— Documentation by YARD 0.9.18
|
10
|
+
|
11
|
+
</title>
|
12
|
+
|
13
|
+
<link rel="stylesheet" href="css/style.css" type="text/css" charset="utf-8" />
|
14
|
+
|
15
|
+
<link rel="stylesheet" href="css/common.css" type="text/css" charset="utf-8" />
|
16
|
+
|
17
|
+
<script type="text/javascript" charset="utf-8">
|
18
|
+
pathId = "AeEasy";
|
19
|
+
relpath = '';
|
20
|
+
</script>
|
21
|
+
|
22
|
+
|
23
|
+
<script type="text/javascript" charset="utf-8" src="js/jquery.js"></script>
|
24
|
+
|
25
|
+
<script type="text/javascript" charset="utf-8" src="js/app.js"></script>
|
26
|
+
|
27
|
+
|
28
|
+
</head>
|
29
|
+
<body>
|
30
|
+
<div class="nav_wrap">
|
31
|
+
<iframe id="nav" src="class_list.html?1"></iframe>
|
32
|
+
<div id="resizer"></div>
|
33
|
+
</div>
|
34
|
+
|
35
|
+
<div id="main" tabindex="-1">
|
36
|
+
<div id="header">
|
37
|
+
<div id="menu">
|
38
|
+
|
39
|
+
<a href="_index.html">Index (A)</a> »
|
40
|
+
|
41
|
+
|
42
|
+
<span class="title">AeEasy</span>
|
43
|
+
|
44
|
+
</div>
|
45
|
+
|
46
|
+
<div id="search">
|
47
|
+
|
48
|
+
<a class="full_list_link" id="class_list_link"
|
49
|
+
href="class_list.html">
|
50
|
+
|
51
|
+
<svg width="24" height="24">
|
52
|
+
<rect x="0" y="4" width="24" height="4" rx="1" ry="1"></rect>
|
53
|
+
<rect x="0" y="12" width="24" height="4" rx="1" ry="1"></rect>
|
54
|
+
<rect x="0" y="20" width="24" height="4" rx="1" ry="1"></rect>
|
55
|
+
</svg>
|
56
|
+
</a>
|
57
|
+
|
58
|
+
</div>
|
59
|
+
<div class="clear"></div>
|
60
|
+
</div>
|
61
|
+
|
62
|
+
<div id="content"><h1>Module: AeEasy
|
63
|
+
|
64
|
+
|
65
|
+
|
66
|
+
</h1>
|
67
|
+
<div class="box_info">
|
68
|
+
|
69
|
+
|
70
|
+
|
71
|
+
|
72
|
+
|
73
|
+
|
74
|
+
|
75
|
+
|
76
|
+
|
77
|
+
|
78
|
+
|
79
|
+
<dl>
|
80
|
+
<dt>Defined in:</dt>
|
81
|
+
<dd>lib/ae_easy/text.rb<span class="defines">,<br />
|
82
|
+
lib/ae_easy/text/version.rb</span>
|
83
|
+
</dd>
|
84
|
+
</dl>
|
85
|
+
|
86
|
+
</div>
|
87
|
+
|
88
|
+
<h2>Defined Under Namespace</h2>
|
89
|
+
<p class="children">
|
90
|
+
|
91
|
+
|
92
|
+
<strong class="modules">Modules:</strong> <span class='object_link'><a href="AeEasy/Text.html" title="AeEasy::Text (module)">Text</a></span>
|
93
|
+
|
94
|
+
|
95
|
+
|
96
|
+
|
97
|
+
</p>
|
98
|
+
|
99
|
+
|
100
|
+
|
101
|
+
|
102
|
+
|
103
|
+
|
104
|
+
|
105
|
+
|
106
|
+
|
107
|
+
</div>
|
108
|
+
|
109
|
+
<div id="footer">
|
110
|
+
Generated on Tue Feb 26 16:50:02 2019 by
|
111
|
+
<a href="http://yardoc.org" title="Yay! A Ruby Documentation Tool" target="_parent">yard</a>
|
112
|
+
0.9.18 (ruby-2.5.3).
|
113
|
+
</div>
|
114
|
+
|
115
|
+
</div>
|
116
|
+
</body>
|
117
|
+
</html>
|
data/doc/AeEasy/Text.html
ADDED
@@ -0,0 +1,2024 @@
|
|
1
|
+
<!DOCTYPE html>
|
2
|
+
<html>
|
3
|
+
<head>
|
4
|
+
<meta charset="utf-8">
|
5
|
+
<meta name="viewport" content="width=device-width, initial-scale=1.0">
|
6
|
+
<title>
|
7
|
+
Module: AeEasy::Text
|
8
|
+
|
9
|
+
— Documentation by YARD 0.9.18
|
10
|
+
|
11
|
+
</title>
|
12
|
+
|
13
|
+
<link rel="stylesheet" href="../css/style.css" type="text/css" charset="utf-8" />
|
14
|
+
|
15
|
+
<link rel="stylesheet" href="../css/common.css" type="text/css" charset="utf-8" />
|
16
|
+
|
17
|
+
<script type="text/javascript" charset="utf-8">
|
18
|
+
pathId = "AeEasy::Text";
|
19
|
+
relpath = '../';
|
20
|
+
</script>
|
21
|
+
|
22
|
+
|
23
|
+
<script type="text/javascript" charset="utf-8" src="../js/jquery.js"></script>
|
24
|
+
|
25
|
+
<script type="text/javascript" charset="utf-8" src="../js/app.js"></script>
|
26
|
+
|
27
|
+
|
28
|
+
</head>
|
29
|
+
<body>
|
30
|
+
<div class="nav_wrap">
|
31
|
+
<iframe id="nav" src="../class_list.html?1"></iframe>
|
32
|
+
<div id="resizer"></div>
|
33
|
+
</div>
|
34
|
+
|
35
|
+
<div id="main" tabindex="-1">
|
36
|
+
<div id="header">
|
37
|
+
<div id="menu">
|
38
|
+
|
39
|
+
<a href="../_index.html">Index (T)</a> »
|
40
|
+
<span class='title'><span class='object_link'><a href="../AeEasy.html" title="AeEasy (module)">AeEasy</a></span></span>
|
41
|
+
»
|
42
|
+
<span class="title">Text</span>
|
43
|
+
|
44
|
+
</div>
|
45
|
+
|
46
|
+
<div id="search">
|
47
|
+
|
48
|
+
<a class="full_list_link" id="class_list_link"
|
49
|
+
href="../class_list.html">
|
50
|
+
|
51
|
+
<svg width="24" height="24">
|
52
|
+
<rect x="0" y="4" width="24" height="4" rx="1" ry="1"></rect>
|
53
|
+
<rect x="0" y="12" width="24" height="4" rx="1" ry="1"></rect>
|
54
|
+
<rect x="0" y="20" width="24" height="4" rx="1" ry="1"></rect>
|
55
|
+
</svg>
|
56
|
+
</a>
|
57
|
+
|
58
|
+
</div>
|
59
|
+
<div class="clear"></div>
|
60
|
+
</div>
|
61
|
+
|
62
|
+
<div id="content"><h1>Module: AeEasy::Text
|
63
|
+
|
64
|
+
|
65
|
+
|
66
|
+
</h1>
|
67
|
+
<div class="box_info">
|
68
|
+
|
69
|
+
|
70
|
+
|
71
|
+
|
72
|
+
|
73
|
+
|
74
|
+
|
75
|
+
|
76
|
+
|
77
|
+
|
78
|
+
|
79
|
+
<dl>
|
80
|
+
<dt>Defined in:</dt>
|
81
|
+
<dd>lib/ae_easy/text.rb<span class="defines">,<br />
|
82
|
+
lib/ae_easy/text/version.rb</span>
|
83
|
+
</dd>
|
84
|
+
</dl>
|
85
|
+
|
86
|
+
</div>
|
87
|
+
|
88
|
+
|
89
|
+
|
90
|
+
<h2>
|
91
|
+
Constant Summary
|
92
|
+
<small><a href="#" class="constants_summary_toggle">collapse</a></small>
|
93
|
+
</h2>
|
94
|
+
|
95
|
+
<dl class="constants">
|
96
|
+
|
97
|
+
<dt id="VERSION-constant" class="">VERSION =
|
98
|
+
<div class="docstring">
|
99
|
+
<div class="discussion">
|
100
|
+
|
101
|
+
<p>Gem version</p>
|
102
|
+
|
103
|
+
|
104
|
+
</div>
|
105
|
+
</div>
|
106
|
+
<div class="tags">
|
107
|
+
|
108
|
+
|
109
|
+
</div>
|
110
|
+
</dt>
|
111
|
+
<dd><pre class="code"><span class='tstring'><span class='tstring_beg'>"</span><span class='tstring_content'>0.0.1</span><span class='tstring_end'>"</span></span></pre></dd>
|
112
|
+
|
113
|
+
</dl>
|
114
|
+
|
115
|
+
|
116
|
+
|
117
|
+
|
118
|
+
|
119
|
+
|
120
|
+
|
121
|
+
|
122
|
+
|
123
|
+
<h2>
|
124
|
+
Class Method Summary
|
125
|
+
<small><a href="#" class="summary_toggle">collapse</a></small>
|
126
|
+
</h2>
|
127
|
+
|
128
|
+
<ul class="summary">
|
129
|
+
|
130
|
+
<li class="public ">
|
131
|
+
<span class="summary_signature">
|
132
|
+
|
133
|
+
<a href="#decode_html-class_method" title="decode_html (class method)">.<strong>decode_html</strong>(text) ⇒ String </a>
|
134
|
+
|
135
|
+
|
136
|
+
|
137
|
+
</span>
|
138
|
+
|
139
|
+
|
140
|
+
|
141
|
+
|
142
|
+
|
143
|
+
|
144
|
+
|
145
|
+
|
146
|
+
|
147
|
+
<span class="summary_desc"><div class='inline'>
|
148
|
+
<p>Decode HTML entities from text.</p>
|
149
|
+
</div></span>
|
150
|
+
|
151
|
+
</li>
|
152
|
+
|
153
|
+
|
154
|
+
<li class="public ">
|
155
|
+
<span class="summary_signature">
|
156
|
+
|
157
|
+
<a href="#default_parser-class_method" title="default_parser (class method)">.<strong>default_parser</strong>(cell_element, data, key) ⇒ Object </a>
|
158
|
+
|
159
|
+
|
160
|
+
|
161
|
+
</span>
|
162
|
+
|
163
|
+
|
164
|
+
|
165
|
+
|
166
|
+
|
167
|
+
|
168
|
+
|
169
|
+
|
170
|
+
|
171
|
+
<span class="summary_desc"><div class='inline'>
|
172
|
+
<p>Default cell content parser used to parse cell element.</p>
|
173
|
+
</div></span>
|
174
|
+
|
175
|
+
</li>
|
176
|
+
|
177
|
+
|
178
|
+
<li class="public ">
|
179
|
+
<span class="summary_signature">
|
180
|
+
|
181
|
+
<a href="#encode_html-class_method" title="encode_html (class method)">.<strong>encode_html</strong>(text) ⇒ String </a>
|
182
|
+
|
183
|
+
|
184
|
+
|
185
|
+
</span>
|
186
|
+
|
187
|
+
|
188
|
+
|
189
|
+
|
190
|
+
|
191
|
+
|
192
|
+
|
193
|
+
|
194
|
+
|
195
|
+
<span class="summary_desc"><div class='inline'>
|
196
|
+
<p>Encode text for valid HTML entities.</p>
|
197
|
+
</div></span>
|
198
|
+
|
199
|
+
</li>
|
200
|
+
|
201
|
+
|
202
|
+
<li class="public ">
|
203
|
+
<span class="summary_signature">
|
204
|
+
|
205
|
+
<a href="#hash-class_method" title="hash (class method)">.<strong>hash</strong>(object) ⇒ String </a>
|
206
|
+
|
207
|
+
|
208
|
+
|
209
|
+
</span>
|
210
|
+
|
211
|
+
|
212
|
+
|
213
|
+
|
214
|
+
|
215
|
+
|
216
|
+
|
217
|
+
|
218
|
+
|
219
|
+
<span class="summary_desc"><div class='inline'>
|
220
|
+
<p>Create a hash from object.</p>
|
221
|
+
</div></span>
|
222
|
+
|
223
|
+
</li>
|
224
|
+
|
225
|
+
|
226
|
+
<li class="public ">
|
227
|
+
<span class="summary_signature">
|
228
|
+
|
229
|
+
<a href="#parse_content-class_method" title="parse_content (class method)">.<strong>parse_content</strong>(opts) {|data, row, header_map| ... } ⇒ Array<Hash><sup>?</sup> </a>
|
230
|
+
|
231
|
+
|
232
|
+
|
233
|
+
</span>
|
234
|
+
|
235
|
+
|
236
|
+
|
237
|
+
|
238
|
+
|
239
|
+
|
240
|
+
|
241
|
+
|
242
|
+
|
243
|
+
<span class="summary_desc"><div class='inline'>
|
244
|
+
<p>Parse row data matching a selector using a header map to translate
|
245
|
+
between columns and friendly keys.</p>
|
246
|
+
</div></span>
|
247
|
+
|
248
|
+
</li>
|
249
|
+
|
250
|
+
|
251
|
+
<li class="public ">
|
252
|
+
<span class="summary_signature">
|
253
|
+
|
254
|
+
<a href="#parse_header_map-class_method" title="parse_header_map (class method)">.<strong>parse_header_map</strong>(opts = {}) ⇒ Hash{Symbol,String => Integer}<sup>?</sup> </a>
|
255
|
+
|
256
|
+
|
257
|
+
|
258
|
+
</span>
|
259
|
+
|
260
|
+
|
261
|
+
|
262
|
+
|
263
|
+
|
264
|
+
|
265
|
+
|
266
|
+
|
267
|
+
|
268
|
+
<span class="summary_desc"><div class='inline'>
|
269
|
+
<p>Parse header from selector and create a header map to match a column key
|
270
|
+
with column index.</p>
|
271
|
+
</div></span>
|
272
|
+
|
273
|
+
</li>
|
274
|
+
|
275
|
+
|
276
|
+
<li class="public ">
|
277
|
+
<span class="summary_signature">
|
278
|
+
|
279
|
+
<a href="#parse_table-class_method" title="parse_table (class method)">.<strong>parse_table</strong>(opts = {}) {|data, row, header_map| ... } ⇒ Hash{Symbol => Array,Hash,nil} </a>
|
280
|
+
|
281
|
+
|
282
|
+
|
283
|
+
</span>
|
284
|
+
|
285
|
+
|
286
|
+
|
287
|
+
|
288
|
+
|
289
|
+
|
290
|
+
|
291
|
+
|
292
|
+
|
293
|
+
<span class="summary_desc"><div class='inline'>
|
294
|
+
<p>Parse data from a horizontal table-like structure matching selectors and
|
295
|
+
using a header map to match columns.</p>
|
296
|
+
</div></span>
|
297
|
+
|
298
|
+
</li>
|
299
|
+
|
300
|
+
|
301
|
+
<li class="public ">
|
302
|
+
<span class="summary_signature">
|
303
|
+
|
304
|
+
<a href="#parse_vertical_table-class_method" title="parse_vertical_table (class method)">.<strong>parse_vertical_table</strong>(opts = {}) {|data, row, header_map| ... } ⇒ Hash{Symbol => Array,Hash,nil} </a>
|
305
|
+
|
306
|
+
|
307
|
+
|
308
|
+
</span>
|
309
|
+
|
310
|
+
|
311
|
+
|
312
|
+
|
313
|
+
|
314
|
+
|
315
|
+
|
316
|
+
|
317
|
+
|
318
|
+
<span class="summary_desc"><div class='inline'>
|
319
|
+
<p>Parse data from a vertical table-like structure matching selectors and
|
320
|
+
using a header map to match columns.</p>
|
321
|
+
</div></span>
|
322
|
+
|
323
|
+
</li>
|
324
|
+
|
325
|
+
|
326
|
+
<li class="public ">
|
327
|
+
<span class="summary_signature">
|
328
|
+
|
329
|
+
<a href="#strip-class_method" title="strip (class method)">.<strong>strip</strong>(raw_text) ⇒ String<sup>?</sup> </a>
|
330
|
+
|
331
|
+
|
332
|
+
|
333
|
+
</span>
|
334
|
+
|
335
|
+
|
336
|
+
|
337
|
+
|
338
|
+
|
339
|
+
|
340
|
+
|
341
|
+
|
342
|
+
|
343
|
+
<span class="summary_desc"><div class='inline'>
|
344
|
+
<p>Strip a value.</p>
|
345
|
+
</div></span>
|
346
|
+
|
347
|
+
</li>
|
348
|
+
|
349
|
+
|
350
|
+
<li class="public ">
|
351
|
+
<span class="summary_signature">
|
352
|
+
|
353
|
+
<a href="#translate_label_to_key-class_method" title="translate_label_to_key (class method)">.<strong>translate_label_to_key</strong>(element, label_map) ⇒ Symbol, String </a>
|
354
|
+
|
355
|
+
|
356
|
+
|
357
|
+
</span>
|
358
|
+
|
359
|
+
|
360
|
+
|
361
|
+
|
362
|
+
|
363
|
+
|
364
|
+
|
365
|
+
|
366
|
+
|
367
|
+
<span class="summary_desc"><div class='inline'>
|
368
|
+
<p>Extract column label and translate it into a friendly key.</p>
|
369
|
+
</div></span>
|
370
|
+
|
371
|
+
</li>
|
372
|
+
|
373
|
+
|
374
|
+
</ul>
|
375
|
+
|
376
|
+
|
377
|
+
|
378
|
+
|
379
|
+
<div id="class_method_details" class="method_details_list">
|
380
|
+
<h2>Class Method Details</h2>
|
381
|
+
|
382
|
+
|
383
|
+
<div class="method_details first">
|
384
|
+
<h3 class="signature first" id="decode_html-class_method">
|
385
|
+
|
386
|
+
.<strong>decode_html</strong>(text) ⇒ <tt>String</tt>
|
387
|
+
|
388
|
+
|
389
|
+
|
390
|
+
|
391
|
+
|
392
|
+
</h3><div class="docstring">
|
393
|
+
<div class="discussion">
|
394
|
+
|
395
|
+
<p>Decode HTML entities from text.</p>
|
396
|
+
|
397
|
+
|
398
|
+
</div>
|
399
|
+
</div>
|
400
|
+
<div class="tags">
|
401
|
+
<p class="tag_title">Parameters:</p>
|
402
|
+
<ul class="param">
|
403
|
+
|
404
|
+
<li>
|
405
|
+
|
406
|
+
<span class='name'>text</span>
|
407
|
+
|
408
|
+
|
409
|
+
<span class='type'>(<tt>String</tt>)</span>
|
410
|
+
|
411
|
+
|
412
|
+
|
413
|
+
—
|
414
|
+
<div class='inline'>
|
415
|
+
<p>Text to decode.</p>
|
416
|
+
</div>
|
417
|
+
|
418
|
+
</li>
|
419
|
+
|
420
|
+
</ul>
|
421
|
+
|
422
|
+
<p class="tag_title">Returns:</p>
|
423
|
+
<ul class="return">
|
424
|
+
|
425
|
+
<li>
|
426
|
+
|
427
|
+
|
428
|
+
<span class='type'>(<tt>String</tt>)</span>
|
429
|
+
|
430
|
+
|
431
|
+
|
432
|
+
</li>
|
433
|
+
|
434
|
+
</ul>
|
435
|
+
|
436
|
+
</div><table class="source_code">
|
437
|
+
<tr>
|
438
|
+
<td>
|
439
|
+
<pre class="lines">
|
440
|
+
|
441
|
+
|
442
|
+
33
|
443
|
+
34
|
444
|
+
35</pre>
|
445
|
+
</td>
|
446
|
+
<td>
|
447
|
+
<pre class="code"><span class="info file"># File 'lib/ae_easy/text.rb', line 33</span>
|
448
|
+
|
449
|
+
<span class='kw'>def</span> <span class='kw'>self</span><span class='period'>.</span><span class='id identifier rubyid_decode_html'>decode_html</span> <span class='id identifier rubyid_text'>text</span>
|
450
|
+
<span class='const'>CGI</span><span class='period'>.</span><span class='id identifier rubyid_unescapeHTML'>unescapeHTML</span> <span class='id identifier rubyid_text'>text</span>
|
451
|
+
<span class='kw'>end</span></pre>
|
452
|
+
</td>
|
453
|
+
</tr>
|
454
|
+
</table>
|
455
|
+
</div>
|
456
|
+
|
457
|
+
<div class="method_details ">
|
458
|
+
<h3 class="signature " id="default_parser-class_method">
|
459
|
+
|
460
|
+
.<strong>default_parser</strong>(cell_element, data, key) ⇒ <tt>Object</tt>
|
461
|
+
|
462
|
+
|
463
|
+
|
464
|
+
|
465
|
+
|
466
|
+
</h3><div class="docstring">
|
467
|
+
<div class="discussion">
|
468
|
+
|
469
|
+
<p>Default cell content parser used to parse cell element.</p>
|
470
|
+
|
471
|
+
|
472
|
+
</div>
|
473
|
+
</div>
|
474
|
+
<div class="tags">
|
475
|
+
<p class="tag_title">Parameters:</p>
|
476
|
+
<ul class="param">
|
477
|
+
|
478
|
+
<li>
|
479
|
+
|
480
|
+
<span class='name'>cell_element</span>
|
481
|
+
|
482
|
+
|
483
|
+
<span class='type'>(<tt>Nokogiri::Element</tt>)</span>
|
484
|
+
|
485
|
+
|
486
|
+
|
487
|
+
—
|
488
|
+
<div class='inline'>
|
489
|
+
<p>Cell element to parse.</p>
|
490
|
+
</div>
|
491
|
+
|
492
|
+
</li>
|
493
|
+
|
494
|
+
<li>
|
495
|
+
|
496
|
+
<span class='name'>data</span>
|
497
|
+
|
498
|
+
|
499
|
+
<span class='type'>(<tt>Hash</tt>)</span>
|
500
|
+
|
501
|
+
|
502
|
+
|
503
|
+
—
|
504
|
+
<div class='inline'>
|
505
|
+
<p>Data hash to save parsed data into.</p>
|
506
|
+
</div>
|
507
|
+
|
508
|
+
</li>
|
509
|
+
|
510
|
+
<li>
|
511
|
+
|
512
|
+
<span class='name'>key</span>
|
513
|
+
|
514
|
+
|
515
|
+
<span class='type'>(<tt>String</tt>, <tt>Symbol</tt>)</span>
|
516
|
+
|
517
|
+
|
518
|
+
|
519
|
+
—
|
520
|
+
<div class='inline'>
|
521
|
+
<p>Header column key being parsed.</p>
|
522
|
+
</div>
|
523
|
+
|
524
|
+
</li>
|
525
|
+
|
526
|
+
</ul>
|
527
|
+
|
528
|
+
|
529
|
+
</div><table class="source_code">
|
530
|
+
<tr>
|
531
|
+
<td>
|
532
|
+
<pre class="lines">
|
533
|
+
|
534
|
+
|
535
|
+
60
|
536
|
+
61
|
537
|
+
62
|
538
|
+
63</pre>
|
539
|
+
</td>
|
540
|
+
<td>
|
541
|
+
<pre class="code"><span class="info file"># File 'lib/ae_easy/text.rb', line 60</span>
|
542
|
+
|
543
|
+
<span class='kw'>def</span> <span class='kw'>self</span><span class='period'>.</span><span class='id identifier rubyid_default_parser'>default_parser</span> <span class='id identifier rubyid_cell_element'>cell_element</span><span class='comma'>,</span> <span class='id identifier rubyid_data'>data</span><span class='comma'>,</span> <span class='id identifier rubyid_key'>key</span>
|
544
|
+
<span class='id identifier rubyid_cell_element'>cell_element</span><span class='op'>&.</span><span class='id identifier rubyid_search'>search</span><span class='lparen'>(</span><span class='tstring'><span class='tstring_beg'>'</span><span class='tstring_content'>//i</span><span class='tstring_end'>'</span></span><span class='rparen'>)</span><span class='op'>&.</span><span class='id identifier rubyid_remove'>remove</span>
|
545
|
+
<span class='id identifier rubyid_data'>data</span><span class='lbracket'>[</span><span class='id identifier rubyid_key'>key</span><span class='rbracket'>]</span> <span class='op'>=</span> <span class='id identifier rubyid_strip'>strip</span> <span class='id identifier rubyid_cell_element'>cell_element</span><span class='op'>&.</span><span class='id identifier rubyid_text'>text</span>
|
546
|
+
<span class='kw'>end</span></pre>
|
547
|
+
</td>
|
548
|
+
</tr>
|
549
|
+
</table>
|
550
|
+
</div>
|
551
|
+
|
552
|
+
<div class="method_details ">
|
553
|
+
<h3 class="signature " id="encode_html-class_method">
|
554
|
+
|
555
|
+
.<strong>encode_html</strong>(text) ⇒ <tt>String</tt>
|
556
|
+
|
557
|
+
|
558
|
+
|
559
|
+
|
560
|
+
|
561
|
+
</h3><div class="docstring">
|
562
|
+
<div class="discussion">
|
563
|
+
|
564
|
+
<p>Encode text for valid HTML entities.</p>
|
565
|
+
|
566
|
+
|
567
|
+
</div>
|
568
|
+
</div>
|
569
|
+
<div class="tags">
|
570
|
+
<p class="tag_title">Parameters:</p>
|
571
|
+
<ul class="param">
|
572
|
+
|
573
|
+
<li>
|
574
|
+
|
575
|
+
<span class='name'>text</span>
|
576
|
+
|
577
|
+
|
578
|
+
<span class='type'>(<tt>String</tt>)</span>
|
579
|
+
|
580
|
+
|
581
|
+
|
582
|
+
—
|
583
|
+
<div class='inline'>
|
584
|
+
<p>Text to encode.</p>
|
585
|
+
</div>
|
586
|
+
|
587
|
+
</li>
|
588
|
+
|
589
|
+
</ul>
|
590
|
+
|
591
|
+
<p class="tag_title">Returns:</p>
|
592
|
+
<ul class="return">
|
593
|
+
|
594
|
+
<li>
|
595
|
+
|
596
|
+
|
597
|
+
<span class='type'>(<tt>String</tt>)</span>
|
598
|
+
|
599
|
+
|
600
|
+
|
601
|
+
</li>
|
602
|
+
|
603
|
+
</ul>
|
604
|
+
|
605
|
+
</div><table class="source_code">
|
606
|
+
<tr>
|
607
|
+
<td>
|
608
|
+
<pre class="lines">
|
609
|
+
|
610
|
+
|
611
|
+
24
|
612
|
+
25
|
613
|
+
26</pre>
|
614
|
+
</td>
|
615
|
+
<td>
|
616
|
+
<pre class="code"><span class="info file"># File 'lib/ae_easy/text.rb', line 24</span>
|
617
|
+
|
618
|
+
<span class='kw'>def</span> <span class='kw'>self</span><span class='period'>.</span><span class='id identifier rubyid_encode_html'>encode_html</span> <span class='id identifier rubyid_text'>text</span>
|
619
|
+
<span class='const'>CGI</span><span class='period'>.</span><span class='id identifier rubyid_escapeHTML'>escapeHTML</span> <span class='id identifier rubyid_text'>text</span>
|
620
|
+
<span class='kw'>end</span></pre>
|
621
|
+
</td>
|
622
|
+
</tr>
|
623
|
+
</table>
|
624
|
+
</div>
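The `encode_html` and `decode_html` listings above are one-line wrappers around `CGI.escapeHTML` and `CGI.unescapeHTML`. The sketch below mirrors them with the standard library only, so it runs without the gem installed. `HtmlTextSketch` is a hypothetical module name used for illustration.

```ruby
require 'cgi'

# Hypothetical stand-in module mirroring the two documented one-line methods.
module HtmlTextSketch
  # Replace HTML-significant characters (&, <, >, quotes) with entities.
  def self.encode_html(text)
    CGI.escapeHTML(text)
  end

  # Turn HTML entities back into the characters they represent.
  def self.decode_html(text)
    CGI.unescapeHTML(text)
  end
end

raw = %q{<a href="/q?a=1&b=2">Tom & Jerry</a>}
encoded = HtmlTextSketch.encode_html(raw)

puts encoded
# Decoding the encoded text gives back the original string.
puts HtmlTextSketch.decode_html(encoded) == raw
```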
|
625
|
+
|
626
|
+
<div class="method_details ">
|
627
|
+
<h3 class="signature " id="hash-class_method">
|
628
|
+
|
629
|
+
.<strong>hash</strong>(object) ⇒ <tt>String</tt>
|
630
|
+
|
631
|
+
|
632
|
+
|
633
|
+
|
634
|
+
|
635
|
+
</h3><div class="docstring">
|
636
|
+
<div class="discussion">
|
637
|
+
|
638
|
+
<p>Create a hash from object.</p>
|
639
|
+
|
640
|
+
|
641
|
+
</div>
|
642
|
+
</div>
|
643
|
+
<div class="tags">
|
644
|
+
<p class="tag_title">Parameters:</p>
|
645
|
+
<ul class="param">
|
646
|
+
|
647
|
+
<li>
|
648
|
+
|
649
|
+
<span class='name'>object</span>
|
650
|
+
|
651
|
+
|
652
|
+
<span class='type'>(<tt>String</tt>, <tt>Hash</tt>, <tt>Object</tt>)</span>
|
653
|
+
|
654
|
+
|
655
|
+
|
656
|
+
—
|
657
|
+
<div class='inline'>
|
658
|
+
<p>Object to create hash from.</p>
|
659
|
+
</div>
|
660
|
+
|
661
|
+
</li>
|
662
|
+
|
663
|
+
</ul>
|
664
|
+
|
665
|
+
<p class="tag_title">Returns:</p>
|
666
|
+
<ul class="return">
|
667
|
+
|
668
|
+
<li>
|
669
|
+
|
670
|
+
|
671
|
+
<span class='type'>(<tt>String</tt>)</span>
|
672
|
+
|
673
|
+
|
674
|
+
|
675
|
+
</li>
|
676
|
+
|
677
|
+
</ul>
|
678
|
+
|
679
|
+
</div><table class="source_code">
|
680
|
+
<tr>
|
681
|
+
<td>
|
682
|
+
<pre class="lines">
|
683
|
+
|
684
|
+
|
685
|
+
14
|
686
|
+
15
|
687
|
+
16
|
688
|
+
17</pre>
|
689
|
+
</td>
|
690
|
+
<td>
|
691
|
+
<pre class="code"><span class="info file"># File 'lib/ae_easy/text.rb', line 14</span>
|
692
|
+
|
693
|
+
<span class='kw'>def</span> <span class='kw'>self</span><span class='period'>.</span><span class='id identifier rubyid_hash'>hash</span> <span class='id identifier rubyid_object'>object</span>
|
694
|
+
<span class='id identifier rubyid_object'>object</span> <span class='op'>=</span> <span class='id identifier rubyid_object'>object</span><span class='period'>.</span><span class='id identifier rubyid_hash'>hash</span> <span class='kw'>if</span> <span class='id identifier rubyid_object'>object</span><span class='period'>.</span><span class='id identifier rubyid_is_a?'>is_a?</span> <span class='const'>Hash</span>
|
695
|
+
<span class='const'>Digest</span><span class='op'>::</span><span class='const'>SHA1</span><span class='period'>.</span><span class='id identifier rubyid_hexdigest'>hexdigest</span> <span class='id identifier rubyid_object'>object</span><span class='period'>.</span><span class='id identifier rubyid_to_s'>to_s</span>
|
696
|
+
<span class='kw'>end</span></pre>
|
697
|
+
</td>
|
698
|
+
</tr>
|
699
|
+
</table>
|
700
|
+
</div>
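The listing above can be exercised without the gem through a small stand-in. This is a sketch only: `sketch_hash` is a hypothetical name that copies the two lines of the method body. It also makes one behavior visible: for a `Hash` argument the digest is taken over `Hash#hash`, which Ruby may compute differently in each process, so such digests are probably only comparable within a single run.

```ruby
require 'digest/sha1'

# Hypothetical stand-in mirroring the listed method body; not the gem itself.
def sketch_hash(object)
  object = object.hash if object.is_a?(Hash)
  Digest::SHA1.hexdigest(object.to_s)
end

# Strings are digested directly, so the result is deterministic.
string_digest = sketch_hash('abc')

# Equal hashes produce equal digests inside one Ruby process.
digest_a = sketch_hash({ id: 1, name: 'foo' })
digest_b = sketch_hash({ id: 1, name: 'foo' })

puts string_digest
puts digest_a == digest_b
```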
|
701
|
+
|
702
|
+
<div class="method_details ">
|
703
|
+
<h3 class="signature " id="parse_content-class_method">
|
704
|
+
|
705
|
+
.<strong>parse_content</strong>(opts) {|data, row, header_map| ... } ⇒ <tt>Array<Hash></tt><sup>?</sup>
|
706
|
+
|
707
|
+
|
708
|
+
|
709
|
+
|
710
|
+
|
711
|
+
</h3><div class="docstring">
|
712
|
+
<div class="discussion">
|
713
|
+
|
714
|
+
<p>Parse row data matching a selector, using a header map to translate
|
715
|
+
|
716
|
+
between columns and friendly keys.</p>
|
717
|
+
|
718
|
+
|
719
|
+
|
720
|
+
</div>
|
721
|
+
</div>
|
722
|
+
<div class="tags">
|
723
|
+
<p class="tag_title">Parameters:</p>
|
724
|
+
<ul class="param">
|
725
|
+
|
726
|
+
<li>
|
727
|
+
|
728
|
+
<span class='name'>opts</span>
|
729
|
+
|
730
|
+
|
731
|
+
<span class='type'>(<tt>Hash</tt>)</span>
|
732
|
+
|
733
|
+
|
734
|
+
|
735
|
+
—
|
736
|
+
<div class='inline'>
|
737
|
+
<p>({}) Configuration options.</p>
|
738
|
+
</div>
|
739
|
+
|
740
|
+
</li>
|
741
|
+
|
742
|
+
</ul>
|
743
|
+
|
744
|
+
|
745
|
+
|
746
|
+
|
747
|
+
<p class="tag_title">Options Hash (<tt>opts</tt>):</p>
|
748
|
+
<ul class="option">
|
749
|
+
|
750
|
+
<li>
|
751
|
+
<span class="name">:html</span>
|
752
|
+
<span class="type">(<tt>Nokogiri::XML::Element</tt>)</span>
|
753
|
+
<span class="default">
|
754
|
+
|
755
|
+
</span>
|
756
|
+
|
757
|
+
— <div class='inline'>
|
758
|
+
<p>Container element to search within.</p>
|
759
|
+
</div>
|
760
|
+
|
761
|
+
</li>
|
762
|
+
|
763
|
+
<li>
|
764
|
+
<span class="name">:selector</span>
|
765
|
+
<span class="type">(<tt>String</tt>)</span>
|
766
|
+
<span class="default">
|
767
|
+
|
768
|
+
</span>
|
769
|
+
|
770
|
+
— <div class='inline'>
|
771
|
+
<p>CSS selector to match content cells.</p>
|
772
|
+
</div>
|
773
|
+
|
774
|
+
</li>
|
775
|
+
|
776
|
+
<li>
|
777
|
+
<span class="name">:first_row_header</span>
|
778
|
+
<span class="type">(<tt>Boolean</tt>)</span>
|
779
|
+
<span class="default">
|
780
|
+
|
781
|
+
— default:
|
782
|
+
<tt>false</tt>
|
783
|
+
|
784
|
+
</span>
|
785
|
+
|
786
|
+
— <div class='inline'>
|
787
|
+
<p>If true, the first matching element is assumed to be the header and is
|
788
|
+
skipped.</p>
|
789
|
+
</div>
|
790
|
+
|
791
|
+
</li>
|
792
|
+
|
793
|
+
<li>
|
794
|
+
<span class="name">:header_map</span>
|
795
|
+
<span class="type">(<tt>Hash{Symbol,String => Integer}</tt>)</span>
|
796
|
+
<span class="default">
|
797
|
+
|
798
|
+
</span>
|
799
|
+
|
800
|
+
— <div class='inline'>
|
801
|
+
<p>Header key vs. column index dictionary.</p>
|
802
|
+
</div>
|
803
|
+
|
804
|
+
</li>
|
805
|
+
|
806
|
+
<li>
|
807
|
+
<span class="name">:column_parsers</span>
|
808
|
+
<span class="type">(<tt>Hash{Symbol,String => lambda,proc}</tt>)</span>
|
809
|
+
<span class="default">
|
810
|
+
|
811
|
+
— default:
|
812
|
+
<tt>{}</tt>
|
813
|
+
|
814
|
+
</span>
|
815
|
+
|
816
|
+
— <div class='inline'>
|
817
|
+
<p>Custom column parsers for advanced data extraction.</p>
|
818
|
+
</div>
|
819
|
+
|
820
|
+
</li>
|
821
|
+
|
822
|
+
</ul>
|
823
|
+
|
824
|
+
|
825
|
+
<p class="tag_title">Yield Parameters:</p>
|
826
|
+
<ul class="yieldparam">
|
827
|
+
|
828
|
+
<li>
|
829
|
+
|
830
|
+
<span class='name'>data</span>
|
831
|
+
|
832
|
+
|
833
|
+
<span class='type'>(<tt>Hash{Symbol,String => Object}</tt>)</span>
|
834
|
+
|
835
|
+
|
836
|
+
|
837
|
+
—
|
838
|
+
<div class='inline'>
|
839
|
+
<p>Parsed row data.</p>
|
840
|
+
</div>
|
841
|
+
|
842
|
+
</li>
|
843
|
+
|
844
|
+
<li>
|
845
|
+
|
846
|
+
<span class='name'>row</span>
|
847
|
+
|
848
|
+
|
849
|
+
<span class='type'>(<tt>Array</tt>)</span>
|
850
|
+
|
851
|
+
|
852
|
+
|
853
|
+
—
|
854
|
+
<div class='inline'>
|
855
|
+
<p>Raw row data.</p>
|
856
|
+
</div>
|
857
|
+
|
858
|
+
</li>
|
859
|
+
|
860
|
+
<li>
|
861
|
+
|
862
|
+
<span class='name'>header_map</span>
|
863
|
+
|
864
|
+
|
865
|
+
<span class='type'>(<tt>Hash{Symbol,String => Integer}</tt>)</span>
|
866
|
+
|
867
|
+
|
868
|
+
|
869
|
+
—
|
870
|
+
<div class='inline'>
|
871
|
+
<p>Header map used.</p>
|
872
|
+
</div>
|
873
|
+
|
874
|
+
</li>
|
875
|
+
|
876
|
+
</ul>
|
877
|
+
<p class="tag_title">Yield Returns:</p>
|
878
|
+
<ul class="yieldreturn">
|
879
|
+
|
880
|
+
<li>
|
881
|
+
|
882
|
+
|
883
|
+
<span class='type'>(<tt>Boolean</tt>)</span>
|
884
|
+
|
885
|
+
|
886
|
+
|
887
|
+
—
|
888
|
+
<div class='inline'>
|
889
|
+
<p>`true` when valid, else `false`.</p>
|
890
|
+
</div>
|
891
|
+
|
892
|
+
</li>
|
893
|
+
|
894
|
+
</ul>
|
895
|
+
<p class="tag_title">Returns:</p>
|
896
|
+
<ul class="return">
|
897
|
+
|
898
|
+
<li>
|
899
|
+
|
900
|
+
|
901
|
+
<span class='type'>(<tt>Array<Hash></tt>, <tt>nil</tt>)</span>
|
902
|
+
|
903
|
+
|
904
|
+
|
905
|
+
—
|
906
|
+
<div class='inline'>
|
907
|
+
<p>Parsed rows data.</p>
|
908
|
+
</div>
|
909
|
+
|
910
|
+
</li>
|
911
|
+
|
912
|
+
</ul>
|
913
|
+
|
914
|
+
</div><table class="source_code">
|
915
|
+
<tr>
|
916
|
+
<td>
|
917
|
+
<pre class="lines">
|
918
|
+
|
919
|
+
|
920
|
+
84
|
921
|
+
85
|
922
|
+
86
|
923
|
+
87
|
924
|
+
88
|
925
|
+
89
|
926
|
+
90
|
927
|
+
91
|
928
|
+
92
|
929
|
+
93
|
930
|
+
94
|
931
|
+
95
|
932
|
+
96
|
933
|
+
97
|
934
|
+
98
|
935
|
+
99
|
936
|
+
100
|
937
|
+
101
|
938
|
+
102
|
939
|
+
103
|
940
|
+
104
|
941
|
+
105
|
942
|
+
106
|
943
|
+
107
|
944
|
+
108
|
945
|
+
109
|
946
|
+
110
|
947
|
+
111
|
948
|
+
112
|
949
|
+
113
|
950
|
+
114
|
951
|
+
115
|
952
|
+
116
|
953
|
+
117
|
954
|
+
118
|
955
|
+
119
|
956
|
+
120
|
957
|
+
121
|
958
|
+
122</pre>
|
959
|
+
</td>
|
960
|
+
<td>
|
961
|
+
<pre class="code"><span class="info file"># File 'lib/ae_easy/text.rb', line 84</span>
|
962
|
+
|
963
|
+
<span class='kw'>def</span> <span class='kw'>self</span><span class='period'>.</span><span class='id identifier rubyid_parse_content'>parse_content</span> <span class='id identifier rubyid_opts'>opts</span><span class='comma'>,</span> <span class='op'>&</span><span class='id identifier rubyid_filter'>filter</span>
|
964
|
+
<span class='id identifier rubyid_opts'>opts</span> <span class='op'>=</span> <span class='lbrace'>{</span>
|
965
|
+
<span class='label'>html:</span> <span class='kw'>nil</span><span class='comma'>,</span>
|
966
|
+
<span class='label'>selector:</span> <span class='kw'>nil</span><span class='comma'>,</span>
|
967
|
+
<span class='label'>first_row_header:</span> <span class='kw'>false</span><span class='comma'>,</span>
|
968
|
+
<span class='label'>header_map:</span> <span class='lbrace'>{</span><span class='rbrace'>}</span><span class='comma'>,</span>
|
969
|
+
<span class='label'>column_parsers:</span> <span class='lbrace'>{</span><span class='rbrace'>}</span>
|
970
|
+
<span class='rbrace'>}</span><span class='period'>.</span><span class='id identifier rubyid_merge'>merge</span> <span class='id identifier rubyid_opts'>opts</span>
|
971
|
+
|
972
|
+
<span class='comment'># Setup config
|
973
|
+
</span> <span class='id identifier rubyid_data'>data</span> <span class='op'>=</span> <span class='lbracket'>[</span><span class='rbracket'>]</span>
|
974
|
+
<span class='id identifier rubyid_row_data'>row_data</span> <span class='op'>=</span> <span class='id identifier rubyid_child_element'>child_element</span> <span class='op'>=</span> <span class='kw'>nil</span>
|
975
|
+
<span class='id identifier rubyid_first'>first</span> <span class='op'>=</span> <span class='id identifier rubyid_first_row_header'>first_row_header</span> <span class='op'>=</span> <span class='id identifier rubyid_opts'>opts</span><span class='lbracket'>[</span><span class='symbol'>:first_row_header</span><span class='rbracket'>]</span>
|
976
|
+
<span class='id identifier rubyid_header_map'>header_map</span> <span class='op'>=</span> <span class='id identifier rubyid_opts'>opts</span><span class='lbracket'>[</span><span class='symbol'>:header_map</span><span class='rbracket'>]</span>
|
977
|
+
<span class='id identifier rubyid_column_parsers'>column_parsers</span> <span class='op'>=</span> <span class='id identifier rubyid_opts'>opts</span><span class='lbracket'>[</span><span class='symbol'>:column_parsers</span><span class='rbracket'>]</span>
|
978
|
+
|
979
|
+
<span class='comment'># Get and parse rows
|
980
|
+
</span> <span class='id identifier rubyid_html_rows'>html_rows</span> <span class='op'>=</span> <span class='id identifier rubyid_opts'>opts</span><span class='lbracket'>[</span><span class='symbol'>:html</span><span class='rbracket'>]</span><span class='period'>.</span><span class='id identifier rubyid_css'>css</span><span class='lparen'>(</span><span class='id identifier rubyid_opts'>opts</span><span class='lbracket'>[</span><span class='symbol'>:selector</span><span class='rbracket'>]</span><span class='rparen'>)</span>
|
981
|
+
<span class='id identifier rubyid_html_rows'>html_rows</span><span class='period'>.</span><span class='id identifier rubyid_each'>each</span> <span class='kw'>do</span> <span class='op'>|</span><span class='id identifier rubyid_row'>row</span><span class='op'>|</span>
|
982
|
+
<span class='comment'># First row header validation
|
983
|
+
</span> <span class='kw'>if</span> <span class='id identifier rubyid_first'>first</span> <span class='op'>&&</span> <span class='id identifier rubyid_first_row_header'>first_row_header</span>
|
984
|
+
<span class='id identifier rubyid_first'>first</span> <span class='op'>=</span> <span class='kw'>false</span>
|
985
|
+
<span class='kw'>next</span>
|
986
|
+
<span class='kw'>end</span>
|
987
|
+
|
988
|
+
<span class='comment'># Extract content data
|
989
|
+
</span> <span class='id identifier rubyid_row_data'>row_data</span> <span class='op'>=</span> <span class='lbrace'>{</span><span class='rbrace'>}</span>
|
990
|
+
<span class='id identifier rubyid_header_map'>header_map</span><span class='period'>.</span><span class='id identifier rubyid_each'>each</span> <span class='kw'>do</span> <span class='op'>|</span><span class='id identifier rubyid_key'>key</span><span class='comma'>,</span> <span class='id identifier rubyid_index'>index</span><span class='op'>|</span>
|
991
|
+
<span class='comment'># Parse column html with default or custom parser
|
992
|
+
</span> <span class='id identifier rubyid_child_element'>child_element</span> <span class='op'>=</span> <span class='id identifier rubyid_row'>row</span><span class='period'>.</span><span class='id identifier rubyid_children'>children</span><span class='lbracket'>[</span><span class='id identifier rubyid_index'>index</span><span class='rbracket'>]</span>
|
993
|
+
<span class='id identifier rubyid_column_parsers'>column_parsers</span><span class='lbracket'>[</span><span class='id identifier rubyid_key'>key</span><span class='rbracket'>]</span><span class='period'>.</span><span class='id identifier rubyid_nil?'>nil?</span> <span class='op'>?</span>
|
994
|
+
<span class='id identifier rubyid_default_parser'>default_parser</span><span class='lparen'>(</span><span class='id identifier rubyid_child_element'>child_element</span><span class='comma'>,</span> <span class='id identifier rubyid_row_data'>row_data</span><span class='comma'>,</span> <span class='id identifier rubyid_key'>key</span><span class='rparen'>)</span> <span class='op'>:</span>
|
995
|
+
<span class='id identifier rubyid_column_parsers'>column_parsers</span><span class='lbracket'>[</span><span class='id identifier rubyid_key'>key</span><span class='rbracket'>]</span><span class='period'>.</span><span class='id identifier rubyid_call'>call</span><span class='lparen'>(</span><span class='id identifier rubyid_child_element'>child_element</span><span class='comma'>,</span> <span class='id identifier rubyid_row_data'>row_data</span><span class='comma'>,</span> <span class='id identifier rubyid_key'>key</span><span class='rparen'>)</span>
|
996
|
+
<span class='kw'>end</span>
|
997
|
+
<span class='kw'>next</span> <span class='kw'>unless</span> <span class='id identifier rubyid_filter'>filter</span><span class='period'>.</span><span class='id identifier rubyid_nil?'>nil?</span> <span class='op'>||</span> <span class='id identifier rubyid_filter'>filter</span><span class='period'>.</span><span class='id identifier rubyid_call'>call</span><span class='lparen'>(</span><span class='id identifier rubyid_row_data'>row_data</span><span class='comma'>,</span> <span class='id identifier rubyid_row'>row</span><span class='comma'>,</span> <span class='id identifier rubyid_header_map'>header_map</span><span class='rparen'>)</span>
|
998
|
+
<span class='id identifier rubyid_data'>data</span> <span class='op'><<</span> <span class='id identifier rubyid_row_data'>row_data</span>
|
999
|
+
<span class='kw'>end</span>
|
1000
|
+
<span class='id identifier rubyid_data'>data</span>
|
1001
|
+
<span class='kw'>end</span></pre>
|
1002
|
+
</td>
|
1003
|
+
</tr>
|
1004
|
+
</table>
|
1005
|
+
</div>
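The row loop in the listing above can be illustrated with a plain-Ruby analogue. This is a sketch under stated assumptions, not the gem's code: rows are arrays of cell strings standing in for Nokogiri nodes and `row.children`, `sketch_parse_content` is a hypothetical name, and the default branch (store the stripped cell text under the key) is a guess at what `default_parser` does, since that helper is not shown here. In the listing the return value of a custom parser appears to be discarded, so the sketch has the parser write into `row_data` itself.

```ruby
# Plain-Ruby analogue of the listed loop; names and default parsing are assumptions.
def sketch_parse_content(rows, header_map:, first_row_header: false, column_parsers: {}, &filter)
  data = []
  rows = rows.drop(1) if first_row_header # skip the header row
  rows.each do |row|
    row_data = {}
    header_map.each do |key, index|
      cell = row[index]
      parser = column_parsers[key]
      if parser
        # Custom parsers receive the cell, the row hash and the key,
        # and are expected to store their result in row_data.
        parser.call(cell, row_data, key)
      else
        row_data[key] = cell.to_s.strip # assumed default behavior
      end
    end
    # The optional block acts as a row filter: falsy results drop the row.
    next unless filter.nil? || filter.call(row_data, row, header_map)
    data << row_data
  end
  data
end

rows = [
  ['Name', 'Price'],
  [' Widget ', '$10'],
  ['Gadget', '$0']
]

parsed = sketch_parse_content(
  rows,
  header_map: { name: 0, price: 1 },
  first_row_header: true,
  column_parsers: { price: ->(cell, row_data, key) { row_data[key] = cell.delete('$').to_i } }
) { |row_data, _row, _header_map| row_data[:price] > 0 }

p parsed
```

With the sample rows, the header row is skipped, the price column goes through the custom lambda, and the block filters out the zero-priced row.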
|
1006
|
+
|
1007
|
+
<div class="method_details ">
|
1008
|
+
<h3 class="signature " id="parse_header_map-class_method">
|
1009
|
+
|
1010
|
+
.<strong>parse_header_map</strong>(opts = {}) ⇒ <tt>Hash{Symbol,String => Integer}</tt><sup>?</sup>
|
1011
|
+
|
1012
|
+
|
1013
|
+
|
1014
|
+
|
1015
|
+
|
1016
|
+
</h3><div class="docstring">
|
1017
|
+
<div class="discussion">
|
1018
|
+
|
1019
|
+
<p>Parse header from selector and create a header map to match a column key
|
1020
|
+
|
1021
|
+
with column index.</p>
|
1022
|
+
|
1023
|
+
|
1024
|
+
|
1025
|
+
</div>
|
1026
|
+
</div>
|
1027
|
+
<div class="tags">
|
1028
|
+
<p class="tag_title">Parameters:</p>
|
1029
|
+
<ul class="param">
|
1030
|
+
|
1031
|
+
<li>
|
1032
|
+
|
1033
|
+
<span class='name'>opts</span>
|
1034
|
+
|
1035
|
+
|
1036
|
+
<span class='type'>(<tt>Hash</tt>)</span>
|
1037
|
+
|
1038
|
+
|
1039
|
+
<em class="default">(defaults to: <tt>{}</tt>)</em>
|
1040
|
+
|
1041
|
+
|
1042
|
+
—
|
1043
|
+
<div class='inline'>
|
1044
|
+
<p>({}) Configuration options.</p>
|
1045
|
+
</div>
|
1046
|
+
|
1047
|
+
</li>
|
1048
|
+
|
1049
|
+
</ul>
|
1050
|
+
|
1051
|
+
|
1052
|
+
|
1053
|
+
|
1054
|
+
<p class="tag_title">Options Hash (<tt>opts</tt>):</p>
|
1055
|
+
<ul class="option">
|
1056
|
+
|
1057
|
+
<li>
|
1058
|
+
<span class="name">:html</span>
|
1059
|
+
<span class="type">(<tt>Nokogiri::XML::Element</tt>)</span>
|
1060
|
+
<span class="default">
|
1061
|
+
|
1062
|
+
</span>
|
1063
|
+
|
1064
|
+
— <div class='inline'>
|
1065
|
+
<p>Container element to search within.</p>
|
1066
|
+
</div>
|
1067
|
+
|
1068
|
+
</li>
|
1069
|
+
|
1070
|
+
<li>
|
1071
|
+
<span class="name">:selector</span>
|
1072
|
+
<span class="type">(<tt>String</tt>)</span>
|
1073
|
+
<span class="default">
|
1074
|
+
|
1075
|
+
</span>
|
1076
|
+
|
1077
|
+
— <div class='inline'>
|
1078
|
+
<p>CSS selector to match header cells.</p>
|
1079
|
+
</div>
|
1080
|
+
|
1081
|
+
</li>
|
1082
|
+
|
1083
|
+
<li>
|
1084
|
+
<span class="name">:column_key_label_map</span>
|
1085
|
+
<span class="type">(<tt>Hash{Symbol,String => Regexp,String}</tt>)</span>
|
1086
|
+
<span class="default">
|
1087
|
+
|
1088
|
+
</span>
|
1089
|
+
|
1090
|
+
— <div class='inline'>
|
1091
|
+
<p>Key vs. label dictionary.</p>
|
1092
|
+
</div>
|
1093
|
+
|
1094
|
+
</li>
|
1095
|
+
|
1096
|
+
<li>
|
1097
|
+
<span class="name">:first_row_header</span>
|
1098
|
+
<span class="type">(<tt>Boolean</tt>)</span>
|
1099
|
+
<span class="default">
|
1100
|
+
|
1101
|
+
— default:
|
1102
|
+
<tt>false</tt>
|
1103
|
+
|
1104
|
+
</span>
|
1105
|
+
|
1106
|
+
— <div class='inline'>
|
1107
|
+
<p>If true, the first row matching the selector is used as the header for
|
1108
|
+
parsing.</p>
|
1109
|
+
</div>
|
1110
|
+
|
1111
|
+
</li>
|
1112
|
+
|
1113
|
+
</ul>
|
1114
|
+
|
1115
|
+
|
1116
|
+
<p class="tag_title">Returns:</p>
|
1117
|
+
<ul class="return">
|
1118
|
+
|
1119
|
+
<li>
|
1120
|
+
|
1121
|
+
|
1122
|
+
<span class='type'>(<tt>Hash{Symbol,String => Integer}</tt>, <tt>nil</tt>)</span>
|
1123
|
+
|
1124
|
+
|
1125
|
+
|
1126
|
+
—
|
1127
|
+
<div class='inline'>
|
1128
|
+
<p>Key vs. column index map.</p>
|
1129
|
+
</div>
|
1130
|
+
|
1131
|
+
</li>
|
1132
|
+
|
1133
|
+
</ul>
|
1134
|
+
|
1135
|
+
</div><table class="source_code">
|
1136
|
+
<tr>
|
1137
|
+
<td>
|
1138
|
+
<pre class="lines">
|
1139
|
+
|
1140
|
+
|
1141
|
+
152
|
1142
|
+
153
|
1143
|
+
154
|
1144
|
+
155
|
1145
|
+
156
|
1146
|
+
157
|
1147
|
+
158
|
1148
|
+
159
|
1149
|
+
160
|
1150
|
+
161
|
1151
|
+
162
|
1152
|
+
163
|
1153
|
+
164
|
1154
|
+
165
|
1155
|
+
166
|
1156
|
+
167
|
1157
|
+
168
|
1158
|
+
169
|
1159
|
+
170
|
1160
|
+
171
|
1161
|
+
172
|
1162
|
+
173
|
1163
|
+
174
|
1164
|
+
175
|
1165
|
+
176
|
1166
|
+
177
|
1167
|
+
178
|
1168
|
+
179
|
1169
|
+
180</pre>
|
1170
|
+
</td>
|
1171
|
+
<td>
|
1172
|
+
<pre class="code"><span class="info file"># File 'lib/ae_easy/text.rb', line 152</span>
|
1173
|
+
|
1174
|
+
<span class='kw'>def</span> <span class='kw'>self</span><span class='period'>.</span><span class='id identifier rubyid_parse_header_map'>parse_header_map</span> <span class='id identifier rubyid_opts'>opts</span> <span class='op'>=</span> <span class='lbrace'>{</span><span class='rbrace'>}</span>
|
1175
|
+
<span class='id identifier rubyid_opts'>opts</span> <span class='op'>=</span> <span class='lbrace'>{</span>
|
1176
|
+
<span class='label'>html:</span> <span class='kw'>nil</span><span class='comma'>,</span>
|
1177
|
+
<span class='label'>selector:</span> <span class='kw'>nil</span><span class='comma'>,</span>
|
1178
|
+
<span class='label'>column_key_label_map:</span> <span class='lbrace'>{</span><span class='rbrace'>}</span><span class='comma'>,</span>
|
1179
|
+
<span class='label'>first_row_header:</span> <span class='kw'>false</span>
|
1180
|
+
<span class='rbrace'>}</span><span class='period'>.</span><span class='id identifier rubyid_merge'>merge</span> <span class='id identifier rubyid_opts'>opts</span>
|
1181
|
+
|
1182
|
+
<span class='comment'># Setup config
|
1183
|
+
</span> <span class='id identifier rubyid_dictionary'>dictionary</span> <span class='op'>=</span> <span class='id identifier rubyid_opts'>opts</span><span class='lbracket'>[</span><span class='symbol'>:column_key_label_map</span><span class='rbracket'>]</span>
|
1184
|
+
<span class='id identifier rubyid_data'>data</span> <span class='op'>=</span> <span class='lbracket'>[</span><span class='rbracket'>]</span>
|
1185
|
+
<span class='id identifier rubyid_column_map'>column_map</span> <span class='op'>=</span> <span class='kw'>nil</span>
|
1186
|
+
|
1187
|
+
<span class='comment'># Extract and parse header rows
|
1188
|
+
</span> <span class='id identifier rubyid_html_rows'>html_rows</span> <span class='op'>=</span> <span class='id identifier rubyid_opts'>opts</span><span class='lbracket'>[</span><span class='symbol'>:html</span><span class='rbracket'>]</span><span class='period'>.</span><span class='id identifier rubyid_css'>css</span><span class='lparen'>(</span><span class='id identifier rubyid_opts'>opts</span><span class='lbracket'>[</span><span class='symbol'>:selector</span><span class='rbracket'>]</span><span class='rparen'>)</span> <span class='kw'>rescue</span> <span class='kw'>nil</span>
|
1189
|
+
<span class='kw'>return</span> <span class='kw'>nil</span> <span class='kw'>if</span> <span class='id identifier rubyid_html_rows'>html_rows</span><span class='period'>.</span><span class='id identifier rubyid_nil?'>nil?</span>
|
1190
|
+
<span class='id identifier rubyid_html_rows'>html_rows</span> <span class='op'>=</span> <span class='lbracket'>[</span><span class='id identifier rubyid_html_rows'>html_rows</span><span class='period'>.</span><span class='id identifier rubyid_first'>first</span><span class='rbracket'>]</span> <span class='kw'>if</span> <span class='id identifier rubyid_opts'>opts</span><span class='lbracket'>[</span><span class='symbol'>:first_row_header</span><span class='rbracket'>]</span>
|
1191
|
+
<span class='id identifier rubyid_html_rows'>html_rows</span><span class='period'>.</span><span class='id identifier rubyid_each'>each</span> <span class='kw'>do</span> <span class='op'>|</span><span class='id identifier rubyid_row'>row</span><span class='op'>|</span>
|
1192
|
+
<span class='id identifier rubyid_column_map'>column_map</span> <span class='op'>=</span> <span class='lbrace'>{</span><span class='rbrace'>}</span>
|
1193
|
+
<span class='id identifier rubyid_row'>row</span><span class='period'>.</span><span class='id identifier rubyid_children'>children</span><span class='period'>.</span><span class='id identifier rubyid_each_with_index'>each_with_index</span> <span class='kw'>do</span> <span class='op'>|</span><span class='id identifier rubyid_col'>col</span><span class='comma'>,</span> <span class='id identifier rubyid_index'>index</span><span class='op'>|</span>
|
1194
|
+
<span class='comment'># Parse and map column header
|
1195
|
+
</span> <span class='id identifier rubyid_column_key'>column_key</span> <span class='op'>=</span> <span class='id identifier rubyid_translate_label_to_key'>translate_label_to_key</span> <span class='id identifier rubyid_col'>col</span><span class='comma'>,</span> <span class='id identifier rubyid_dictionary'>dictionary</span>
|
1196
|
+
<span class='kw'>next</span> <span class='kw'>if</span> <span class='id identifier rubyid_column_key'>column_key</span><span class='period'>.</span><span class='id identifier rubyid_nil?'>nil?</span>
|
1197
|
+
<span class='id identifier rubyid_column_map'>column_map</span><span class='lbracket'>[</span><span class='id identifier rubyid_column_key'>column_key</span><span class='rbracket'>]</span> <span class='op'>=</span> <span class='id identifier rubyid_index'>index</span>
|
1198
|
+
<span class='kw'>end</span>
|
1199
|
+
<span class='id identifier rubyid_data'>data</span> <span class='op'><<</span> <span class='id identifier rubyid_column_map'>column_map</span>
|
1200
|
+
<span class='kw'>end</span>
|
1201
|
+
<span class='id identifier rubyid_data'>data</span><span class='op'>&.</span><span class='id identifier rubyid_first'>first</span>
|
1202
|
+
<span class='kw'>end</span></pre>
|
1203
|
+
</td>
|
1204
|
+
</tr>
|
1205
|
+
</table>
|
1206
|
+
</div>
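The header mapping in the listing above boils down to "translate each header label to a key, then remember its column index". A plain-Ruby sketch of that idea follows. It rests on assumptions: header cells are plain strings rather than Nokogiri nodes, both method names are hypothetical, and the matching rule (Regexp match or exact String equality) is a guess at what the gem's `translate_label_to_key` helper does, since that helper is not shown here.

```ruby
# Hypothetical label lookup: returns the first key whose matcher fits the label.
def sketch_label_to_key(label, dictionary)
  text = label.to_s.strip
  match = dictionary.find do |_key, matcher|
    matcher.is_a?(Regexp) ? matcher.match?(text) : matcher.to_s == text
  end
  match && match.first
end

# Builds a key => column index map, skipping labels with no matching key.
def sketch_header_map(labels, dictionary)
  column_map = {}
  labels.each_with_index do |label, index|
    key = sketch_label_to_key(label, dictionary)
    next if key.nil?
    column_map[key] = index
  end
  column_map
end

header_map = sketch_header_map(
  ['Product name', 'SKU', 'Unit price'],
  { name: /name/i, price: /price/i }
)

p header_map
```

In the gem itself the index comes from `row.children`, and Nokogiri child lists also contain text nodes, so real indexes may count whitespace between cells.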
|
1207
|
+
|
1208
|
+
<div class="method_details ">
|
1209
|
+
<h3 class="signature " id="parse_table-class_method">
|
1210
|
+
|
1211
|
+
.<strong>parse_table</strong>(opts = {}) {|data, row, header_map| ... } ⇒ <tt>Hash{Symbol => Array,Hash,nil}</tt>
|
1212
|
+
|
1213
|
+
|
1214
|
+
|
1215
|
+
|
1216
|
+
|
1217
|
+
</h3><div class="docstring">
|
1218
|
+
<div class="discussion">
|
1219
|
+
|
1220
|
+
<p>Parse data from a horizontal, table-like structure matching the given selectors,
|
1221
|
+
|
1222
|
+
using a header map to match columns.</p>
|
1223
|
+
|
1224
|
+
|
1225
|
+
|
1226
|
+
</div>
|
1227
|
+
</div>
|
1228
|
+
<div class="tags">
|
1229
|
+
<p class="tag_title">Parameters:</p>
|
1230
|
+
<ul class="param">
|
1231
|
+
|
1232
|
+
<li>
|
1233
|
+
|
1234
|
+
<span class='name'>opts</span>
|
1235
|
+
|
1236
|
+
|
1237
|
+
<span class='type'>(<tt>Hash</tt>)</span>
|
1238
|
+
|
1239
|
+
|
1240
|
+
<em class="default">(defaults to: <tt>{}</tt>)</em>
|
1241
|
+
|
1242
|
+
|
1243
|
+
—
|
1244
|
+
<div class='inline'>
|
1245
|
+
<p>({}) Configuration options.</p>
|
1246
|
+
</div>
|
1247
|
+
|
1248
|
+
</li>
|
1249
|
+
|
1250
|
+
</ul>
|
1251
|
+
|
1252
|
+
|
1253
|
+
|
1254
|
+
|
1255
|
+
<p class="tag_title">Options Hash (<tt>opts</tt>):</p>
|
1256
|
+
<ul class="option">
|
1257
|
+
|
1258
|
+
<li>
|
1259
|
+
<span class="name">:html</span>
|
1260
|
+
<span class="type">(<tt>Nokogiri::XML::Element</tt>)</span>
|
1261
|
+
<span class="default">
|
1262
|
+
|
1263
|
+
</span>
|
1264
|
+
|
1265
|
+
— <div class='inline'>
|
1266
|
+
<p>Container element to search within.</p>
|
1267
|
+
</div>
|
1268
|
+
|
1269
|
+
</li>
|
1270
|
+
|
1271
|
+
<li>
|
1272
|
+
<span class="name">:header_selector</span>
|
1273
|
+
<span class="type">(<tt>String</tt>)</span>
|
1274
|
+
<span class="default">
|
1275
|
+
|
1276
|
+
</span>
|
1277
|
+
|
1278
|
+
— <div class='inline'>
|
1279
|
+
<p>Header column elements selector.</p>
|
1280
|
+
</div>
|
1281
|
+
|
1282
|
+
</li>
|
1283
|
+
|
1284
|
+
<li>
|
1285
|
+
<span class="name">:header_key_label_map</span>
|
1286
|
+
<span class="type">(<tt>Hash{Symbol,String => Regexp,String}</tt>)</span>
|
1287
|
+
<span class="default">
|
1288
|
+
|
1289
|
+
</span>
|
1290
|
+
|
1291
|
+
— <div class='inline'>
|
1292
|
+
<p>Header key vs. label dictionary to match column indexes.</p>
|
1293
|
+
</div>
|
1294
|
+
|
1295
|
+
</li>
|
1296
|
+
|
1297
|
+
<li>
|
1298
|
+
<span class="name">:content_selector</span>
|
1299
|
+
<span class="type">(<tt>String</tt>)</span>
|
1300
|
+
<span class="default">
|
1301
|
+
|
1302
|
+
</span>
|
1303
|
+
|
1304
|
+
— <div class='inline'>
|
1305
|
+
<p>Content row elements selector.</p>
|
1306
|
+
</div>
|
1307
|
+
|
1308
|
+
</li>
|
1309
|
+
|
1310
|
+
<li>
|
1311
|
+
<span class="name">:first_row_header</span>
|
1312
|
+
<span class="type">(<tt>Boolean</tt>)</span>
|
1313
|
+
<span class="default">
|
1314
|
+
|
1315
|
+
— default:
|
1316
|
+
<tt>false</tt>
|
1317
|
+
|
1318
|
+
</span>
|
1319
|
+
|
1320
|
+
— <div class='inline'>
|
1321
|
+
<p>If true, the first row matching the selector is used as the header for
|
1322
|
+
parsing.</p>
|
1323
|
+
</div>
|
1324
|
+
|
1325
|
+
</li>
|
1326
|
+
|
1327
|
+
<li>
|
1328
|
+
<span class="name">:column_parsers</span>
|
1329
|
+
<span class="type">(<tt>Hash{Symbol,String => lambda,proc}</tt>)</span>
|
1330
|
+
<span class="default">
|
1331
|
+
|
1332
|
+
— default:
|
1333
|
+
<tt>{}</tt>
|
1334
|
+
|
1335
|
+
</span>
|
1336
|
+
|
1337
|
+
— <div class='inline'>
|
1338
|
+
<p>Custom column parsers for advanced data extraction.</p>
|
1339
|
+
</div>
|
1340
|
+
|
1341
|
+
</li>
|
1342
|
+
|
1343
|
+
</ul>
|
1344
|
+
|
1345
|
+
|
1346
|
+
<p class="tag_title">Yield Parameters:</p>
|
1347
|
+
<ul class="yieldparam">
|
1348
|
+
|
1349
|
+
<li>
|
1350
|
+
|
1351
|
+
<span class='name'>data</span>
|
1352
|
+
|
1353
|
+
|
1354
|
+
<span class='type'>(<tt>Hash{Symbol,String => Object}</tt>)</span>
|
1355
|
+
|
1356
|
+
|
1357
|
+
|
1358
|
+
—
|
1359
|
+
<div class='inline'>
|
1360
|
+
<p>Parsed content row data.</p>
|
1361
|
+
</div>
|
1362
|
+
|
1363
|
+
</li>
|
1364
|
+
|
1365
|
+
<li>
|
1366
|
+
|
1367
|
+
<span class='name'>row</span>
|
1368
|
+
|
1369
|
+
|
1370
|
+
<span class='type'>(<tt>Array</tt>)</span>
|
1371
|
+
|
1372
|
+
|
1373
|
+
|
1374
|
+
—
|
1375
|
+
<div class='inline'>
|
1376
|
+
<p>Raw content row data.</p>
|
1377
|
+
</div>
|
1378
|
+
|
1379
|
+
</li>
|
1380
|
+
|
1381
|
+
<li>
|
1382
|
+
|
1383
|
+
<span class='name'>header_map</span>
|
1384
|
+
|
1385
|
+
|
1386
|
+
<span class='type'>(<tt>Hash{Symbol,String => Integer}</tt>)</span>
|
1387
|
+
|
1388
|
+
|
1389
|
+
|
1390
|
+
—
|
1391
|
+
<div class='inline'>
|
1392
|
+
<p>Header map used.</p>
|
1393
|
+
</div>
|
1394
|
+
|
1395
|
+
</li>
|
1396
|
+
|
1397
|
+
</ul>
|
1398
|
+
<p class="tag_title">Yield Returns:</p>
|
1399
|
+
<ul class="yieldreturn">
|
1400
|
+
|
1401
|
+
<li>
|
1402
|
+
|
1403
|
+
|
1404
|
+
<span class='type'>(<tt>Boolean</tt>)</span>
|
1405
|
+
|
1406
|
+
|
1407
|
+
|
1408
|
+
—
|
1409
|
+
<div class='inline'>
|
1410
|
+
<p>`true` when valid, else `false`.</p>
|
1411
|
+
</div>
|
1412
|
+
|
1413
|
+
</li>
|
1414
|
+
|
1415
|
+
</ul>
|
1416
|
+
<p class="tag_title">Returns:</p>
|
1417
|
+
<ul class="return">
|
1418
|
+
|
1419
|
+
<li>
|
1420
|
+
|
1421
|
+
|
1422
|
+
<span class='type'>(<tt>Hash{Symbol => Array,Hash,nil}</tt>)</span>
|
1423
|
+
|
1424
|
+
|
1425
|
+
|
1426
|
+
—
|
1427
|
+
<div class='inline'>
|
1428
|
+
<p>Hash data is as follows:</p>
|
1429
|
+
<ul><li>
|
1430
|
+
<p>`[Hash] :header_map` Header map used.</p>
|
1431
|
+
</li><li>
|
1432
|
+
<p>`[Array<Hash>,nil] :data` Parsed rows data.</p>
|
1433
|
+
</li></ul>
|
1434
|
+
</div>
|
1435
|
+
|
1436
|
+
</li>
|
1437
|
+
|
1438
|
+
</ul>
|
1439
|
+
|
1440
|
+
</div><table class="source_code">
|
1441
|
+
<tr>
|
1442
|
+
<td>
|
1443
|
+
<pre class="lines">
|
1444
|
+
|
1445
|
+
|
1446
|
+
204
|
1447
|
+
205
|
1448
|
+
206
|
1449
|
+
207
|
1450
|
+
208
|
1451
|
+
209
|
1452
|
+
210
|
1453
|
+
211
|
1454
|
+
212
|
1455
|
+
213
|
1456
|
+
214
|
1457
|
+
215
|
1458
|
+
216
|
1459
|
+
217
|
1460
|
+
218
|
1461
|
+
219
|
1462
|
+
220
|
1463
|
+
221
|
1464
|
+
222
|
1465
|
+
223
|
1466
|
+
224
|
1467
|
+
225
|
1468
|
+
226</pre>
|
1469
|
+
</td>
|
1470
|
+
<td>
|
1471
|
+
<pre class="code"><span class="info file"># File 'lib/ae_easy/text.rb', line 204</span>
|
1472
|
+
|
1473
|
+
<span class='kw'>def</span> <span class='kw'>self</span><span class='period'>.</span><span class='id identifier rubyid_parse_table'>parse_table</span> <span class='id identifier rubyid_opts'>opts</span> <span class='op'>=</span> <span class='lbrace'>{</span><span class='rbrace'>}</span><span class='comma'>,</span> <span class='op'>&</span><span class='id identifier rubyid_filter'>filter</span>
|
1474
|
+
<span class='id identifier rubyid_opts'>opts</span> <span class='op'>=</span> <span class='lbrace'>{</span>
|
1475
|
+
<span class='label'>html:</span> <span class='kw'>nil</span><span class='comma'>,</span>
|
1476
|
+
<span class='label'>header_selector:</span> <span class='kw'>nil</span><span class='comma'>,</span>
|
1477
|
+
<span class='label'>header_key_label_map:</span> <span class='lbrace'>{</span><span class='rbrace'>}</span><span class='comma'>,</span>
|
1478
|
+
<span class='label'>content_selector:</span> <span class='kw'>nil</span><span class='comma'>,</span>
|
1479
|
+
<span class='label'>first_row_header:</span> <span class='kw'>false</span><span class='comma'>,</span>
|
1480
|
+
<span class='label'>column_parsers:</span> <span class='lbrace'>{</span><span class='rbrace'>}</span>
|
1481
|
+
<span class='rbrace'>}</span><span class='period'>.</span><span class='id identifier rubyid_merge'>merge</span> <span class='id identifier rubyid_opts'>opts</span>
|
1482
|
+
<span class='kw'>return</span> <span class='kw'>nil</span> <span class='kw'>if</span> <span class='id identifier rubyid_opts'>opts</span><span class='lbracket'>[</span><span class='symbol'>:html</span><span class='rbracket'>]</span><span class='period'>.</span><span class='id identifier rubyid_nil?'>nil?</span>
|
1483
|
+
<span class='id identifier rubyid_header_map'>header_map</span> <span class='op'>=</span> <span class='kw'>self</span><span class='period'>.</span><span class='id identifier rubyid_parse_header_map'>parse_header_map</span> <span class='label'>html:</span> <span class='id identifier rubyid_opts'>opts</span><span class='lbracket'>[</span><span class='symbol'>:html</span><span class='rbracket'>]</span><span class='comma'>,</span>
|
1484
|
+
<span class='label'>selector:</span> <span class='id identifier rubyid_opts'>opts</span><span class='lbracket'>[</span><span class='symbol'>:header_selector</span><span class='rbracket'>]</span><span class='comma'>,</span>
|
1485
|
+
<span class='label'>column_key_label_map:</span> <span class='id identifier rubyid_opts'>opts</span><span class='lbracket'>[</span><span class='symbol'>:header_key_label_map</span><span class='rbracket'>]</span><span class='comma'>,</span>
|
1486
|
+
<span class='label'>first_row_header:</span> <span class='id identifier rubyid_opts'>opts</span><span class='lbracket'>[</span><span class='symbol'>:first_row_header</span><span class='rbracket'>]</span>
|
1487
|
+
<span class='kw'>return</span> <span class='kw'>nil</span> <span class='kw'>if</span> <span class='id identifier rubyid_header_map'>header_map</span><span class='period'>.</span><span class='id identifier rubyid_nil?'>nil?</span>
|
1488
|
+
<span class='id identifier rubyid_data'>data</span> <span class='op'>=</span> <span class='kw'>self</span><span class='period'>.</span><span class='id identifier rubyid_parse_content'>parse_content</span> <span class='label'>html:</span> <span class='id identifier rubyid_opts'>opts</span><span class='lbracket'>[</span><span class='symbol'>:html</span><span class='rbracket'>]</span><span class='comma'>,</span>
|
1489
|
+
<span class='label'>selector:</span> <span class='id identifier rubyid_opts'>opts</span><span class='lbracket'>[</span><span class='symbol'>:content_selector</span><span class='rbracket'>]</span><span class='comma'>,</span>
|
1490
|
+
<span class='label'>header_map:</span> <span class='id identifier rubyid_header_map'>header_map</span><span class='comma'>,</span>
|
1491
|
+
<span class='label'>first_row_header:</span> <span class='id identifier rubyid_opts'>opts</span><span class='lbracket'>[</span><span class='symbol'>:first_row_header</span><span class='rbracket'>]</span><span class='comma'>,</span>
|
1492
|
+
<span class='label'>column_parsers:</span> <span class='id identifier rubyid_opts'>opts</span><span class='lbracket'>[</span><span class='symbol'>:column_parsers</span><span class='rbracket'>]</span><span class='comma'>,</span>
|
1493
|
+
<span class='op'>&</span><span class='id identifier rubyid_filter'>filter</span>
|
1494
|
+
<span class='lbrace'>{</span><span class='label'>header_map:</span> <span class='id identifier rubyid_header_map'>header_map</span><span class='comma'>,</span> <span class='label'>data:</span> <span class='id identifier rubyid_data'>data</span><span class='rbrace'>}</span>
|
1495
|
+
<span class='kw'>end</span></pre>
|
1496
|
+
</td>
|
1497
|
+
</tr>
|
1498
|
+
</table>
|
1499
|
+
</div>
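The listing above opens by building a hash of defaults and merging the caller's `opts` over it. A minimal sketch of that idiom in isolation follows; `with_defaults` is a hypothetical helper written for illustration and is not part of the gem.

```ruby
# Hypothetical helper (not in ae_easy-text): isolates the
# defaults-then-merge idiom used at the top of parse_table.
def with_defaults(opts = {})
  {
    html: nil,
    header_selector: nil,
    header_key_label_map: {},
    content_selector: nil,
    first_row_header: false,
    column_parsers: {}
  }.merge(opts) # caller-supplied keys win over the defaults
end

merged = with_defaults(first_row_header: true, content_selector: 'tr')
string_keyed = with_defaults('html' => '<table></table>')
```

Because `Hash#merge` does not convert keys, an option passed with a string key (as in `string_keyed`) sits beside the symbol default instead of overriding it, so `string_keyed[:html]` is still `nil`.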
|
1500
|
+
|
1501
|
+
<div class="method_details ">
|
1502
|
+
<h3 class="signature " id="parse_vertical_table-class_method">
|
1503
|
+
|
1504
|
+
.<strong>parse_vertical_table</strong>(opts = {}) {|data, row, header_map| ... } ⇒ <tt>Hash{Symbol => Array,Hash,nil}</tt>
|
1505
|
+
|
1506
|
+
|
1507
|
+
|
1508
|
+
|
1509
|
+
|
1510
|
+
</h3><div class="docstring">
|
1511
|
+
<div class="discussion">
|
1512
|
+
|
1513
|
+
<p>Parse data from a vertical table-like structure matching the given selectors and</p>
|
1514
|
+
|
1515
|
+
<pre class="code ruby"><code class="ruby">using a header map to match columns.
|
1516
|
+
</code></pre>
|
1517
|
+
|
1518
|
+
|
1519
|
+
</div>
|
1520
|
+
</div>
|
1521
|
+
<div class="tags">
|
1522
|
+
<p class="tag_title">Parameters:</p>
|
1523
|
+
<ul class="param">
|
1524
|
+
|
1525
|
+
<li>
|
1526
|
+
|
1527
|
+
<span class='name'>opts</span>
|
1528
|
+
|
1529
|
+
|
1530
|
+
<span class='type'>(<tt>Hash</tt>)</span>
|
1531
|
+
|
1532
|
+
|
1533
|
+
<em class="default">(defaults to: <tt>{}</tt>)</em>
|
1534
|
+
|
1535
|
+
|
1536
|
+
—
|
1537
|
+
<div class='inline'>
|
1538
|
+
<p>({}) Configuration options.</p>
|
1539
|
+
</div>
|
1540
|
+
|
1541
|
+
</li>
|
1542
|
+
|
1543
|
+
</ul>
|
1544
|
+
|
1545
|
+
|
1546
|
+
|
1547
|
+
|
1548
|
+
<p class="tag_title">Options Hash (<tt>opts</tt>):</p>
|
1549
|
+
<ul class="option">
|
1550
|
+
|
1551
|
+
<li>
|
1552
|
+
<span class="name">:html</span>
|
1553
|
+
<span class="type">(<tt>Nokogiri::XML::Element</tt>)</span>
|
1554
|
+
<span class="default">
|
1555
|
+
|
1556
|
+
</span>
|
1557
|
+
|
1558
|
+
— <div class='inline'>
|
1559
|
+
<p>Container element to search within.</p>
|
1560
|
+
</div>
|
1561
|
+
|
1562
|
+
</li>
|
1563
|
+
|
1564
|
+
<li>
|
1565
|
+
<span class="name">:row_selector</span>
|
1566
|
+
<span class="type">(<tt>String</tt>)</span>
|
1567
|
+
<span class="default">
|
1568
|
+
|
1569
|
+
</span>
|
1570
|
+
|
1571
|
+
— <div class='inline'>
|
1572
|
+
<p>Selector for the vertical row-like elements.</p>
|
1573
|
+
</div>
|
1574
|
+
|
1575
|
+
</li>
|
1576
|
+
|
1577
|
+
<li>
|
1578
|
+
<span class="name">:header_selector</span>
|
1579
|
+
<span class="type">(<tt>String</tt>)</span>
|
1580
|
+
<span class="default">
|
1581
|
+
|
1582
|
+
</span>
|
1583
|
+
|
1584
|
+
— <div class='inline'>
|
1585
|
+
<p>Header column elements selector.</p>
|
1586
|
+
</div>
|
1587
|
+
|
1588
|
+
</li>
|
1589
|
+
|
1590
|
+
<li>
|
1591
|
+
<span class="name">:header_key_label_map</span>
|
1592
|
+
<span class="type">(<tt>Hash{Symbol,String => Regexp,String}</tt>)</span>
|
1593
|
+
<span class="default">
|
1594
|
+
|
1595
|
+
</span>
|
1596
|
+
|
1597
|
+
— <div class='inline'>
|
1598
|
+
<p>Header key vs. label dictionary to match column indexes.</p>
|
1599
|
+
</div>
|
1600
|
+
|
1601
|
+
</li>
|
1602
|
+
|
1603
|
+
<li>
|
1604
|
+
<span class="name">:content_selector</span>
|
1605
|
+
<span class="type">(<tt>String</tt>)</span>
|
1606
|
+
<span class="default">
|
1607
|
+
|
1608
|
+
</span>
|
1609
|
+
|
1610
|
+
— <div class='inline'>
|
1611
|
+
<p>Content row elements selector.</p>
|
1612
|
+
</div>
|
1613
|
+
|
1614
|
+
</li>
|
1615
|
+
|
1616
|
+
<li>
|
1617
|
+
<span class="name">:column_parsers</span>
|
1618
|
+
<span class="type">(<tt>Hash{Symbol,String => lambda,proc}</tt>)</span>
|
1619
|
+
<span class="default">
|
1620
|
+
|
1621
|
+
— default:
|
1622
|
+
<tt>{}</tt>
|
1623
|
+
|
1624
|
+
</span>
|
1625
|
+
|
1626
|
+
— <div class='inline'>
|
1627
|
+
<p>Custom column parsers for advanced data extraction.</p>
|
1628
|
+
</div>
|
1629
|
+
|
1630
|
+
</li>
|
1631
|
+
|
1632
|
+
</ul>
|
1633
|
+
|
1634
|
+
|
1635
|
+
<p class="tag_title">Yield Parameters:</p>
|
1636
|
+
<ul class="yieldparam">
|
1637
|
+
|
1638
|
+
<li>
|
1639
|
+
|
1640
|
+
<span class='name'>data</span>
|
1641
|
+
|
1642
|
+
|
1643
|
+
<span class='type'>(<tt>Hash{Symbol,String => Object}</tt>)</span>
|
1644
|
+
|
1645
|
+
|
1646
|
+
|
1647
|
+
—
|
1648
|
+
<div class='inline'>
|
1649
|
+
<p>Parsed content row data.</p>
|
1650
|
+
</div>
|
1651
|
+
|
1652
|
+
</li>
|
1653
|
+
|
1654
|
+
<li>
|
1655
|
+
|
1656
|
+
<span class='name'>row</span>
|
1657
|
+
|
1658
|
+
|
1659
|
+
<span class='type'>(<tt>Array</tt>)</span>
|
1660
|
+
|
1661
|
+
|
1662
|
+
|
1663
|
+
—
|
1664
|
+
<div class='inline'>
|
1665
|
+
<p>Raw content row data.</p>
|
1666
|
+
</div>
|
1667
|
+
|
1668
|
+
</li>
|
1669
|
+
|
1670
|
+
<li>
|
1671
|
+
|
1672
|
+
<span class='name'>header_map</span>
|
1673
|
+
|
1674
|
+
|
1675
|
+
<span class='type'>(<tt>Hash{Symbol,String => Integer}</tt>)</span>
|
1676
|
+
|
1677
|
+
|
1678
|
+
|
1679
|
+
—
|
1680
|
+
<div class='inline'>
|
1681
|
+
<p>Header map used.</p>
|
1682
|
+
</div>
|
1683
|
+
|
1684
|
+
</li>
|
1685
|
+
|
1686
|
+
</ul>
|
1687
|
+
<p class="tag_title">Yield Returns:</p>
|
1688
|
+
<ul class="yieldreturn">
|
1689
|
+
|
1690
|
+
<li>
|
1691
|
+
|
1692
|
+
|
1693
|
+
<span class='type'>(<tt>Boolean</tt>)</span>
|
1694
|
+
|
1695
|
+
|
1696
|
+
|
1697
|
+
—
|
1698
|
+
<div class='inline'>
|
1699
|
+
<p><code>true</code> when valid, else <code>false</code>.</p>
|
1700
|
+
</div>
|
1701
|
+
|
1702
|
+
</li>
|
1703
|
+
|
1704
|
+
</ul>
|
1705
|
+
<p class="tag_title">Returns:</p>
|
1706
|
+
<ul class="return">
|
1707
|
+
|
1708
|
+
<li>
|
1709
|
+
|
1710
|
+
|
1711
|
+
<span class='type'>(<tt>Hash{Symbol => Array,Hash,nil}</tt>)</span>
|
1712
|
+
|
1713
|
+
|
1714
|
+
|
1715
|
+
—
|
1716
|
+
<div class='inline'>
|
1717
|
+
<p>Hash data is as follows:</p>
|
1718
|
+
<ul><li>
|
1719
|
+
<p>`[Hash] :header_map` Header map used.</p>
|
1720
|
+
</li><li>
|
1721
|
+
<p>`[Array<Hash>,nil] :data` Parsed rows data.</p>
|
1722
|
+
</li></ul>
|
1723
|
+
</div>
|
1724
|
+
|
1725
|
+
</li>
|
1726
|
+
|
1727
|
+
</ul>
|
1728
|
+
|
1729
|
+
</div><table class="source_code">
|
1730
|
+
<tr>
|
1731
|
+
<td>
|
1732
|
+
<pre class="lines">
|
1733
|
+
|
1734
|
+
|
1735
|
+
249
|
1736
|
+
250
|
1737
|
+
251
|
1738
|
+
252
|
1739
|
+
253
|
1740
|
+
254
|
1741
|
+
255
|
1742
|
+
256
|
1743
|
+
257
|
1744
|
+
258
|
1745
|
+
259
|
1746
|
+
260
|
1747
|
+
261
|
1748
|
+
262
|
1749
|
+
263
|
1750
|
+
264
|
1751
|
+
265
|
1752
|
+
266
|
1753
|
+
267
|
1754
|
+
268
|
1755
|
+
269
|
1756
|
+
270
|
1757
|
+
271
|
1758
|
+
272
|
1759
|
+
273
|
1760
|
+
274
|
1761
|
+
275
|
1762
|
+
276
|
1763
|
+
277
|
1764
|
+
278
|
1765
|
+
279
|
1766
|
+
280
|
1767
|
+
281</pre>
|
1768
|
+
</td>
|
1769
|
+
<td>
|
1770
|
+
<pre class="code"><span class="info file"># File 'lib/ae_easy/text.rb', line 249</span>
|
1771
|
+
|
1772
|
+
<span class='kw'>def</span> <span class='kw'>self</span><span class='period'>.</span><span class='id identifier rubyid_parse_vertical_table'>parse_vertical_table</span> <span class='id identifier rubyid_opts'>opts</span> <span class='op'>=</span> <span class='lbrace'>{</span><span class='rbrace'>}</span><span class='comma'>,</span> <span class='op'>&</span><span class='id identifier rubyid_filter'>filter</span>
|
1773
|
+
<span class='id identifier rubyid_opts'>opts</span> <span class='op'>=</span> <span class='lbrace'>{</span>
|
1774
|
+
<span class='label'>html:</span> <span class='kw'>nil</span><span class='comma'>,</span>
|
1775
|
+
<span class='label'>row_selector:</span> <span class='kw'>nil</span><span class='comma'>,</span>
|
1776
|
+
<span class='label'>header_selector:</span> <span class='kw'>nil</span><span class='comma'>,</span>
|
1777
|
+
<span class='label'>header_key_label_map:</span> <span class='lbrace'>{</span><span class='rbrace'>}</span><span class='comma'>,</span>
|
1778
|
+
<span class='label'>content_selector:</span> <span class='kw'>nil</span><span class='comma'>,</span>
|
1779
|
+
<span class='label'>column_parsers:</span> <span class='lbrace'>{</span><span class='rbrace'>}</span>
|
1780
|
+
<span class='rbrace'>}</span><span class='period'>.</span><span class='id identifier rubyid_merge'>merge</span> <span class='id identifier rubyid_opts'>opts</span>
|
1781
|
+
<span class='kw'>return</span> <span class='kw'>nil</span> <span class='kw'>if</span> <span class='id identifier rubyid_opts'>opts</span><span class='lbracket'>[</span><span class='symbol'>:html</span><span class='rbracket'>]</span><span class='period'>.</span><span class='id identifier rubyid_nil?'>nil?</span>
|
1782
|
+
|
1783
|
+
<span class='comment'># Setup config
|
1784
|
+
</span> <span class='id identifier rubyid_data'>data</span> <span class='op'>=</span> <span class='lbrace'>{</span><span class='rbrace'>}</span>
|
1785
|
+
<span class='id identifier rubyid_dictionary'>dictionary</span> <span class='op'>=</span> <span class='id identifier rubyid_opts'>opts</span><span class='lbracket'>[</span><span class='symbol'>:header_key_label_map</span><span class='rbracket'>]</span>
|
1786
|
+
<span class='id identifier rubyid_column_parsers'>column_parsers</span> <span class='op'>=</span> <span class='id identifier rubyid_opts'>opts</span><span class='lbracket'>[</span><span class='symbol'>:column_parsers</span><span class='rbracket'>]</span>
|
1787
|
+
|
1788
|
+
<span class='comment'># Extract headers and content
|
1789
|
+
</span> <span class='id identifier rubyid_html_rows'>html_rows</span> <span class='op'>=</span> <span class='id identifier rubyid_opts'>opts</span><span class='lbracket'>[</span><span class='symbol'>:html</span><span class='rbracket'>]</span><span class='period'>.</span><span class='id identifier rubyid_css'>css</span><span class='lparen'>(</span><span class='id identifier rubyid_opts'>opts</span><span class='lbracket'>[</span><span class='symbol'>:row_selector</span><span class='rbracket'>]</span><span class='rparen'>)</span> <span class='kw'>rescue</span> <span class='kw'>nil</span>
|
1790
|
+
<span class='kw'>return</span> <span class='kw'>nil</span> <span class='kw'>if</span> <span class='id identifier rubyid_html_rows'>html_rows</span><span class='period'>.</span><span class='id identifier rubyid_nil?'>nil?</span>
|
1791
|
+
<span class='id identifier rubyid_html_rows'>html_rows</span><span class='period'>.</span><span class='id identifier rubyid_each'>each</span> <span class='kw'>do</span> <span class='op'>|</span><span class='id identifier rubyid_row'>row</span><span class='op'>|</span>
|
1792
|
+
<span class='comment'># Parse and map column header
|
1793
|
+
</span> <span class='id identifier rubyid_header_element'>header_element</span> <span class='op'>=</span> <span class='id identifier rubyid_row'>row</span><span class='period'>.</span><span class='id identifier rubyid_css'>css</span><span class='lparen'>(</span><span class='id identifier rubyid_opts'>opts</span><span class='lbracket'>[</span><span class='symbol'>:header_selector</span><span class='rbracket'>]</span><span class='rparen'>)</span>
|
1794
|
+
<span class='id identifier rubyid_key'>key</span> <span class='op'>=</span> <span class='id identifier rubyid_translate_label_to_key'>translate_label_to_key</span> <span class='id identifier rubyid_header_element'>header_element</span><span class='comma'>,</span> <span class='id identifier rubyid_dictionary'>dictionary</span>
|
1795
|
+
<span class='kw'>next</span> <span class='kw'>if</span> <span class='id identifier rubyid_key'>key</span><span class='period'>.</span><span class='id identifier rubyid_nil?'>nil?</span> <span class='op'>||</span> <span class='id identifier rubyid_key'>key</span> <span class='op'>==</span> <span class='tstring'><span class='tstring_beg'>'</span><span class='tstring_end'>'</span></span>
|
1796
|
+
|
1797
|
+
<span class='comment'># Parse column html with default or custom parser
|
1798
|
+
</span> <span class='id identifier rubyid_content_element'>content_element</span> <span class='op'>=</span> <span class='id identifier rubyid_row'>row</span><span class='period'>.</span><span class='id identifier rubyid_css'>css</span><span class='lparen'>(</span><span class='id identifier rubyid_opts'>opts</span><span class='lbracket'>[</span><span class='symbol'>:content_selector</span><span class='rbracket'>]</span><span class='rparen'>)</span>
|
1799
|
+
<span class='id identifier rubyid_column_parsers'>column_parsers</span><span class='lbracket'>[</span><span class='id identifier rubyid_key'>key</span><span class='rbracket'>]</span><span class='period'>.</span><span class='id identifier rubyid_nil?'>nil?</span> <span class='op'>?</span>
|
1800
|
+
<span class='id identifier rubyid_default_parser'>default_parser</span><span class='lparen'>(</span><span class='id identifier rubyid_content_element'>content_element</span><span class='comma'>,</span> <span class='id identifier rubyid_data'>data</span><span class='comma'>,</span> <span class='id identifier rubyid_key'>key</span><span class='rparen'>)</span> <span class='op'>:</span>
|
1801
|
+
<span class='id identifier rubyid_column_parsers'>column_parsers</span><span class='lbracket'>[</span><span class='id identifier rubyid_key'>key</span><span class='rbracket'>]</span><span class='period'>.</span><span class='id identifier rubyid_call'>call</span><span class='lparen'>(</span><span class='id identifier rubyid_content_element'>content_element</span><span class='comma'>,</span> <span class='id identifier rubyid_data'>data</span><span class='comma'>,</span> <span class='id identifier rubyid_key'>key</span><span class='rparen'>)</span>
|
1802
|
+
<span class='kw'>end</span>
|
1803
|
+
<span class='id identifier rubyid_data'>data</span>
|
1804
|
+
<span class='kw'>end</span></pre>
|
1805
|
+
</td>
|
1806
|
+
</tr>
|
1807
|
+
</table>
|
1808
|
+
</div>
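The row loop in the listing above can be sketched without Nokogiri. In this illustration rows are plain `[label, value]` pairs, labels are matched by exact string only, and `parse_vertical_rows` is a hypothetical name; the fallback branch is a stand-in, since the gem's `default_parser` is not shown in this section.

```ruby
# Plain-Ruby stand-in for the loop in parse_vertical_table.
# Assumptions: rows are [label, value] pairs rather than HTML elements,
# and the dictionary maps keys to exact label strings.
def parse_vertical_rows(rows, dictionary, column_parsers = {})
  data = {}
  rows.each do |label, value|
    # Exact-match stand-in for translate_label_to_key.
    key = dictionary.key(label)
    next if key.nil? || key == ''

    parser = column_parsers[key]
    if parser.nil?
      data[key] = value.strip # stand-in for the gem's default parser
    else
      # As in the listing, the parser's return value is ignored,
      # so a custom parser has to write into data itself.
      parser.call(value, data, key)
    end
  end
  data
end

rows = [['Name', ' Widget '], ['Price (USD)', '12.50'], ['Notes', 'n/a']]
dictionary = { name: 'Name', price: 'Price (USD)' }
parsers = { price: ->(value, data, key) { data[key] = value.to_f } }
result = parse_vertical_rows(rows, dictionary, parsers)
```

Rows whose label has no entry in the dictionary (here `'Notes'`) are skipped, which mirrors the `next if key.nil? || key == ''` guard in the listing.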
|
1809
|
+
|
1810
|
+
<div class="method_details ">
|
1811
|
+
<h3 class="signature " id="strip-class_method">
|
1812
|
+
|
1813
|
+
.<strong>strip</strong>(raw_text) ⇒ <tt>String</tt><sup>?</sup>
|
1814
|
+
|
1815
|
+
|
1816
|
+
|
1817
|
+
|
1818
|
+
|
1819
|
+
</h3><div class="docstring">
|
1820
|
+
<div class="discussion">
|
1821
|
+
|
1822
|
+
<p>Strip a value.</p>
|
1823
|
+
|
1824
|
+
|
1825
|
+
</div>
|
1826
|
+
</div>
|
1827
|
+
<div class="tags">
|
1828
|
+
<p class="tag_title">Parameters:</p>
|
1829
|
+
<ul class="param">
|
1830
|
+
|
1831
|
+
<li>
|
1832
|
+
|
1833
|
+
<span class='name'>raw_text</span>
|
1834
|
+
|
1835
|
+
|
1836
|
+
<span class='type'>(<tt>String</tt>, <tt>Object</tt>, <tt>nil</tt>)</span>
|
1837
|
+
|
1838
|
+
|
1839
|
+
|
1840
|
+
—
|
1841
|
+
<div class='inline'>
|
1842
|
+
<p>Text to strip.</p>
|
1843
|
+
</div>
|
1844
|
+
|
1845
|
+
</li>
|
1846
|
+
|
1847
|
+
</ul>
|
1848
|
+
|
1849
|
+
<p class="tag_title">Returns:</p>
|
1850
|
+
<ul class="return">
|
1851
|
+
|
1852
|
+
<li>
|
1853
|
+
|
1854
|
+
|
1855
|
+
<span class='type'>(<tt>String</tt>, <tt>nil</tt>)</span>
|
1856
|
+
|
1857
|
+
|
1858
|
+
|
1859
|
+
—
|
1860
|
+
<div class='inline'>
|
1861
|
+
<p><code>nil</code> when <code>raw_text</code> is nil, else <code>String</code>.</p>
|
1862
|
+
</div>
|
1863
|
+
|
1864
|
+
</li>
|
1865
|
+
|
1866
|
+
</ul>
|
1867
|
+
|
1868
|
+
</div><table class="source_code">
|
1869
|
+
<tr>
|
1870
|
+
<td>
|
1871
|
+
<pre class="lines">
|
1872
|
+
|
1873
|
+
|
1874
|
+
42
|
1875
|
+
43
|
1876
|
+
44
|
1877
|
+
45
|
1878
|
+
46
|
1879
|
+
47
|
1880
|
+
48
|
1881
|
+
49
|
1882
|
+
50
|
1883
|
+
51
|
1884
|
+
52
|
1885
|
+
53</pre>
|
1886
|
+
</td>
|
1887
|
+
<td>
|
1888
|
+
<pre class="code"><span class="info file"># File 'lib/ae_easy/text.rb', line 42</span>
|
1889
|
+
|
1890
|
+
<span class='kw'>def</span> <span class='kw'>self</span><span class='period'>.</span><span class='id identifier rubyid_strip'>strip</span> <span class='id identifier rubyid_raw_text'>raw_text</span>
|
1891
|
+
<span class='kw'>return</span> <span class='kw'>nil</span> <span class='kw'>if</span> <span class='id identifier rubyid_raw_text'>raw_text</span><span class='period'>.</span><span class='id identifier rubyid_nil?'>nil?</span>
|
1892
|
+
<span class='id identifier rubyid_raw_text'>raw_text</span> <span class='op'>=</span> <span class='id identifier rubyid_raw_text'>raw_text</span><span class='period'>.</span><span class='id identifier rubyid_to_s'>to_s</span> <span class='kw'>unless</span> <span class='id identifier rubyid_raw_text'>raw_text</span><span class='period'>.</span><span class='id identifier rubyid_is_a?'>is_a?</span> <span class='const'>String</span>
|
1893
|
+
<span class='id identifier rubyid_regex'>regex</span> <span class='op'>=</span> <span class='tstring'><span class='regexp_beg'>/</span><span class='tstring_content'>(\s|\u3000|\u00a0)+</span><span class='regexp_end'>/</span></span>
|
1894
|
+
<span class='id identifier rubyid_good_encoding'>good_encoding</span> <span class='op'>=</span> <span class='lparen'>(</span><span class='id identifier rubyid_raw_text'>raw_text</span> <span class='op'>=~</span> <span class='tstring'><span class='regexp_beg'>/</span><span class='tstring_content'>\u3000</span><span class='regexp_end'>/</span></span> <span class='op'>||</span> <span class='kw'>true</span><span class='rparen'>)</span> <span class='kw'>rescue</span> <span class='kw'>false</span>
|
1895
|
+
<span class='kw'>unless</span> <span class='id identifier rubyid_good_encoding'>good_encoding</span>
|
1896
|
+
<span class='id identifier rubyid_raw_text'>raw_text</span> <span class='op'>=</span> <span class='id identifier rubyid_raw_text'>raw_text</span><span class='period'>.</span><span class='id identifier rubyid_force_encoding'>force_encoding</span><span class='lparen'>(</span><span class='gvar'>$APP_CONFIG</span><span class='lbracket'>[</span><span class='symbol'>:encoding</span><span class='rbracket'>]</span><span class='rparen'>)</span><span class='period'>.</span><span class='id identifier rubyid_encode'>encode</span><span class='lparen'>(</span><span class='tstring'><span class='tstring_beg'>'</span><span class='tstring_content'>UTF-8</span><span class='tstring_end'>'</span></span><span class='rparen'>)</span>
|
1897
|
+
<span class='id identifier rubyid_regex'>regex</span> <span class='op'>=</span> <span class='tstring'><span class='regexp_beg'>/</span><span class='tstring_content'>(\s|\u3000|\u00a0|\u00c2\u00a0)+</span><span class='regexp_end'>/</span></span>
|
1898
|
+
<span class='kw'>end</span>
|
1899
|
+
<span class='id identifier rubyid_text'>text</span> <span class='op'>=</span> <span class='id identifier rubyid_raw_text'>raw_text</span><span class='op'>&.</span><span class='id identifier rubyid_gsub'>gsub</span><span class='lparen'>(</span><span class='id identifier rubyid_regex'>regex</span><span class='comma'>,</span> <span class='tstring'><span class='tstring_beg'>'</span><span class='tstring_content'> </span><span class='tstring_end'>'</span></span><span class='rparen'>)</span><span class='op'>&.</span><span class='id identifier rubyid_strip'>strip</span>
|
1900
|
+
<span class='id identifier rubyid_text'>text</span><span class='period'>.</span><span class='id identifier rubyid_nil?'>nil?</span> <span class='op'>?</span> <span class='kw'>nil</span> <span class='op'>:</span> <span class='id identifier rubyid_decode_html'>decode_html</span><span class='lparen'>(</span><span class='id identifier rubyid_text'>text</span><span class='rparen'>)</span>
|
1901
|
+
<span class='kw'>end</span></pre>
|
1902
|
+
</td>
|
1903
|
+
</tr>
|
1904
|
+
</table>
|
1905
|
+
</div>
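The whitespace step of `strip` can be tried on its own. The sketch below reuses the first regular expression from the listing; `squeeze_whitespace` is a hypothetical name, and the encoding fallback and the `decode_html` call are deliberately left out.

```ruby
# Hypothetical helper mirroring only the whitespace normalization in strip.
def squeeze_whitespace(raw_text)
  return nil if raw_text.nil?

  # Runs of ASCII whitespace, ideographic spaces (U+3000) and
  # non-breaking spaces (U+00A0) collapse into one plain space.
  regex = /(\s|\u3000|\u00a0)+/
  raw_text.to_s.gsub(regex, ' ').strip
end

squeezed = squeeze_whitespace("  Tokyo\u3000\u00a0 Station\n")
from_number = squeeze_whitespace(42)
```

Non-string input goes through `to_s` first, as in the listing, so `from_number` is the string `"42"`.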
|
1906
|
+
|
1907
|
+
<div class="method_details ">
|
1908
|
+
<h3 class="signature " id="translate_label_to_key-class_method">
|
1909
|
+
|
1910
|
+
.<strong>translate_label_to_key</strong>(element, label_map) ⇒ <tt>Symbol</tt>, <tt>String</tt>
|
1911
|
+
|
1912
|
+
|
1913
|
+
|
1914
|
+
|
1915
|
+
|
1916
|
+
</h3><div class="docstring">
|
1917
|
+
<div class="discussion">
|
1918
|
+
|
1919
|
+
<p>Extract a column label and translate it into a friendly key.</p>
|
1920
|
+
|
1921
|
+
|
1922
|
+
</div>
|
1923
|
+
</div>
|
1924
|
+
<div class="tags">
|
1925
|
+
<p class="tag_title">Parameters:</p>
|
1926
|
+
<ul class="param">
|
1927
|
+
|
1928
|
+
<li>
|
1929
|
+
|
1930
|
+
<span class='name'>element</span>
|
1931
|
+
|
1932
|
+
|
1933
|
+
<span class='type'>(<tt>Nokogiri::XML::Element</tt>)</span>
|
1934
|
+
|
1935
|
+
|
1936
|
+
|
1937
|
+
—
|
1938
|
+
<div class='inline'>
|
1939
|
+
<p>Html element to parse.</p>
|
1940
|
+
</div>
|
1941
|
+
|
1942
|
+
</li>
|
1943
|
+
|
1944
|
+
<li>
|
1945
|
+
|
1946
|
+
<span class='name'>label_map</span>
|
1947
|
+
|
1948
|
+
|
1949
|
+
<span class='type'>(<tt>Hash{Symbol,String => Regexp,String}</tt>)</span>
|
1950
|
+
|
1951
|
+
|
1952
|
+
|
1953
|
+
—
|
1954
|
+
<div class='inline'>
|
1955
|
+
<p>Label dictionary for translation into key.</p>
|
1956
|
+
</div>
|
1957
|
+
|
1958
|
+
</li>
|
1959
|
+
|
1960
|
+
</ul>
|
1961
|
+
|
1962
|
+
<p class="tag_title">Returns:</p>
|
1963
|
+
<ul class="return">
|
1964
|
+
|
1965
|
+
<li>
|
1966
|
+
|
1967
|
+
|
1968
|
+
<span class='type'>(<tt>Symbol</tt>, <tt>String</tt>)</span>
|
1969
|
+
|
1970
|
+
|
1971
|
+
|
1972
|
+
—
|
1973
|
+
<div class='inline'>
|
1974
|
+
<p>Translated key.</p>
|
1975
|
+
</div>
|
1976
|
+
|
1977
|
+
</li>
|
1978
|
+
|
1979
|
+
</ul>
|
1980
|
+
|
1981
|
+
</div><table class="source_code">
|
1982
|
+
<tr>
|
1983
|
+
<td>
|
1984
|
+
<pre class="lines">
|
1985
|
+
|
1986
|
+
|
1987
|
+
131
|
1988
|
+
132
|
1989
|
+
133
|
1990
|
+
134
|
1991
|
+
135
|
1992
|
+
136
|
1993
|
+
137
|
1994
|
+
138</pre>
|
1995
|
+
</td>
|
1996
|
+
<td>
|
1997
|
+
<pre class="code"><span class="info file"># File 'lib/ae_easy/text.rb', line 131</span>
|
1998
|
+
|
1999
|
+
<span class='kw'>def</span> <span class='kw'>self</span><span class='period'>.</span><span class='id identifier rubyid_translate_label_to_key'>translate_label_to_key</span> <span class='id identifier rubyid_element'>element</span><span class='comma'>,</span> <span class='id identifier rubyid_label_map'>label_map</span>
|
2000
|
+
<span class='id identifier rubyid_element'>element</span><span class='op'>&.</span><span class='id identifier rubyid_search'>search</span><span class='lparen'>(</span><span class='tstring'><span class='tstring_beg'>'</span><span class='tstring_content'>//i</span><span class='tstring_end'>'</span></span><span class='rparen'>)</span><span class='op'>&.</span><span class='id identifier rubyid_remove'>remove</span>
|
2001
|
+
<span class='id identifier rubyid_text'>text</span> <span class='op'>=</span> <span class='id identifier rubyid_strip'>strip</span> <span class='id identifier rubyid_element'>element</span><span class='op'>&.</span><span class='id identifier rubyid_text'>text</span>
|
2002
|
+
<span class='id identifier rubyid_key'>key</span> <span class='op'>=</span> <span class='id identifier rubyid_label_map'>label_map</span><span class='period'>.</span><span class='id identifier rubyid_find'>find</span> <span class='kw'>do</span> <span class='op'>|</span><span class='id identifier rubyid_k'>k</span><span class='comma'>,</span><span class='id identifier rubyid_v'>v</span><span class='op'>|</span>
|
2003
|
+
<span class='id identifier rubyid_v'>v</span><span class='period'>.</span><span class='id identifier rubyid_is_a?'>is_a?</span><span class='lparen'>(</span><span class='const'>Regexp</span><span class='rparen'>)</span> <span class='op'>?</span> <span class='lparen'>(</span><span class='id identifier rubyid_text'>text</span> <span class='op'>=~</span> <span class='id identifier rubyid_v'>v</span><span class='rparen'>)</span> <span class='op'>:</span> <span class='lparen'>(</span><span class='id identifier rubyid_text'>text</span> <span class='op'>==</span> <span class='id identifier rubyid_v'>v</span><span class='rparen'>)</span>
|
2004
|
+
<span class='kw'>end</span><span class='op'>&.</span><span class='id identifier rubyid_first'>first</span>
|
2005
|
+
<span class='id identifier rubyid_key'>key</span>
|
2006
|
+
<span class='kw'>end</span></pre>
|
2007
|
+
</td>
|
2008
|
+
</tr>
|
2009
|
+
</table>
|
2010
|
+
</div>
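The lookup in the listing above returns the key of the first map entry whose matcher fits the label text. A minimal sketch of that matching rule, assuming the label text has already been extracted; `label_to_key` is a hypothetical stand-in that takes a string instead of an HTML element.

```ruby
# Hypothetical stand-in for the matching step of translate_label_to_key.
def label_to_key(text, label_map)
  label_map.find do |_key, matcher|
    # Regexp matchers use =~, anything else must equal the text exactly.
    matcher.is_a?(Regexp) ? (text =~ matcher) : (text == matcher)
  end&.first
end

label_map = { sku: 'SKU', price: /price/i, list_price: /list price/i }

exact    = label_to_key('SKU', label_map)
by_regex = label_to_key('List Price', label_map)
no_case  = label_to_key('sku', label_map)
missing  = label_to_key('Weight', label_map)
```

Hashes keep insertion order, so `'List Price'` resolves to `:price` here because that entry comes first; placing the more specific pattern earlier in the map would change the result. String matchers are case-sensitive, which is why `'sku'` finds nothing.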
|
2011
|
+
|
2012
|
+
</div>
|
2013
|
+
|
2014
|
+
</div>
|
2015
|
+
|
2016
|
+
<div id="footer">
|
2017
|
+
Generated on Tue Feb 26 16:50:03 2019 by
|
2018
|
+
<a href="http://yardoc.org" title="Yay! A Ruby Documentation Tool" target="_parent">yard</a>
|
2019
|
+
0.9.18 (ruby-2.5.3).
|
2020
|
+
</div>
|
2021
|
+
|
2022
|
+
</div>
|
2023
|
+
</body>
|
2024
|
+
</html>
|