dh_easy-login 0.0.5

@@ -0,0 +1,514 @@
1
+ <!DOCTYPE html>
2
+ <html>
3
+ <head>
4
+ <meta charset="UTF-8">
5
+ <meta name="viewport" content="width=device-width, initial-scale=1.0">
6
+ <title>
7
+ File: README
8
+
9
+ &mdash; Documentation by YARD 0.9.20
10
+
11
+ </title>
12
+
13
+ <link rel="stylesheet" href="css/style.css" type="text/css" charset="utf-8" />
14
+
15
+ <link rel="stylesheet" href="css/common.css" type="text/css" charset="utf-8" />
16
+
17
+ <script type="text/javascript" charset="utf-8">
18
+ pathId = "README";
19
+ relpath = '';
20
+ </script>
21
+
22
+
23
+ <script type="text/javascript" charset="utf-8" src="js/jquery.js"></script>
24
+
25
+ <script type="text/javascript" charset="utf-8" src="js/app.js"></script>
26
+
27
+
28
+ </head>
29
+ <body>
30
+ <div class="nav_wrap">
31
+ <iframe id="nav" src="file_list.html?1"></iframe>
32
+ <div id="resizer"></div>
33
+ </div>
34
+
35
+ <div id="main" tabindex="-1">
36
+ <div id="header">
37
+ <div id="menu">
38
+
39
+ <a href="_index.html">Index</a> &raquo;
40
+ <span class="title">File: README</span>
41
+
42
+ </div>
43
+
44
+ <div id="search">
45
+
46
+ <a class="full_list_link" id="class_list_link"
47
+ href="class_list.html">
48
+
49
+ <svg width="24" height="24">
50
+ <rect x="0" y="4" width="24" height="4" rx="1" ry="1"></rect>
51
+ <rect x="0" y="12" width="24" height="4" rx="1" ry="1"></rect>
52
+ <rect x="0" y="20" width="24" height="4" rx="1" ry="1"></rect>
53
+ </svg>
54
+ </a>
55
+
56
+ </div>
57
+ <div class="clear"></div>
58
+ </div>
59
+
60
+ <div id="content"><div id='filecontents'>
61
+ <p><a href="http://rubydoc.org/gems/dh_easy-login/frames"><img
62
+ src="http://img.shields.io/badge/docs-rdoc.info-blue.svg"></a> <a
63
+ href="http://github.com/DataHenOfficial/dh_easy-login/releases"><img
64
+ src="https://badge.fury.io/rb/dh_easy-login.svg"></a> <a
65
+ href="#license"><img
66
+ src="http://img.shields.io/badge/license-MIT-yellowgreen.svg"></a></p>
67
+
68
+ <h1 id="label-DhEasy+login+module">DhEasy login module</h1>
69
+
70
+ <h2 id="label-Description">Description</h2>
71
+
72
+ <p>DhEasy login is part of the DhEasy gem collection. It provides an easy way to
73
+ handle login and session recovery, quite useful when scraping websites with
74
+ login features and expiring sessions.</p>
75
+
76
+ <p>Install gem: <code>gem install dh_easy-login</code></p>
77
+
78
+ <p>Require gem: <code>require &#39;dh_easy/login&#39;</code></p>
79
+
80
+ <p>Code documentation can be found <a
81
+ href="http://rubydoc.org/gems/dh_easy-login/frames">here</a>.</p>
82
+
83
+ <h2 id="label-How+to+implement">How to implement</h2>
84
+
85
+ <h3 id="label-Before+you+start">Before you start</h3>
86
+
87
+ <p>Most use cases for the <code>dh_easy-login</code> gem apply
88
+ to websites that have login pages and create sessions, so we will cover this
89
+ scenario in our example.</p>
90
+
91
+ <p>However, the <code>dh_easy-login</code> gem is designed to handle
92
+ <strong>ANY</strong> kind of session recovery, even those that don&#39;t
93
+ require a login form <code>POST</code>, just by changing the flow from:</p>
94
+
95
+ <pre class="code ruby"><code class="ruby">login -&gt; login_post -&gt; restore
96
+ </code></pre>
97
+
98
+ <p>To whatever you need, for example:</p>
99
+
100
+ <pre class="code ruby"><code class="ruby">home -&gt; search_page -&gt; restore
101
+ </code></pre>
102
+
103
+ <p>Here are some example use cases that can be handled by the
104
+ <code>dh_easy-login</code> gem:</p>
105
+ <ul><li>
106
+ <p>Websites that invalidate requests with fast-expiring cookies created on the
107
+ first request.</p>
108
+ </li><li>
109
+ <p>Websites that generate tokens on every search (either in cookies or
110
+ query_params) that are required to fetch a detail page.</p>
111
+ </li><li>
112
+ <p>Websites that expire sessions due to inactivity.</p>
113
+ </li><li>
114
+ <p>Websites that use complex login flows.</p>
115
+ </li><li>
116
+ <p>etc.</p>
117
+ </li></ul>
118
+
119
+ <p>Feel free to experiment with it until it fits your needs.</p>
120
+
121
+ <h3 id="label-Adding+dh_easy-login+to+your+project">Adding dh_easy-login to your project</h3>
122
+
123
+ <p>Let&#39;s assume a simple project implementing <code>dh_easy</code> like
124
+ the one described on <a
125
+ href="https://github.com/DataHenOfficial/dh_easy/blob/master/README.md">dh_easy
126
+ README.md</a> that scrapes your website.</p>
127
+
128
+ <p>Now let&#39;s assume your website has a login page
129
+ <code>https://example.com/login</code> with a session that expires before
130
+ our sample project&#39;s scrape job finishes, causing all remaining webpages to
131
+ respond with a <code>403</code> HTTP response code and fail… quite the problem
132
+ isn&#39;t it? Well, not anymore, <code>dh_easy-login</code> gem to the
133
+ rescue!</p>
134
+
135
+ <p>First, let&#39;s create our base module that will contain our session
136
+ validation and recovery logic. For this example, we will call it
137
+ <code>LoginEnable</code>:</p>
138
+
139
+ <pre class="code ruby"><code class="ruby"><span class='comment'># ./lib/login_enable.rb
140
+ </span>
141
+ <span class='kw'>module</span> <span class='const'>LoginEnable</span>
142
+ <span class='id identifier rubyid_include'>include</span> <span class='const'><span class='object_link'><a href="DhEasy.html" title="DhEasy (module)">DhEasy</a></span></span><span class='op'>::</span><span class='const'><span class='object_link'><a href="DhEasy/Login.html" title="DhEasy::Login (module)">Login</a></span></span><span class='op'>::</span><span class='const'><span class='object_link'><a href="DhEasy/Login/Plugin.html" title="DhEasy::Login::Plugin (module)">Plugin</a></span></span><span class='op'>::</span><span class='const'><span class='object_link'><a href="DhEasy/Login/Plugin/EnabledBehavior.html" title="DhEasy::Login::Plugin::EnabledBehavior (module)">EnabledBehavior</a></span></span>
143
+
144
+ <span class='comment'># Hook to initialize login_flow configuration.
145
+ </span> <span class='kw'>def</span> <span class='id identifier rubyid_initialize_hook_login_plugin_enabled_behavior'>initialize_hook_login_plugin_enabled_behavior</span> <span class='id identifier rubyid_opts'>opts</span> <span class='op'>=</span> <span class='lbrace'>{</span><span class='rbrace'>}</span>
146
+ <span class='id identifier rubyid_opts'>opts</span> <span class='op'>=</span> <span class='lbrace'>{</span><span class='label'>app_config:</span> <span class='const'><span class='object_link'><a href="DhEasy.html" title="DhEasy (module)">DhEasy</a></span></span><span class='op'>::</span><span class='const'>Core</span><span class='op'>::</span><span class='const'>Config</span><span class='period'>.</span><span class='id identifier rubyid_new'>new</span><span class='lparen'>(</span><span class='id identifier rubyid_opts'>opts</span><span class='rparen'>)</span><span class='rbrace'>}</span><span class='period'>.</span><span class='id identifier rubyid_merge'>merge</span> <span class='id identifier rubyid_opts'>opts</span>
147
+ <span class='ivar'>@login_flow</span> <span class='op'>=</span> <span class='const'><span class='object_link'><a href="DhEasy.html" title="DhEasy (module)">DhEasy</a></span></span><span class='op'>::</span><span class='const'><span class='object_link'><a href="DhEasy/Login.html" title="DhEasy::Login (module)">Login</a></span></span><span class='op'>::</span><span class='const'><span class='object_link'><a href="DhEasy/Login/Flow.html" title="DhEasy::Login::Flow (class)">Flow</a></span></span><span class='period'>.</span><span class='id identifier rubyid_new'>new</span> <span class='id identifier rubyid_opts'>opts</span>
148
+ <span class='ivar'>@cookie</span> <span class='op'>=</span> <span class='kw'>nil</span>
149
+ <span class='kw'>end</span>
150
+
151
+ <span class='comment'># Get cookie after applying response cookie.
152
+ </span> <span class='comment'># @return [String] Cookie string.
153
+ </span> <span class='kw'>def</span> <span class='id identifier rubyid_cookie'>cookie</span>
154
+ <span class='kw'>return</span> <span class='ivar'>@cookie</span> <span class='kw'>unless</span> <span class='ivar'>@cookie</span><span class='period'>.</span><span class='id identifier rubyid_nil?'>nil?</span>
155
+
156
+ <span class='id identifier rubyid_raw_cookie'>raw_cookie</span> <span class='op'>=</span> <span class='id identifier rubyid_page'>page</span><span class='lbracket'>[</span><span class='tstring'><span class='tstring_beg'>&#39;</span><span class='tstring_content'>response_cookie</span><span class='tstring_end'>&#39;</span></span><span class='rbracket'>]</span> <span class='op'>||</span> <span class='id identifier rubyid_page'>page</span><span class='lbracket'>[</span><span class='tstring'><span class='tstring_beg'>&#39;</span><span class='tstring_content'>response_headers</span><span class='tstring_end'>&#39;</span></span><span class='rbracket'>]</span><span class='lbracket'>[</span><span class='tstring'><span class='tstring_beg'>&#39;</span><span class='tstring_content'>Set-Cookie</span><span class='tstring_end'>&#39;</span></span><span class='rbracket'>]</span>
157
+ <span class='ivar'>@cookie</span> <span class='op'>=</span> <span class='const'><span class='object_link'><a href="DhEasy.html" title="DhEasy (module)">DhEasy</a></span></span><span class='op'>::</span><span class='const'>Core</span><span class='op'>::</span><span class='const'>Helper</span><span class='op'>::</span><span class='const'>Cookie</span><span class='period'>.</span><span class='id identifier rubyid_update'>update</span><span class='lparen'>(</span><span class='id identifier rubyid_page'>page</span><span class='lbracket'>[</span><span class='tstring'><span class='tstring_beg'>&#39;</span><span class='tstring_content'>headers</span><span class='tstring_end'>&#39;</span></span><span class='rbracket'>]</span><span class='lbracket'>[</span><span class='tstring'><span class='tstring_beg'>&#39;</span><span class='tstring_content'>Cookie</span><span class='tstring_end'>&#39;</span></span><span class='rbracket'>]</span><span class='comma'>,</span> <span class='id identifier rubyid_raw_cookie'>raw_cookie</span><span class='rparen'>)</span>
158
+ <span class='ivar'>@cookie</span>
159
+ <span class='kw'>end</span>
160
+
161
+ <span class='comment'># Validates session.
162
+ </span> <span class='comment'># @return [Boolean] `true` when session is valid, else `false`.
163
+ </span> <span class='kw'>def</span> <span class='id identifier rubyid_valid_session?'>valid_session?</span>
164
+ <span class='lbracket'>[</span><span class='tstring'><span class='tstring_beg'>&#39;</span><span class='tstring_content'>200</span><span class='tstring_end'>&#39;</span></span><span class='comma'>,</span> <span class='tstring'><span class='tstring_beg'>&#39;</span><span class='tstring_content'>404</span><span class='tstring_end'>&#39;</span></span><span class='rbracket'>]</span><span class='period'>.</span><span class='id identifier rubyid_include?'>include?</span> <span class='id identifier rubyid_page'>page</span><span class='lbracket'>[</span><span class='tstring'><span class='tstring_beg'>&#39;</span><span class='tstring_content'>response_status_code</span><span class='tstring_end'>&#39;</span></span><span class='rbracket'>]</span><span class='period'>.</span><span class='id identifier rubyid_to_s'>to_s</span><span class='period'>.</span><span class='id identifier rubyid_strip'>strip</span>
165
+ <span class='kw'>end</span>
166
+
167
+ <span class='comment'># Fix page session when session is invalid.
168
+ </span> <span class='comment'># @return [Boolean] `true` when session is valid, else `false`.
169
+ </span> <span class='kw'>def</span> <span class='id identifier rubyid_fix_session'>fix_session</span>
170
+ <span class='kw'>return</span> <span class='kw'>true</span> <span class='kw'>if</span> <span class='id identifier rubyid_valid_session?'>valid_session?</span>
171
+
172
+ <span class='id identifier rubyid_login_flow'>login_flow</span><span class='period'>.</span><span class='id identifier rubyid_fix_session'>fix_session</span> <span class='kw'>do</span>
173
+ <span class='id identifier rubyid_save_pages'>save_pages</span> <span class='lbracket'>[</span><span class='lbrace'>{</span>
174
+ <span class='tstring'><span class='tstring_beg'>&#39;</span><span class='tstring_content'>url</span><span class='tstring_end'>&#39;</span></span> <span class='op'>=&gt;</span> <span class='tstring'><span class='tstring_beg'>&#39;</span><span class='tstring_content'>https://example.com/login</span><span class='tstring_end'>&#39;</span></span><span class='comma'>,</span>
175
+ <span class='tstring'><span class='tstring_beg'>&#39;</span><span class='tstring_content'>page_type</span><span class='tstring_end'>&#39;</span></span> <span class='op'>=&gt;</span> <span class='tstring'><span class='tstring_beg'>&#39;</span><span class='tstring_content'>login</span><span class='tstring_end'>&#39;</span></span><span class='comma'>,</span>
176
+ <span class='tstring'><span class='tstring_beg'>&#39;</span><span class='tstring_content'>priority</span><span class='tstring_end'>&#39;</span></span> <span class='op'>=&gt;</span> <span class='int'>9</span><span class='comma'>,</span>
177
+ <span class='tstring'><span class='tstring_beg'>&#39;</span><span class='tstring_content'>freshness</span><span class='tstring_end'>&#39;</span></span> <span class='op'>=&gt;</span> <span class='const'>Time</span><span class='period'>.</span><span class='id identifier rubyid_now'>now</span><span class='period'>.</span><span class='id identifier rubyid_iso8601'>iso8601</span><span class='comma'>,</span>
178
+ <span class='tstring'><span class='tstring_beg'>&#39;</span><span class='tstring_content'>cookie</span><span class='tstring_end'>&#39;</span></span> <span class='op'>=&gt;</span> <span class='tstring'><span class='tstring_beg'>&quot;</span><span class='tstring_content'>stl=</span><span class='embexpr_beg'>#{</span><span class='id identifier rubyid_salt'>salt</span><span class='embexpr_end'>}</span><span class='tstring_end'>&quot;</span></span><span class='comma'>,</span>
179
+ <span class='tstring'><span class='tstring_beg'>&#39;</span><span class='tstring_content'>headers</span><span class='tstring_end'>&#39;</span></span> <span class='op'>=&gt;</span> <span class='lbrace'>{</span>
180
+ <span class='comment'># Add any extra header you need here
181
+ </span> <span class='tstring'><span class='tstring_beg'>&#39;</span><span class='tstring_content'>Cookie</span><span class='tstring_end'>&#39;</span></span> <span class='op'>=&gt;</span> <span class='tstring'><span class='tstring_beg'>&quot;</span><span class='tstring_content'>stl=</span><span class='embexpr_beg'>#{</span><span class='id identifier rubyid_salt'>salt</span><span class='embexpr_end'>}</span><span class='tstring_end'>&quot;</span></span>
182
+ <span class='rbrace'>}</span>
183
+ <span class='rbrace'>}</span><span class='rbracket'>]</span>
184
+ <span class='kw'>end</span>
185
+
186
+ <span class='kw'>false</span>
187
+ <span class='kw'>end</span>
188
+ <span class='kw'>end</span>
189
+ </code></pre>
190
+
191
+ <p>Notice that our example <code>valid_session?</code> method uses
192
+ <code>200</code> and <code>404</code> HTTP response codes to validate that
193
+ our session hasn&#39;t expired yet. However, <strong><em>this might not
194
+ be the case for your website</em></strong>, so make sure to modify this
195
+ method to fit your needs.</p>
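<p>As a rough sketch of such a customization, assume a hypothetical website that serves its login form with a <code>200</code> status once the session has expired. The <code>session_valid_for?</code> helper, its <code>content</code> argument and the <code>id="login-form"</code> marker are illustrative assumptions, not part of the gem; only the <code>response_status_code</code> page key comes from the example above.</p>

```ruby
# Hypothetical standalone helper; not part of dh_easy-login.
# `page` is a plain Hash shaped like the page hashes used in the example,
# `content` is the fetched response body as a String.
def session_valid_for?(page, content)
  status = page['response_status_code'].to_s.strip
  return false unless %w[200 404].include?(status)

  # Assumption: an expired session is answered with the login form itself.
  !content.to_s.include?('id="login-form"')
end

puts session_valid_for?({ 'response_status_code' => 200 }, '<html>results</html>')
puts session_valid_for?({ 'response_status_code' => 200 }, '<form id="login-form"></form>')
puts session_valid_for?({ 'response_status_code' => 403 }, '')
```

<p>Inside <code>LoginEnable</code>, the same conditions could replace the body of <code>valid_session?</code>, provided the parser has access to the response body.</p>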
196
+
197
+ <p>Our <code>fix_session</code> method will store any page with a failed
198
+ session by creating an output so it can be restored later once we have the
199
+ new active session cookie.</p>
200
+
201
+ <p><code>fix_session</code> method will also mark the current session cookie
202
+ as expired and <strong><em>enqueue a new <code>login</code> page with HIGH
203
+ priority, as long as another parser hasn&#39;t already done so, to avoid
204
+ duplicates</em></strong>.</p>
205
+
206
+ <p><code>cookie</code> method will merge the request cookies with the response
207
+ cookies, so we can be sure that the cookies are always updated when needed.</p>
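<p>To illustrate what merging request and response cookies means, here is a minimal, framework-free sketch. The <code>merge_cookies</code> function is a hypothetical stand-in written for this explanation; it does not reproduce the gem helper used above and it ignores cookie attributes such as <code>Path</code> or <code>Expires</code>.</p>

```ruby
# Hypothetical stand-in for the merge described above; not the gem's helper.
# request_cookie:     String such as "a=1; b=2" (the request Cookie header)
# set_cookie_headers: String or Array of Set-Cookie header values
def merge_cookies(request_cookie, set_cookie_headers)
  jar = {}

  # Start with the cookies that were sent on the request.
  request_cookie.to_s.split(';').each do |pair|
    name, value = pair.strip.split('=', 2)
    jar[name] = value.to_s unless name.nil? || name.empty?
  end

  # Response cookies override request cookies that share a name.
  Array(set_cookie_headers).each do |header|
    # Only the first "name=value" segment of a Set-Cookie value is the cookie.
    name, value = header.to_s.split(';').first.to_s.strip.split('=', 2)
    jar[name] = value.to_s unless name.nil? || name.empty?
  end

  jar.map { |name, value| "#{name}=#{value}" }.join('; ')
end

puts merge_cookies('stl=abc; lang=en', ['stl=xyz; Path=/; HttpOnly', 'sid=123'])
```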
208
+
209
+ <p>The next step is to create a simple parser that enqueues the <code>POST</code>
210
+ of our login page:</p>
211
+
212
+ <pre class="code ruby"><code class="ruby"><span class='comment'># ./parsers/login.rb
213
+ </span>
214
+ <span class='kw'>module</span> <span class='const'>Parsers</span>
215
+ <span class='kw'>class</span> <span class='const'>Login</span>
216
+ <span class='id identifier rubyid_include'>include</span> <span class='const'><span class='object_link'><a href="DhEasy.html" title="DhEasy (module)">DhEasy</a></span></span><span class='op'>::</span><span class='const'>Core</span><span class='op'>::</span><span class='const'>Plugin</span><span class='op'>::</span><span class='const'>Parser</span>
217
+ <span class='id identifier rubyid_include'>include</span> <span class='const'>LoginEnable</span>
218
+
219
+ <span class='kw'>def</span> <span class='id identifier rubyid_parse'>parse</span>
220
+ <span class='id identifier rubyid_pages'>pages</span> <span class='op'>&lt;&lt;</span> <span class='lbrace'>{</span>
221
+ <span class='tstring'><span class='tstring_beg'>&#39;</span><span class='tstring_content'>url</span><span class='tstring_end'>&#39;</span></span> <span class='op'>=&gt;</span> <span class='tstring'><span class='tstring_beg'>&#39;</span><span class='tstring_content'>https://example.com/login</span><span class='tstring_end'>&#39;</span></span><span class='comma'>,</span>
222
+ <span class='tstring'><span class='tstring_beg'>&#39;</span><span class='tstring_content'>page_type</span><span class='tstring_end'>&#39;</span></span> <span class='op'>=&gt;</span> <span class='tstring'><span class='tstring_beg'>&#39;</span><span class='tstring_content'>login_post</span><span class='tstring_end'>&#39;</span></span><span class='comma'>,</span>
223
+ <span class='tstring'><span class='tstring_beg'>&#39;</span><span class='tstring_content'>priority</span><span class='tstring_end'>&#39;</span></span> <span class='op'>=&gt;</span> <span class='int'>10</span><span class='comma'>,</span>
224
+ <span class='tstring'><span class='tstring_beg'>&#39;</span><span class='tstring_content'>method</span><span class='tstring_end'>&#39;</span></span> <span class='op'>=&gt;</span> <span class='tstring'><span class='tstring_beg'>&#39;</span><span class='tstring_content'>POST</span><span class='tstring_end'>&#39;</span></span><span class='comma'>,</span>
225
+ <span class='tstring'><span class='tstring_beg'>&#39;</span><span class='tstring_content'>cookie</span><span class='tstring_end'>&#39;</span></span> <span class='op'>=&gt;</span> <span class='id identifier rubyid_cookie'>cookie</span><span class='comma'>,</span>
226
+ <span class='tstring'><span class='tstring_beg'>&#39;</span><span class='tstring_content'>headers</span><span class='tstring_end'>&#39;</span></span> <span class='op'>=&gt;</span> <span class='lbrace'>{</span>
227
+ <span class='comment'># Add any extra header you need here
228
+ </span> <span class='tstring'><span class='tstring_beg'>&#39;</span><span class='tstring_content'>Cookie</span><span class='tstring_end'>&#39;</span></span> <span class='op'>=&gt;</span> <span class='id identifier rubyid_cookie'>cookie</span>
229
+ <span class='rbrace'>}</span>
230
+ <span class='rbrace'>}</span>
231
+ <span class='kw'>end</span>
232
+ <span class='kw'>end</span>
233
+ <span class='kw'>end</span>
234
+ </code></pre>
235
+
236
+ <p>Now let&#39;s handle the login response, seed and restore any page with an
237
+ expired session:</p>
238
+
239
+ <pre class="code ruby"><code class="ruby"><span class='comment'># ./parsers/login_post.rb
240
+ </span>
241
+ <span class='kw'>module</span> <span class='const'>Parsers</span>
242
+ <span class='kw'>class</span> <span class='const'>LoginPost</span>
243
+ <span class='id identifier rubyid_include'>include</span> <span class='const'><span class='object_link'><a href="DhEasy.html" title="DhEasy (module)">DhEasy</a></span></span><span class='op'>::</span><span class='const'>Core</span><span class='op'>::</span><span class='const'>Plugin</span><span class='op'>::</span><span class='const'>Parser</span>
244
+ <span class='id identifier rubyid_include'>include</span> <span class='const'>LoginEnable</span>
245
+
246
+ <span class='kw'>def</span> <span class='id identifier rubyid_seed!'>seed!</span>
247
+ <span class='kw'>return</span> <span class='kw'>if</span> <span class='id identifier rubyid_login_flow'>login_flow</span><span class='period'>.</span><span class='id identifier rubyid_seeded?'>seeded?</span>
248
+
249
+ <span class='const'>Seeder</span><span class='op'>::</span><span class='const'>Seeder</span><span class='period'>.</span><span class='id identifier rubyid_new'>new</span><span class='lparen'>(</span><span class='label'>context:</span> <span class='id identifier rubyid_context'>context</span><span class='rparen'>)</span><span class='period'>.</span><span class='id identifier rubyid_seed'>seed</span> <span class='kw'>do</span> <span class='op'>|</span><span class='id identifier rubyid_new_page'>new_page</span><span class='op'>|</span>
250
+ <span class='id identifier rubyid_login_flow'>login_flow</span><span class='period'>.</span><span class='id identifier rubyid_fix_page!'>fix_page!</span> <span class='id identifier rubyid_new_page'>new_page</span>
251
+ <span class='kw'>end</span>
252
+
253
+ <span class='id identifier rubyid_login_flow'>login_flow</span><span class='period'>.</span><span class='id identifier rubyid_seeded!'>seeded!</span>
254
+ <span class='kw'>end</span>
255
+
256
+ <span class='kw'>def</span> <span class='id identifier rubyid_parse'>parse</span>
257
+ <span class='id identifier rubyid_login_flow'>login_flow</span><span class='period'>.</span><span class='id identifier rubyid_update_config'>update_config</span><span class='lparen'>(</span>
258
+ <span class='tstring'><span class='tstring_beg'>&#39;</span><span class='tstring_content'>cookie</span><span class='tstring_end'>&#39;</span></span> <span class='op'>=&gt;</span> <span class='id identifier rubyid_cookie'>cookie</span><span class='comma'>,</span>
259
+ <span class='tstring'><span class='tstring_beg'>&#39;</span><span class='tstring_content'>expired</span><span class='tstring_end'>&#39;</span></span> <span class='op'>=&gt;</span> <span class='kw'>false</span>
260
+ <span class='rparen'>)</span>
261
+
262
+ <span class='comment'># Wait for any pending fetch to be held
263
+ </span> <span class='id identifier rubyid_sleep'>sleep</span> <span class='int'>10</span>
264
+
265
+ <span class='id identifier rubyid_login_flow'>login_flow</span><span class='period'>.</span><span class='id identifier rubyid_restore_held_pages'>restore_held_pages</span>
266
+ <span class='id identifier rubyid_seed!'>seed!</span>
267
+ <span class='kw'>end</span>
268
+ <span class='kw'>end</span>
269
+ <span class='kw'>end</span>
270
+ </code></pre>
271
+
272
+ <p>Notice something interesting? that&#39;s right, the seeding happens
273
+ <strong>AFTER</strong> we got our new active session cookie, so the pages
274
+ we seed include the session cookie. We use
275
+ <code>login_flow.fix_page!</code> method to add our latest active session
276
+ cookie, along with some internal <code>page['vars']</code> (used to handle page
277
+ recovery) to our seeded pages.</p>
278
+
279
+ <p><strong>IMPORTANT:</strong> This example assumes that
280
+ <code>login_post</code> pages will never fail, but you might need to add
281
+ some extra validations to make sure the login attempt was successful before
282
+ restoring your pages.</p>
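<p>One possible shape for such a validation is sketched below, under stated assumptions: the website is assumed to answer a successful login with a <code>200</code> or <code>302</code> status and to set a session cookie named <code>sid</code>. The <code>login_succeeded?</code> helper and the cookie name are hypothetical and not part of the gem.</p>

```ruby
# Hypothetical check; the accepted status codes and the 'sid' cookie name
# are assumptions made for illustration only.
def login_succeeded?(page, cookie_string, session_cookie_name = 'sid')
  status = page['response_status_code'].to_s.strip
  return false unless %w[200 302].include?(status)

  # The login only counts when the session cookie is present and non-empty.
  cookie_string.to_s.split(';').any? do |pair|
    name, value = pair.strip.split('=', 2)
    name == session_cookie_name && !value.to_s.empty?
  end
end

puts login_succeeded?({ 'response_status_code' => 200 }, 'stl=1; sid=abc')
puts login_succeeded?({ 'response_status_code' => 200 }, 'stl=1')
```

<p>In <code>Parsers::LoginPost#parse</code>, a check like this could run before <code>login_flow.restore_held_pages</code>, so that held pages are only restored after a successful login.</p>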
283
+
284
+ <p><strong><em>Note:</em></strong> This example assumes that all pages to be
285
+ seeded require an active session, so we will add the cookie to all pages we seed,
286
+ but this will likely not apply to all pages to be seeded in a real life
287
+ scenario, so make sure to add it only to those pages that require an
288
+ active session.</p>
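<p>A framework-free sketch of that selective approach follows. The <code>build_seed_pages</code> method and the <code>requires_session</code> flag are hypothetical names introduced for illustration; in the real project the block would call <code>login_flow.fix_page!</code> instead of setting a header by hand.</p>

```ruby
# Hypothetical sketch: `requires_session` is an illustrative flag, not a
# dh_easy or dh_easy-login page field.
def build_seed_pages(candidates)
  candidates.map do |candidate|
    new_page = { 'url' => candidate[:url], 'page_type' => candidate[:page_type] }
    # Only pages flagged as needing a session are handed to the block.
    yield new_page if block_given? && candidate[:requires_session]
    new_page
  end
end

pages = build_seed_pages([
  { url: 'https://example.com/search?query=food', page_type: 'search', requires_session: true },
  { url: 'https://example.com/about', page_type: 'static', requires_session: false }
]) { |new_page| new_page['headers'] = { 'Cookie' => 'sid=123' } }

pages.each { |seeded_page| puts seeded_page.inspect }
```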
289
+
290
+ <p>So the next step is to modify our seeder so that it allows the cookie to be included by
291
+ adding a <code>block</code> param that will be used by our
292
+ <code>Parsers::LoginPost#seed!</code> method:</p>
293
+
294
+ <pre class="code ruby"><code class="ruby"><span class='comment'># ./seeder/seeder.rb
295
+ </span>
296
+ <span class='kw'>module</span> <span class='const'>Seeder</span>
297
+ <span class='kw'>class</span> <span class='const'>Seeder</span>
298
+ <span class='id identifier rubyid_include'>include</span> <span class='const'><span class='object_link'><a href="DhEasy.html" title="DhEasy (module)">DhEasy</a></span></span><span class='op'>::</span><span class='const'>Core</span><span class='op'>::</span><span class='const'>Plugin</span><span class='op'>::</span><span class='const'>Seeder</span>
299
+
300
+ <span class='kw'>def</span> <span class='id identifier rubyid_seed'>seed</span> <span class='op'>&amp;</span><span class='id identifier rubyid_block'>block</span>
301
+ <span class='id identifier rubyid_new_page'>new_page</span> <span class='op'>=</span> <span class='lbrace'>{</span>
302
+ <span class='tstring'><span class='tstring_beg'>&#39;</span><span class='tstring_content'>url</span><span class='tstring_end'>&#39;</span></span> <span class='op'>=&gt;</span> <span class='tstring'><span class='tstring_beg'>&#39;</span><span class='tstring_content'>https://example.com/login.rb?query=food</span><span class='tstring_end'>&#39;</span></span><span class='comma'>,</span>
303
+ <span class='tstring'><span class='tstring_beg'>&#39;</span><span class='tstring_content'>page_type</span><span class='tstring_end'>&#39;</span></span> <span class='op'>=&gt;</span> <span class='tstring'><span class='tstring_beg'>&#39;</span><span class='tstring_content'>search</span><span class='tstring_end'>&#39;</span></span>
304
+ <span class='rbrace'>}</span>
305
+ <span class='id identifier rubyid_block'>block</span><span class='period'>.</span><span class='id identifier rubyid_call'>call</span><span class='lparen'>(</span><span class='id identifier rubyid_new_page'>new_page</span><span class='rparen'>)</span> <span class='kw'>unless</span> <span class='id identifier rubyid_block'>block</span><span class='period'>.</span><span class='id identifier rubyid_nil?'>nil?</span>
306
+ <span class='id identifier rubyid_pages'>pages</span> <span class='op'>&lt;&lt;</span> <span class='id identifier rubyid_new_page'>new_page</span>
307
+ <span class='kw'>end</span>
308
+ <span class='kw'>end</span>
309
+ <span class='kw'>end</span>
310
+ </code></pre>
311
+
312
+ <p>Now we will need to create a new seeder to seed the login page:</p>
313
+
314
+ <pre class="code ruby"><code class="ruby"><span class='comment'># ./seeder/login.rb
315
+ </span>
316
+ <span class='kw'>module</span> <span class='const'>Seeder</span>
317
+ <span class='kw'>class</span> <span class='const'>Login</span>
318
+ <span class='id identifier rubyid_include'>include</span> <span class='const'><span class='object_link'><a href="DhEasy.html" title="DhEasy (module)">DhEasy</a></span></span><span class='op'>::</span><span class='const'>Core</span><span class='op'>::</span><span class='const'>Plugin</span><span class='op'>::</span><span class='const'>Seeder</span>
319
+
320
+ <span class='kw'>def</span> <span class='id identifier rubyid_seed'>seed</span>
321
+ <span class='id identifier rubyid_pages'>pages</span> <span class='op'>&lt;&lt;</span> <span class='lbrace'>{</span>
322
+ <span class='tstring'><span class='tstring_beg'>&#39;</span><span class='tstring_content'>url</span><span class='tstring_end'>&#39;</span></span> <span class='op'>=&gt;</span> <span class='tstring'><span class='tstring_beg'>&#39;</span><span class='tstring_content'>https://example.com/login</span><span class='tstring_end'>&#39;</span></span><span class='comma'>,</span>
323
+ <span class='tstring'><span class='tstring_beg'>&#39;</span><span class='tstring_content'>page_type</span><span class='tstring_end'>&#39;</span></span> <span class='op'>=&gt;</span> <span class='tstring'><span class='tstring_beg'>&#39;</span><span class='tstring_content'>login</span><span class='tstring_end'>&#39;</span></span><span class='comma'>,</span>
324
+ <span class='tstring'><span class='tstring_beg'>&#39;</span><span class='tstring_content'>priority</span><span class='tstring_end'>&#39;</span></span> <span class='op'>=&gt;</span> <span class='int'>9</span>
325
+ <span class='rbrace'>}</span>
326
+ <span class='kw'>end</span>
327
+ <span class='kw'>end</span>
328
+ <span class='kw'>end</span>
329
+ </code></pre>
330
+
331
+ <p>Now let&#39;s modify our <code>./config.yaml</code> to add our new page
332
+ types to it, and to let us parse pages that failed to fetch, since our example
333
+ assumes that the website will return a <code>403</code> HTTP response code when
334
+ the session has expired:</p>
335
+
336
+ <pre class="code ruby"><code class="ruby"># ./config.yaml
337
+
338
+ parse_failed_pages: true
339
+
340
+ seeder:
341
+   file: ./router/seeder.rb
342
+   disabled: false
343
+
344
+ parsers:
345
+   - page_type: search
346
+     file: ./router/parser.rb
347
+     disabled: false
348
+   - page_type: product
349
+     file: ./router/parser.rb
350
+     disabled: false
351
+   - page_type: login
352
+     file: ./router/parser.rb
353
+     disabled: false
354
+   - page_type: login_post
355
+     file: ./router/parser.rb
356
+     disabled: false
357
+ </code></pre>
358
+
359
+ <p>And don&#39;t forget to modify <code>./dh_easy.yaml</code> to add our new
360
+ routes and change our seeder so the login page is seeded first instead of our
361
+ old seeder:</p>
362
+
363
+ <pre class="code ruby"><code class="ruby"># ./dh_easy.yaml
364
+
365
+ router:
366
+   parser:
367
+     routes:
368
+       - page_type: search
369
+         class: Parsers::Search
370
+       - page_type: product
371
+         class: Parsers::Product
372
+       - page_type: login
373
+         class: Parsers::Login
374
+       - page_type: login_post
375
+         class: Parsers::LoginPost
376
+
377
+   seeder:
378
+     routes:
379
+       - class: Seeder::Login
380
+ </code></pre>
381
+
382
+ <p>Now we will need to modify our routers as well, since we modified
383
+ our <code>dh_easy.yaml</code> routes and added new classes:</p>
384
+
385
+ <pre class="code ruby"><code class="ruby"><span class='comment'># ./router/seeder.rb
386
+ </span>
387
+ <span class='id identifier rubyid_require'>require</span> <span class='tstring'><span class='tstring_beg'>&#39;</span><span class='tstring_content'>dh_easy/router</span><span class='tstring_end'>&#39;</span></span>
388
+ <span class='id identifier rubyid_require'>require</span> <span class='tstring'><span class='tstring_beg'>&#39;</span><span class='tstring_content'>./seeder/login</span><span class='tstring_end'>&#39;</span></span>
389
+
390
+ <span class='const'><span class='object_link'><a href="DhEasy.html" title="DhEasy (module)">DhEasy</a></span></span><span class='op'>::</span><span class='const'>Router</span><span class='op'>::</span><span class='const'>Seeder</span><span class='period'>.</span><span class='id identifier rubyid_new'>new</span><span class='period'>.</span><span class='id identifier rubyid_route'>route</span> <span class='label'>context:</span> <span class='kw'>self</span>
391
+ </code></pre>
392
+
393
+ <pre class="code ruby"><code class="ruby"><span class='comment'># ./router/parser.rb
394
+ </span>
395
+ <span class='id identifier rubyid_require'>require</span> <span class='tstring'><span class='tstring_beg'>&#39;</span><span class='tstring_content'>cgi</span><span class='tstring_end'>&#39;</span></span>
396
+ <span class='id identifier rubyid_require'>require</span> <span class='tstring'><span class='tstring_beg'>&#39;</span><span class='tstring_content'>dh_easy/router</span><span class='tstring_end'>&#39;</span></span>
397
+ <span class='id identifier rubyid_require'>require</span> <span class='tstring'><span class='tstring_beg'>&#39;</span><span class='tstring_content'>dh_easy/login</span><span class='tstring_end'>&#39;</span></span>
398
+ <span class='id identifier rubyid_require'>require</span> <span class='tstring'><span class='tstring_beg'>&#39;</span><span class='tstring_content'>./lib/login_enable</span><span class='tstring_end'>&#39;</span></span>
399
+ <span class='id identifier rubyid_require'>require</span> <span class='tstring'><span class='tstring_beg'>&#39;</span><span class='tstring_content'>./seeder/seeder</span><span class='tstring_end'>&#39;</span></span>
400
+ <span class='id identifier rubyid_require'>require</span> <span class='tstring'><span class='tstring_beg'>&#39;</span><span class='tstring_content'>./parsers/search</span><span class='tstring_end'>&#39;</span></span>
401
+ <span class='id identifier rubyid_require'>require</span> <span class='tstring'><span class='tstring_beg'>&#39;</span><span class='tstring_content'>./parsers/product</span><span class='tstring_end'>&#39;</span></span>
402
+ <span class='id identifier rubyid_require'>require</span> <span class='tstring'><span class='tstring_beg'>&#39;</span><span class='tstring_content'>./parsers/login</span><span class='tstring_end'>&#39;</span></span>
403
+ <span class='id identifier rubyid_require'>require</span> <span class='tstring'><span class='tstring_beg'>&#39;</span><span class='tstring_content'>./parsers/login_post</span><span class='tstring_end'>&#39;</span></span>
404
+
405
+ <span class='const'><span class='object_link'><a href="DhEasy.html" title="DhEasy (module)">DhEasy</a></span></span><span class='op'>::</span><span class='const'>Router</span><span class='op'>::</span><span class='const'>Parser</span><span class='period'>.</span><span class='id identifier rubyid_new'>new</span><span class='period'>.</span><span class='id identifier rubyid_route'>route</span> <span class='label'>context:</span> <span class='kw'>self</span>
406
+ </code></pre>
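+ <p>To make the role of the routes in <code>dh_easy.yaml</code> more concrete, here is a
+ minimal sketch of how a router could map a <code>page_type</code> to its configured
+ class. This is an illustration only, not the <code>dh_easy-router</code> implementation:
+ the <code>class_for</code> helper, the <code>CONFIG</code> constant and the empty parser
+ classes are assumptions made up for this example.</p>

```ruby
require 'yaml'

# Hypothetical, empty stand-ins for the parser classes named in the config.
module Parsers
  class Login; end
  class LoginPost; end
end

# Illustrative config fragment mirroring the parser routes shown earlier.
CONFIG = YAML.safe_load(<<~YAML_DOC)
  parser:
    routes:
      - page_type: login
        class: Parsers::Login
      - page_type: login_post
        class: Parsers::LoginPost
YAML_DOC

# Hypothetical helper: resolve the class configured for a page type,
# or return nil when no route matches.
def class_for(page_type, config = CONFIG)
  route = config['parser']['routes'].find { |r| r['page_type'] == page_type }
  route && Object.const_get(route['class'])
end
```

+ <p>With this sketch, <code>class_for('login')</code> resolves to
+ <code>Parsers::Login</code>, which is why every class listed in the routes has to be
+ required before the router runs.</p>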
407
+
408
+ <p>Next, we need to include our <code>LoginEnable</code> module in every
409
+ parser that requires session validation, so that expired sessions get fixed.
410
+ To do this, we call our <code>LoginEnable#fix_session</code>
411
+ method as the first step of each parser&#39;s <code>parse</code>
412
+ method:</p>
413
+
414
+ <pre class="code ruby"><code class="ruby"><span class='comment'># ./parsers/search.rb
415
+ </span>
416
+ <span class='kw'>module</span> <span class='const'>Parsers</span>
417
+ <span class='kw'>class</span> <span class='const'>Search</span>
418
+ <span class='id identifier rubyid_include'>include</span> <span class='const'><span class='object_link'><a href="DhEasy.html" title="DhEasy (module)">DhEasy</a></span></span><span class='op'>::</span><span class='const'>Core</span><span class='op'>::</span><span class='const'>Plugin</span><span class='op'>::</span><span class='const'>Parser</span>
419
+ <span class='id identifier rubyid_include'>include</span> <span class='const'>LoginEnable</span>
420
+
421
+ <span class='kw'>def</span> <span class='id identifier rubyid_parse'>parse</span>
422
+ <span class='kw'>return</span> <span class='kw'>unless</span> <span class='id identifier rubyid_fix_session'>fix_session</span>
423
+
424
+ <span class='id identifier rubyid_html'>html</span> <span class='op'>=</span> <span class='const'>Nokogiri</span><span class='period'>.</span><span class='const'>HTML</span> <span class='id identifier rubyid_content'>content</span>
425
+ <span class='id identifier rubyid_html'>html</span><span class='period'>.</span><span class='id identifier rubyid_css'>css</span><span class='lparen'>(</span><span class='tstring'><span class='tstring_beg'>&#39;</span><span class='tstring_content'>.name</span><span class='tstring_end'>&#39;</span></span><span class='rparen'>)</span><span class='period'>.</span><span class='id identifier rubyid_each'>each</span> <span class='kw'>do</span> <span class='op'>|</span><span class='id identifier rubyid_element'>element</span><span class='op'>|</span>
426
+ <span class='id identifier rubyid_name'>name</span> <span class='op'>=</span> <span class='id identifier rubyid_element'>element</span><span class='period'>.</span><span class='id identifier rubyid_text'>text</span><span class='period'>.</span><span class='id identifier rubyid_strip'>strip</span>
427
+ <span class='id identifier rubyid_pages'>pages</span> <span class='op'>&lt;&lt;</span> <span class='lbrace'>{</span>
428
+ <span class='tstring'><span class='tstring_beg'>&#39;</span><span class='tstring_content'>url</span><span class='tstring_end'>&#39;</span></span> <span class='op'>=&gt;</span> <span class='tstring'><span class='tstring_beg'>&quot;</span><span class='tstring_content'>https://example.com/product/</span><span class='embexpr_beg'>#{</span><span class='const'>CGI</span><span class='op'>::</span><span class='id identifier rubyid_escape'>escape</span> <span class='id identifier rubyid_name'>name</span><span class='embexpr_end'>}</span><span class='tstring_end'>&quot;</span></span><span class='comma'>,</span>
429
+ <span class='tstring'><span class='tstring_beg'>&#39;</span><span class='tstring_content'>page_type</span><span class='tstring_end'>&#39;</span></span> <span class='op'>=&gt;</span> <span class='tstring'><span class='tstring_beg'>&#39;</span><span class='tstring_content'>product</span><span class='tstring_end'>&#39;</span></span><span class='comma'>,</span>
430
+ <span class='tstring'><span class='tstring_beg'>&#39;</span><span class='tstring_content'>vars</span><span class='tstring_end'>&#39;</span></span> <span class='op'>=&gt;</span> <span class='lbrace'>{</span><span class='tstring'><span class='tstring_beg'>&#39;</span><span class='tstring_content'>name</span><span class='tstring_end'>&#39;</span></span> <span class='op'>=&gt;</span> <span class='id identifier rubyid_name'>name</span><span class='rbrace'>}</span>
431
+ <span class='rbrace'>}</span>
432
+ <span class='kw'>end</span>
433
+ <span class='kw'>end</span>
434
+ <span class='kw'>end</span>
435
+ <span class='kw'>end</span>
436
+ </code></pre>
437
+
438
+ <pre class="code ruby"><code class="ruby"><span class='comment'># ./parsers/product.rb
439
+ </span>
440
+ <span class='kw'>module</span> <span class='const'>Parsers</span>
441
+ <span class='kw'>class</span> <span class='const'>Product</span>
442
+ <span class='id identifier rubyid_include'>include</span> <span class='const'><span class='object_link'><a href="DhEasy.html" title="DhEasy (module)">DhEasy</a></span></span><span class='op'>::</span><span class='const'>Core</span><span class='op'>::</span><span class='const'>Plugin</span><span class='op'>::</span><span class='const'>Parser</span>
443
+ <span class='id identifier rubyid_include'>include</span> <span class='const'>LoginEnable</span>
444
+
445
+ <span class='kw'>def</span> <span class='id identifier rubyid_parse'>parse</span>
446
+ <span class='kw'>return</span> <span class='kw'>unless</span> <span class='id identifier rubyid_fix_session'>fix_session</span>
447
+
448
+ <span class='id identifier rubyid_html'>html</span> <span class='op'>=</span> <span class='const'>Nokogiri</span><span class='period'>.</span><span class='const'>HTML</span> <span class='id identifier rubyid_content'>content</span>
449
+ <span class='id identifier rubyid_description'>description</span> <span class='op'>=</span> <span class='id identifier rubyid_html'>html</span><span class='period'>.</span><span class='id identifier rubyid_css'>css</span><span class='lparen'>(</span><span class='tstring'><span class='tstring_beg'>&#39;</span><span class='tstring_content'>.description</span><span class='tstring_end'>&#39;</span></span><span class='rparen'>)</span><span class='period'>.</span><span class='id identifier rubyid_first'>first</span><span class='period'>.</span><span class='id identifier rubyid_text'>text</span><span class='period'>.</span><span class='id identifier rubyid_strip'>strip</span>
450
+ <span class='id identifier rubyid_outputs'>outputs</span> <span class='op'>&lt;&lt;</span> <span class='lbrace'>{</span>
451
+ <span class='tstring'><span class='tstring_beg'>&#39;</span><span class='tstring_content'>_collection</span><span class='tstring_end'>&#39;</span></span> <span class='op'>=&gt;</span> <span class='tstring'><span class='tstring_beg'>&#39;</span><span class='tstring_content'>product</span><span class='tstring_end'>&#39;</span></span><span class='comma'>,</span>
452
+ <span class='tstring'><span class='tstring_beg'>&#39;</span><span class='tstring_content'>name</span><span class='tstring_end'>&#39;</span></span> <span class='op'>=&gt;</span> <span class='id identifier rubyid_page'>page</span><span class='lbracket'>[</span><span class='tstring'><span class='tstring_beg'>&#39;</span><span class='tstring_content'>vars</span><span class='tstring_end'>&#39;</span></span><span class='rbracket'>]</span><span class='lbracket'>[</span><span class='tstring'><span class='tstring_beg'>&#39;</span><span class='tstring_content'>name</span><span class='tstring_end'>&#39;</span></span><span class='rbracket'>]</span><span class='comma'>,</span>
453
+ <span class='tstring'><span class='tstring_beg'>&#39;</span><span class='tstring_content'>description</span><span class='tstring_end'>&#39;</span></span> <span class='op'>=&gt;</span> <span class='id identifier rubyid_description'>description</span>
454
+ <span class='rbrace'>}</span>
455
+ <span class='kw'>end</span>
456
+ <span class='kw'>end</span>
457
+ <span class='kw'>end</span>
458
+ </code></pre>
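+ <p>The <code>return unless fix_session</code> guard used in both parsers can be tried in
+ isolation with the sketch below. It assumes that <code>fix_session</code> returns a truthy
+ value when the session is valid and a falsy one when a recovery had to be triggered.
+ <code>SketchLoginEnable</code>, <code>DemoParser</code> and <code>session_expired?</code>
+ are hypothetical names that only mimic that contract; they are not part of
+ <code>dh_easy-login</code>.</p>

```ruby
# Hypothetical stand-in for the LoginEnable module; the real fix_session
# relies on dh_easy-login internals that are not reproduced here.
module SketchLoginEnable
  # Returns true when the session is valid. Otherwise it records that a
  # recovery is needed and returns false so the parser can stop early.
  def fix_session
    return true unless session_expired?

    @recovery_requested = true
    false
  end
end

# Hypothetical parser that applies the same guard as the parsers above.
class DemoParser
  include SketchLoginEnable
  attr_reader :outputs, :recovery_requested

  def initialize(expired)
    @expired = expired
    @outputs = []
  end

  def session_expired?
    @expired
  end

  def parse
    # Skip parsing entirely when the session had to be fixed.
    return unless fix_session

    outputs << { 'name' => 'demo' }
  end
end
```

+ <p>A parser built with <code>DemoParser.new(true)</code> produces no outputs and only
+ flags the recovery, while <code>DemoParser.new(false)</code> parses normally.</p>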
459
+
460
+ <p><strong><em>Note:</em></strong> This example assumes that all pages require
461
+ an active session, so we add it to all parsers. This will likely
462
+ not hold in a real-life scenario, since not all web pages
463
+ require a session, so make sure to add it only to the parsers that need
464
+ it.</p>
465
+
466
+ <p>Finally, we need to make sure that every page that requires an active
467
+ session is enqueued with our latest active session cookie, so we need to
468
+ call the <code>login_flow.fix_page!</code> method on every such page
469
+ before enqueuing it.</p>
470
+
471
+ <p>In this example, we already applied it to the search pages enqueued by our
472
+ seeder, so the only place left to modify is the
473
+ <code>./parsers/search.rb</code> parser, since it enqueues
474
+ <code>product</code> pages:</p>
475
+
476
+ <pre class="code ruby"><code class="ruby"><span class='comment'># ./parsers/search.rb
477
+ </span>
478
+ <span class='kw'>module</span> <span class='const'>Parsers</span>
479
+ <span class='kw'>class</span> <span class='const'>Search</span>
480
+ <span class='id identifier rubyid_include'>include</span> <span class='const'><span class='object_link'><a href="DhEasy.html" title="DhEasy (module)">DhEasy</a></span></span><span class='op'>::</span><span class='const'>Core</span><span class='op'>::</span><span class='const'>Plugin</span><span class='op'>::</span><span class='const'>Parser</span>
481
+ <span class='id identifier rubyid_include'>include</span> <span class='const'>LoginEnable</span>
482
+
483
+ <span class='kw'>def</span> <span class='id identifier rubyid_parse'>parse</span>
484
+ <span class='kw'>return</span> <span class='kw'>unless</span> <span class='id identifier rubyid_fix_session'>fix_session</span>
485
+
486
+ <span class='id identifier rubyid_html'>html</span> <span class='op'>=</span> <span class='const'>Nokogiri</span><span class='period'>.</span><span class='const'>HTML</span> <span class='id identifier rubyid_content'>content</span>
487
+ <span class='id identifier rubyid_html'>html</span><span class='period'>.</span><span class='id identifier rubyid_css'>css</span><span class='lparen'>(</span><span class='tstring'><span class='tstring_beg'>&#39;</span><span class='tstring_content'>.name</span><span class='tstring_end'>&#39;</span></span><span class='rparen'>)</span><span class='period'>.</span><span class='id identifier rubyid_each'>each</span> <span class='kw'>do</span> <span class='op'>|</span><span class='id identifier rubyid_element'>element</span><span class='op'>|</span>
488
+ <span class='id identifier rubyid_name'>name</span> <span class='op'>=</span> <span class='id identifier rubyid_element'>element</span><span class='period'>.</span><span class='id identifier rubyid_text'>text</span><span class='period'>.</span><span class='id identifier rubyid_strip'>strip</span>
489
+ <span class='id identifier rubyid_new_page'>new_page</span> <span class='op'>=</span> <span class='lbrace'>{</span>
490
+ <span class='tstring'><span class='tstring_beg'>&#39;</span><span class='tstring_content'>url</span><span class='tstring_end'>&#39;</span></span> <span class='op'>=&gt;</span> <span class='tstring'><span class='tstring_beg'>&quot;</span><span class='tstring_content'>https://example.com/product/</span><span class='embexpr_beg'>#{</span><span class='const'>CGI</span><span class='op'>::</span><span class='id identifier rubyid_escape'>escape</span> <span class='id identifier rubyid_name'>name</span><span class='embexpr_end'>}</span><span class='tstring_end'>&quot;</span></span><span class='comma'>,</span>
491
+ <span class='tstring'><span class='tstring_beg'>&#39;</span><span class='tstring_content'>page_type</span><span class='tstring_end'>&#39;</span></span> <span class='op'>=&gt;</span> <span class='tstring'><span class='tstring_beg'>&#39;</span><span class='tstring_content'>product</span><span class='tstring_end'>&#39;</span></span><span class='comma'>,</span>
492
+ <span class='tstring'><span class='tstring_beg'>&#39;</span><span class='tstring_content'>vars</span><span class='tstring_end'>&#39;</span></span> <span class='op'>=&gt;</span> <span class='lbrace'>{</span><span class='tstring'><span class='tstring_beg'>&#39;</span><span class='tstring_content'>name</span><span class='tstring_end'>&#39;</span></span> <span class='op'>=&gt;</span> <span class='id identifier rubyid_name'>name</span><span class='rbrace'>}</span>
493
+ <span class='rbrace'>}</span>
494
+ <span class='id identifier rubyid_login_flow'>login_flow</span><span class='period'>.</span><span class='id identifier rubyid_fix_page!'>fix_page!</span> <span class='id identifier rubyid_new_page'>new_page</span>
495
+ <span class='id identifier rubyid_pages'>pages</span> <span class='op'>&lt;&lt;</span> <span class='id identifier rubyid_new_page'>new_page</span>
496
+ <span class='kw'>end</span>
497
+ <span class='kw'>end</span>
498
+ <span class='kw'>end</span>
499
+ <span class='kw'>end</span>
500
+ </code></pre>
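+ <p>As a rough mental model of the <code>fix_page!</code> call, the sketch below mutates a
+ page hash in place so that it carries the latest session cookie. How
+ <code>dh_easy-login</code> actually stores the session on the page is not shown in this
+ guide, so the <code>SketchLoginFlow</code> class, the <code>headers</code> key and the
+ cookie value are all assumptions made for illustration.</p>

```ruby
# Hypothetical stand-in for login_flow; not the dh_easy-login implementation.
class SketchLoginFlow
  def initialize(cookie)
    @cookie = cookie
  end

  # Mutates the given page hash in place (hence the bang) and returns it.
  # Assumption: the session travels in a 'Cookie' request header.
  def fix_page!(page)
    page['headers'] ||= {}
    page['headers']['Cookie'] = @cookie
    page
  end
end

flow = SketchLoginFlow.new('session=abc123')
new_page = {
  'url' => 'https://example.com/product/demo',
  'page_type' => 'product'
}
flow.fix_page!(new_page)
```

+ <p>Because the hash is modified in place, the same <code>new_page</code> object can be
+ pushed to <code>pages</code> right after the call, as the parser above does.</p>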
501
+
502
+ <p>Hurray! You have now implemented a fully functional login flow with
503
+ auto-recovery capabilities in your project.</p>
504
+ </div></div>
505
+
506
+ <div id="footer">
507
+ Generated on Wed Dec 4 23:19:59 2019 by
508
+ <a href="http://yardoc.org" title="Yay! A Ruby Documentation Tool" target="_parent">yard</a>
509
+ 0.9.20 (ruby-2.5.3).
510
+ </div>
511
+
512
+ </div>
513
+ </body>
514
+ </html>