super_crawler 0.1.0 → 0.2.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: 4f566869c9f06df215b047291bbc08a83e25a176
- data.tar.gz: ee35c2dcee2dae0289e70a8fb0195c63859bf793
+ metadata.gz: ffb65575970b9bd45f3ac9fa80b004c96df3492a
+ data.tar.gz: 1a72997367a389dcfb67dcd913554207604ef3e2
  SHA512:
- metadata.gz: e7f5c9db1479c83af91879d41f9ea6480142ddf11fba9fcc3bc944f46de90d8b821929582526f245cd6752a4506019f8b2489d08175e5c6fe30faea620c1ea61
- data.tar.gz: 1c81fc244e6679cfe6782b9651782944306290a9281277658a220155ef6c4a7ccee59b9b7a2f058c0862d52027e7f8453ba01a21c953d2f31ff5980e9fea87ad
+ metadata.gz: 67dd33ed9ee8965a84cdcc22187212ea97c47c6a29bc868818b31246c827e342c486d0b12270d52eac1c1c8071cab51fef4dd441470abd1702905e84c814c31c
+ data.tar.gz: bd45255d527837c177b788f81412a82e0609a015a65e9d9423841e80bef362b62f854c55e1e1637cf8fc45bbf19d256bac4a66da9e804b8850128e0ca6f8d9a8
data/.gitignore CHANGED
@@ -8,4 +8,5 @@
  /spec/reports/
  /tmp/

+ /gems/
  *.gem
data/README.md CHANGED
@@ -6,23 +6,16 @@ Easy (yet efficient) ruby gem to crawl your favorite website.

  Open your terminal, then:

- ```bash
- $ git clone https://github.com/htaidirt/super_crawler
+ git clone https://github.com/htaidirt/super_crawler
+ cd super_crawler
+ bundle
+ ./bin/console

- $ cd super_crawler
+ Then

- $ bundle
-
- $ ./bin/console
- ```
-
- ```ruby
- > sc = SuperCrawler::CrawlSite.new('https://gocardless.com')
-
- > sc.start # => Start crawling the website
-
- > sc.render(5) # => Show first 5 results of the crawling as sitemap
- ```
+ sc = SuperCrawler::Crawl.new('https://gocardless.com')
+ sc.start(10) # => Start crawling the website using 10 threads
+ sc.render(5) # => Show the first 5 results of the crawling as sitemap
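For readers skimming the diff, the 0.2.0 quickstart boils down to the following console session (a sketch assembled from the README lines above; the target URL is just an example):

```ruby
require 'super_crawler'

sc = SuperCrawler::Crawl.new('https://gocardless.com') # any website URL
sc.start(10)  # crawl with 10 worker threads until no new internal link is found
sc.render(5)  # print the first 5 crawled pages and their assets as a sitemap
```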

  ## Installation

@@ -52,64 +45,57 @@ This gem is an experiment and can't be used for production purposes. Please, use

  There are also a lot of limitations that weren't handled due to time. You'll find more information on the limitations below.

- SuperCrawler gem was only tested on MRI and ruby 2.3.1.
+ SuperCrawler gem was only tested on MRI 2.3.1 and Rubinius 2.5.8.

  ## Philosophy

- Starting from a URL, extract all the internal links and assets within the page. Add all unique links to an array for future exploration of theses links. Repeat for each link in the links list until no new link is discovered.
+ Starting from a given URL, the crawler extracts all the internal links and assets within the page. The links are added to a list of unique links for further exploration. The crawler repeats the exploration, visiting every collected link, until no new link is found.

- Due to the heavy operations, and the time to access each page content, we will use threads to perform near-parallel processing.
+ Due to the heavy operations (thousands of pages) and the network time to access each page's content, we use threads to perform near-parallel processing.

- In order to keep the code readable and structured, create two classes:
+ In order to keep the code readable and structured, we created two classes:

- - `SuperCrawler::CrawlPage` that is responsible for crawling a single page and extracting all relevant information (internal links and assets)
- - `SuperCrawler::CrawlSite` that is responsible for crawling a whole website, by collecting links and calling `SuperCrawler::CrawlPage` within threads. This class is also responsible for rendering results.
+ - `SuperCrawler::Scrap` is responsible for scraping a single page and extracting all relevant information (internal links and assets)
+ - `SuperCrawler::Crawl` is responsible for crawling a whole website by collecting and managing links (using `SuperCrawler::Scrap` on every internal link found). This class is also responsible for rendering results.

  ## More detailed use

  Open your favorite ruby console and require the gem:

- ```ruby
- require 'super_crawler'
- ```
+ require 'super_crawler'

- ### Crawling a single web page
+ ### Scraping a single web page

  Read the following if you would like to crawl a single web page and extract relevant information (internal links and assets).

- ```ruby
- page = SuperCrawler::CrawlPage.new( url )
- ```
+ page = SuperCrawler::Scrap.new( url )

- Where `url` should be the URL of the page you would like to crawl.
+ Where `url` should be the URL of the page you would like to scrape.

- **Nota:** When missing a scheme (`http://` or `https://`), SuperCrawler will prepend the URL with an `http://`.
+ **Nota:** If the given URL is missing a scheme (`http://` or `https://`), SuperCrawler will prepend `http://` to it.

  #### Get the encoded URL

  Run

- ```ruby
- page.url
- ```
-
- to get the encoded URL provided.
+ page.url
+
+ to get the encoded URL.

  #### Get internal links of a page

  Run

- ```ruby
- page.get_links
- ```
-
- to get a list of internal links within the crawled page. An internal link is a link that _has the same host than the page URL_. Subdomains are rejected.
+ page.get_links
+
+ to get the list of internal links in the page. An internal link is a link that _has the same scheme and host as the provided URL_. Subdomains are rejected.

  This method searches in the `href` attribute of all `<a>` anchor tags.

- **Nota:** This method returns an array of absolute URLs (all internal links).
+ **Nota:**

- **Nota 2:** Bad links and special links (like mailto and javascript) are discarded.
+ - This method returns an array of absolute URLs (all internal links).
+ - Bad links and special links (like mailto and javascript) are discarded.
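A quick illustration of the single-page API described above (a sketch; the site URL is arbitrary, and `get_links` keeps only links sharing the page's scheme and host):

```ruby
require 'super_crawler'

page = SuperCrawler::Scrap.new('gocardless.com') # no scheme given, so 'http://' is prepended
page.url       # => "http://gocardless.com"
page.get_links # => array of absolute internal URLs taken from <a href="..."> tags
```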

  #### Get images of a page

@@ -129,92 +115,75 @@ to get a list of images links within the page. The images links are extracted fr

  Run

- ```ruby
- page.get_stylesheets
- ```
+ page.get_stylesheets

- to get a list of stylesheets links within the page. The stylesheets links are extracted from the `href="..."` attribute of all `<link rel="stylesheet">` tags.
+ to get a list of stylesheet links within the page. The links are extracted from the `href="..."` attribute of all `<link rel="stylesheet">` tags.

- **Nota:** Inline styling isn't yet detected by the method.
+ **Nota:**

- **Nota 2:** This method returns an array of absolute URLs.
+ - Inline styling isn't yet detected by the method.
+ - This method returns an array of absolute URLs.

  #### Get scripts of a page

  Run

- ```ruby
- page.get_scripts
- ```
+ page.get_scripts

- to get a list of scripts links within the page. The scripts links are extracted from the `src="..."` attribute of all `<script>` tags.
+ to get a list of script links within the page. The links are extracted from the `src="..."` attribute of all `<script>` tags.

- **Nota:** Inline script isn't yet detected by the method.
+ **Nota:**

- **Nota 2:** This method returns an array of absolute URLs.
+ - Inline script isn't yet detected by the method.
+ - This method returns an array of absolute URLs.

  #### List all assets of a page

  Run

- ```ruby
- page.get_assets
- ```
+ page.get_assets

- to get a list of all assets (images, stylesheets and scripts links) as a hash of arrays.
+ to get a list of all assets (links to images, stylesheets and scripts) as a hash of arrays.
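For reference, the returned hash uses the symbol keys `:images`, `:stylesheets` and `:scripts` (see `Scrap#get_assets` later in this diff); its shape is roughly the following sketch, with invented URLs:

```ruby
page.get_assets
# => {
#      images:      ["http://example.com/logo.png", "..."],
#      stylesheets: ["http://example.com/css/main.css", "..."],
#      scripts:     ["http://example.com/js/app.js", "..."]
#    }
```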
  ### Crawling a whole web site

- First instantiate the site crawler.
-
- ```ruby
- sc = SuperCrawler::CrawlSite.new(url, count_threads)
- ```
+ sc = SuperCrawler::Crawl.new(url)

- where `url` is the URL of the page to crawl, and `count_threads` the number of threads to handle the job (by default 10).
+ where `url` is the URL of the website to crawl.

  Next, start the crawler:

- ```ruby
- sc.start
- ```
+ sc.start(number_of_threads)
+
+ where `number_of_threads` is the number of threads that will perform the job (10 by default). **This can take some time, depending on the site to crawl.**

- This can take some time, depending on the site to crawl.
+ To access the crawl results, use the following:

- To access crawl results, you can use the following:
-
- ```ruby
- sc.links # The array of internal links
-
- sc.crawl_results # Array of hashes containing links and assets for every link crawled
- ```
+ sc.links # The array of unique internal links
+ sc.crawl_results # Array of hashes containing links and assets for every unique internal link found

  To see the crawling as a sitemap, use:

- ```ruby
- sc.render(5) # Will render the sitemap of the first 5 pages
- ```
+ sc.render(5) # Will render the sitemap of the first 5 pages

- TODO: Make more sophisticated rendering class, that can render within files of different formats (HTML, XML, JSON,...)
+ _TODO: Create a separate, more sophisticated rendering class that can render to files in different formats (HTML, XML, JSON, ...)_

  #### Tips on searching assets and links

  After `sc.start`, you can access all collected resources (links and assets) using `sc.crawl_results`. This has the following structure:

- ```json
- [
-   {
-     url: 'http://example.com/',
-     links: [...array of internal links...],
-     assets: {
-       images: [...array of images links],
-       stylesheets: [...array of stylesheets links],
-       scripts: [...array of scripts links],
-     }
-   },
-   ...
- ]
- ```
+ [
+   {
+     url: 'http://example.com/',
+     links: [...array of internal links...],
+     assets: {
+       images: [...array of images links],
+       stylesheets: [...array of stylesheets links],
+       scripts: [...array of scripts links],
+     }
+   },
+   ...
+ ]

  You can use `sc.crawl_results.select{ |resource| ... }` to select a particular resource.
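To make the `select` tip concrete, here is a short sketch run after `sc.start` (the matched URL substring is just an example):

```ruby
# Pages whose URL contains "/blog/", with a count of the images they reference.
blog_pages = sc.crawl_results.select { |resource| resource[:url].include?('/blog/') }

blog_pages.each do |resource|
  puts "#{resource[:url]} references #{resource[:assets][:images].size} image(s)"
end
```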
@@ -223,12 +192,12 @@ You can use `sc.crawl_results.select{ |resource| ... }` to select a particular r
  Actually, the gem has the following limitations:

  - Subdomains are not considered as internal links
- - Both HTTP and HTTPS pages are taken into account. This can increase the number of links found, but we think that we need to keep it because some sites don't duplicate all contents for HTTP and HTTPS
+ - A link with the same domain but a different scheme is ignored (http -> https, or the opposite)
  - Only links within `<a href="...">` tags are extracted
  - Only images links within `<img src="..."/>` tags are extracted
  - Only stylesheets links within `<link rel="stylesheet" href="..." />` tags are extracted
  - Only scripts links within `<script src="...">` tags are extracted
- - A page that is not accessible (eg. error 404) is not checked later
+ - A page that is not accessible (status other than 200) is not checked again later

  ## Development

@@ -238,11 +207,11 @@ To install this gem onto your local machine, run `bundle exec rake install`. To

  ## Contributing

- Bug reports and pull requests are welcome on GitHub at https://github.com/htaidirt/super_crawler. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the [Contributor Covenant](http://contributor-covenant.org) code of conduct.
+ Bug reports and pull requests are welcome on GitHub at [https://github.com/htaidirt/super_crawler](https://github.com/htaidirt/super_crawler). This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the [Contributor Covenant](http://contributor-covenant.org) code of conduct.

- Want to contribute, please follow this process:
+ Please follow this process:

- 1. Fork it
+ 1. Fork the project
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Commit your changes (git commit -am 'Add some feature')
  4. Push to the branch (git push origin my-new-feature)
@@ -1,33 +1,32 @@
  require 'thread'

- require 'super_crawler/crawl_page'
+ require 'super_crawler/scrap'

  module SuperCrawler

  ###
  # Crawl a whole website
  #
- class CrawlSite
+ class Crawl

  attr_reader :links, :crawl_results

- def initialize start_url, threads = 10, options = {}
+ def initialize start_url, options = {}
  @start_url = URI(URI.encode start_url).normalize().to_s # Normalize the given URL
  @links = [@start_url] # Will contain the list of all links found
  @crawl_results = [] # Will contain the crawl results (links and assets), as array of hashes
- @threads = threads # How many threads to use? Default: 10

  @option_debug = options[:debug].nil? ? true : !!(options[:debug]) # Debug by default
  end

  ###
  # Start crawling site
- # Could take a while. Use threads to speed up crawling and logging to inform user.
+ # Could take a while! Use threads to speed up crawling and log to inform user.
  #
- def start
+ def start threads_count = 10

- crawling_start_notice # Show message on what will happen
- threads = [] # Will contain our threads
+ crawling_start_notice( @start_url, threads_count ) # Show message on what will happen
+ threads = [] # Will contain the worker threads
  @links_queue = Queue.new # Will contain the links queue that the threads will use
  @links = [@start_url] # Re-init the links list
  @crawl_results = [] # Re-init the crawling results
@@ -38,12 +37,12 @@ module SuperCrawler
  process_page( @start_url )

  # Create threads to handle new links
- @threads.times do # Create many threads
+ threads_count.times do # Create threads_count threads

- threads << Thread.new do # Add a new threads
+ threads << Thread.new do # Instantiate a new thread
  begin
- while current_link = @links_queue.pop(true) # Popping every link after another
- process_page( current_link ) # Get links and assets
+ while current_link = @links_queue.pop(true) # Pop one link after another
+ process_page( current_link ) # Get links and assets of the popped link
  end
  rescue ThreadError # Stop when empty links queue
  end
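The hunk above is the core of the new threaded crawl: a shared `Queue` of links is drained by `threads_count` workers, and each worker stops when the non-blocking `Queue#pop(true)` raises `ThreadError` on an empty queue. A minimal standalone sketch of the same pattern (not the gem's code; the paths and the worker count of 4 are arbitrary):

```ruby
require 'thread'

queue   = Queue.new
results = Queue.new                        # thread-safe collector for this sketch
%w[/a /b /c].each { |path| queue.push(path) }

workers = Array.new(4) do
  Thread.new do
    begin
      while path = queue.pop(true)         # non-blocking pop of the next link
        results.push("processed #{path}")  # stand-in for real page processing
      end
    rescue ThreadError                     # raised once the queue is empty
    end
  end
end

workers.each(&:join)                       # wait for every worker to finish
puts results.size                          # => 3
```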
@@ -52,7 +51,7 @@ module SuperCrawler
  end

  threads.map(&:join) # Activate the threads
- crawling_summary_notice(start_time, Time.now) if @option_debug # Display crawling summary
+ crawling_summary_notice(start_time, Time.now, threads_count) if @option_debug # Display crawling summary

  return true
  end
@@ -64,7 +63,7 @@ module SuperCrawler
  #
  def render max_pages = 10
  draw_line
- puts "Showing first #{max_links} crawled pages and their contents:\n\n"
+ puts "Showing first #{max_pages} crawled pages and their contents:\n\n"
  @crawl_results[0..(max_pages-1)].each_with_index do |result, index|
  puts "[#{index+1}] Content of #{result[:url]}\n"

@@ -90,13 +89,13 @@ module SuperCrawler
  # Process a page by extracting information and updating links queue, links list and results.
  #
  def process_page page_url
- page = SuperCrawler::CrawlPage.new(page_url) # Crawl the current page
+ page = SuperCrawler::Scrap.new(page_url) # Scrap the current page

  current_page_links = page.get_links # Get current page internal links
  new_links = current_page_links - @links # Select new links

  new_links.each { |link| @links_queue.push(link) } # Add new links to the queue
- @links += new_links # Add new links to the total links list
+ @links += new_links # Add new links to the links list
  @crawl_results << { # Provide current page crawl result as a hash
  url: page.url, # The crawled page
  links: current_page_links, # Its internal links
@@ -109,11 +108,11 @@ module SuperCrawler
  ###
  # Display a notice when starting a site crawl
  #
- def crawling_start_notice
+ def crawling_start_notice start_url, threads
  draw_line
- puts "Start crawling #{@start_url} using #{@threads} threads. Crawling rules:"
+ puts "Start crawling #{start_url} using #{threads} threads. Crawling rules:"
  puts "1. Keep only internal links"
- puts "2. http and https links are considered different"
+ puts "2. Links with a different scheme are ignored"
  puts "3. Remove the fragment part from the links (#...)"
  puts "4. Keep paths with different parameters (?...)"
  draw_line
@@ -132,11 +131,11 @@ module SuperCrawler
  ###
  # Display final crawling summary after site crawling complete
  #
- def crawling_summary_notice time_start, time_end
+ def crawling_summary_notice time_start, time_end, threads
  total_time = time_end - time_start
  puts ""
  draw_line
- puts "Crawled #{@links.count} links in #{total_time.to_f.to_s} seconds using #{@threads} threads."
+ puts "Crawled #{@links.count} links in #{total_time.to_f.to_s} seconds using #{threads} threads."
  puts "Use .crawl_results to access the crawl results as an array of hashes."
  puts "Use .render to see the crawl_results as a sitemap."
  draw_line
@@ -1,20 +1,19 @@
  require "open-uri"
- require "open_uri_redirections"
  require "nokogiri"

  module SuperCrawler

  ###
- # Crawl a single HTML page
+ # Scrap a single HTML page
  # Responsible for extracting all relevant information within a page
+ # (internal links and assets)
  #
- class CrawlPage
+ class Scrap

  attr_reader :url

  def initialize url
- # Normalize the URL, by adding http(s) if not present in the URL
- # NOTA: By default, add http:// scheme to an URL that doesn't have one
+ # Normalize the URL, by adding a scheme (http) if not present in the URL
  @url = URI.encode( !!(url =~ /^(http(s)?:\/\/)/) ? url : ('http://' + url) )
  end

@@ -28,7 +27,7 @@ module SuperCrawler
  links = get_doc.css('a').map{ |link| link['href'] }.compact

  # Select only internal links (relative links, or absolute links with the same host)
- links.select!{ |link| URI.parse(URI.encode link).host.nil? || URI.parse(URI.encode link).host == URI.parse(@url).host }
+ links.select!{ |link| URI.parse(URI.encode link).host.nil? || link.start_with?( @url ) }

  # Reject bad matches links (like mailto, tel and javascript)
  links.reject!{ |link| !!(link =~ /^(mailto:|tel:|javascript:)/) }
@@ -97,9 +96,9 @@ module SuperCrawler
  #
  def get_assets
  {
- 'images': get_images,
- 'stylesheets': get_stylesheets,
- 'scripts': get_scripts
+ :'images' => get_images,
+ :'stylesheets' => get_stylesheets,
+ :'scripts' => get_scripts
  }
  end
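Both hash styles in this hunk build the same symbol keys; the switch to the explicit hash-rocket form is presumably for older parsers (e.g. the Rubinius build now mentioned in the README), since the quoted-label syntax requires Ruby 2.2+. A quick comparison:

```ruby
a = { 'images': [] }    # quoted-label syntax (Ruby 2.2+)
b = { :'images' => [] } # classic hash-rocket, accepted by older parsers
a == b                  # => true, both are {:images=>[]}
```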
@@ -109,10 +108,10 @@ module SuperCrawler
  #
  def get_all
  {
- 'links': get_links,
- 'images': get_images,
- 'stylesheets': get_stylesheets,
- 'scripts': get_scripts
+ :'links' => get_links,
+ :'images' => get_images,
+ :'stylesheets' => get_stylesheets,
+ :'scripts' => get_scripts
  }
  end

@@ -131,28 +130,18 @@ module SuperCrawler
  #
  def get_doc
  begin
- @doc ||= Nokogiri(open( @url , allow_redirections: :all ))
+ @doc ||= Nokogiri(open( @url ))
  rescue Exception => e
  raise "Problem with URL #{@url}: #{e}"
  end
  end

- ###
- # Extract the base URL (scheme and host only)
- #
- # eg:
- # http://mysite.com/abc -> http://mysite.com
- # https://dev.mysite.co.uk/mylink -> https://dev.mysite.co.uk
- def base_url
- "#{URI.parse(@url).scheme}://#{URI.parse(@url).host}"
- end
-
  ###
  # Given a URL, return the absolute URL
  #
  def create_absolute_url url
  # Append the base URL (scheme+host) if the provided URL is relative
- URI.parse(URI.encode url).host.nil? ? (base_url + url) : url
+ URI.parse(URI.encode url).host.nil? ? "#{URI.parse(@url).scheme}://#{URI.parse(@url).host}#{url}" : url
  end

  end
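To illustrate what the rewritten `create_absolute_url` does, here is a small sketch using plain `URI` outside the gem (the page and asset URLs are invented):

```ruby
require 'uri'

page_url = 'http://example.com/blog/post'
relative = '/assets/app.js'              # href as found on the page

absolute =
  if URI.parse(relative).host.nil?       # relative link: prepend the page's scheme and host
    "#{URI.parse(page_url).scheme}://#{URI.parse(page_url).host}#{relative}"
  else                                   # already absolute: keep as-is
    relative
  end

puts absolute # => "http://example.com/assets/app.js"
```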
@@ -1,3 +1,3 @@
  module SuperCrawler
- VERSION = "0.1.0"
+ VERSION = "0.2.0"
  end
data/lib/super_crawler.rb CHANGED
@@ -1,4 +1,4 @@
  require "super_crawler/version"

- require "super_crawler/crawl_page"
- require "super_crawler/crawl_site"
+ require "super_crawler/scrap"
+ require "super_crawler/crawl"
@@ -28,7 +28,6 @@ Gem::Specification.new do |spec|
  spec.require_paths = ["lib"]

  spec.add_dependency "nokogiri", "~> 1"
- spec.add_dependency "open_uri_redirections", "~> 0.2"
  spec.add_dependency "thread", "~> 0.2"

  spec.add_development_dependency "bundler", "~> 1.10"
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: super_crawler
  version: !ruby/object:Gem::Version
- version: 0.1.0
+ version: 0.2.0
  platform: ruby
  authors:
  - Hassen Taidirt
  autorequire:
  bindir: exe
  cert_chain: []
- date: 2016-07-09 00:00:00.000000000 Z
+ date: 2016-07-13 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: nokogiri
@@ -24,20 +24,6 @@ dependencies:
  - - "~>"
  - !ruby/object:Gem::Version
  version: '1'
- - !ruby/object:Gem::Dependency
- name: open_uri_redirections
- requirement: !ruby/object:Gem::Requirement
- requirements:
- - - "~>"
- - !ruby/object:Gem::Version
- version: '0.2'
- type: :runtime
- prerelease: false
- version_requirements: !ruby/object:Gem::Requirement
- requirements:
- - - "~>"
- - !ruby/object:Gem::Version
- version: '0.2'
  - !ruby/object:Gem::Dependency
  name: thread
  requirement: !ruby/object:Gem::Requirement
@@ -113,8 +99,8 @@ files:
  - bin/console
  - bin/setup
  - lib/super_crawler.rb
- - lib/super_crawler/crawl_page.rb
- - lib/super_crawler/crawl_site.rb
+ - lib/super_crawler/crawl.rb
+ - lib/super_crawler/scrap.rb
  - lib/super_crawler/version.rb
  - super_crawler.gemspec
  homepage: https://github.com/htaidirt/super_crawler