cobweb 0.0.17 → 0.0.18
- data/README.textile +12 -2
- data/lib/cobweb.rb +1 -2
- data/lib/stats.rb +0 -1
- metadata +19 -19
data/README.textile
CHANGED
@@ -1,14 +1,24 @@
 
-h1. Cobweb v0.0.
+h1. Cobweb v0.0.18
 
 h2. Intro
 
 CobWeb has three methods of running. Firstly it is a http client that allows get and head requests returning a hash of data relating to the requested resource. The second main function is to utilize this combined with the power of Resque to cluster the crawls allowing you crawl quickly. Lastly you can run the crawler with a block that uses each of the pages found in the crawl.
 
+h3. Resque
+
 When running on resque, passing in a Class and queue name it will enqueue all resources to this queue for processing, passing in the hash it has generated. You then implement the perform method to process the resource for your own application.
 
-
+h3. Standalone
 
+CobwebCrawler takes the same options as cobweb itself, so you can use any of the options available for that. An example is listed below.
+
+bq. crawler = CobwebCrawler.new(:cache => 600);
+bq. stats = crawler.crawl("http://www.pepsico.com")
+
+While the crawler is running, you can view statistics on http://localhost:4567
+
+h3. Data Returned
 The data available in the returned hash are:
 
 * :url - url of the resource requested
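The Resque section of the README says you implement a perform method that receives the generated hash. A minimal sketch of such a processing class follows. The class name @PageProcessor@, the queue name, and the @processed@ list are hypothetical and not part of the gem; the only conventions relied on are Resque's own (a class-level @@queue@ and a class method @perform@). Because Resque serializes job arguments as JSON, the sketch assumes keys such as @:url@ may arrive as strings.

```ruby
require 'json'

# Hypothetical processing class; name and queue are illustrative only.
class PageProcessor
  # Resque reads the target queue from this class-level instance variable.
  @queue = :cobweb_process
  @processed = []

  class << self
    # Exposes the URLs seen so far (illustration only).
    attr_reader :processed
  end

  # Resque calls the class method `perform` with the job's arguments.
  # Arguments pass through JSON, so symbol keys arrive as strings.
  def self.perform(content)
    url = content["url"] || content[:url]
    @processed << url
    url
  end
end

# Simulate the JSON round trip a worker would apply, without needing Redis.
payload = JSON.parse(JSON.generate({ "url" => "http://example.com/" }))
PageProcessor.perform(payload)
```

Calling @perform@ directly, as above, is a convenient way to exercise the processing logic before wiring the class into a real queue.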
data/lib/cobweb.rb
CHANGED
@@ -1,5 +1,4 @@
 require 'rubygems'
-require 'bundler/setup'
 require 'uri'
 require 'resque'
 require "addressable/uri"
@@ -75,7 +74,7 @@ class Cobweb
 
 # retrieve data
 unless @http && @http.address == uri.host && @http.port == uri.inferred_port
-  puts "Creating connection to #{uri.host}..."
+  puts "Creating connection to #{uri.host}..." unless @options[:quiet]
   @http = Net::HTTP.new(uri.host, uri.inferred_port)
 end
 if uri.scheme == "https"
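The hunk above guards the connection message with @@options[:quiet]@. A minimal sketch of that pattern in isolation follows. The class @QuietAwareClient@ and the method @announce_connection@ are hypothetical stand-ins, and passing @:quiet@ through the constructor is an assumption; only the @unless @options[:quiet]@ guard is taken from the diff.

```ruby
# Hypothetical stand-in class; only the `@options[:quiet]` guard mirrors the diff.
class QuietAwareClient
  def initialize(options = {})
    @options = options
  end

  # Prints the progress message unless :quiet is set.
  # Returns the message either way so callers can log it elsewhere.
  def announce_connection(host)
    message = "Creating connection to #{host}..."
    puts message unless @options[:quiet]
    message
  end
end

QuietAwareClient.new.announce_connection("example.com")                 # prints the message
QuietAwareClient.new(:quiet => true).announce_connection("example.com") # prints nothing
```

A missing @:quiet@ key evaluates to @nil@, so the message is still printed by default and existing callers keep their current output.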
data/lib/stats.rb
CHANGED
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: cobweb
 version: !ruby/object:Gem::Version
-  version: 0.0.
+  version: 0.0.18
   prerelease:
 platform: ruby
 authors:
@@ -13,7 +13,7 @@ date: 2012-03-04 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: resque
-  requirement: &
+  requirement: &70110522428200 !ruby/object:Gem::Requirement
     none: false
     requirements:
     - - ! '>='
@@ -21,10 +21,10 @@ dependencies:
         version: '0'
   type: :runtime
   prerelease: false
-  version_requirements: *
+  version_requirements: *70110522428200
 - !ruby/object:Gem::Dependency
   name: redis
-  requirement: &
+  requirement: &70110522427560 !ruby/object:Gem::Requirement
     none: false
     requirements:
     - - ! '>='
@@ -32,10 +32,10 @@ dependencies:
         version: '0'
   type: :runtime
   prerelease: false
-  version_requirements: *
+  version_requirements: *70110522427560
 - !ruby/object:Gem::Dependency
   name: absolutize
-  requirement: &
+  requirement: &70110522427020 !ruby/object:Gem::Requirement
     none: false
     requirements:
     - - ! '>='
@@ -43,10 +43,10 @@ dependencies:
         version: '0'
   type: :runtime
   prerelease: false
-  version_requirements: *
+  version_requirements: *70110522427020
 - !ruby/object:Gem::Dependency
   name: nokogiri
-  requirement: &
+  requirement: &70110522426360 !ruby/object:Gem::Requirement
     none: false
     requirements:
     - - ! '>='
@@ -54,10 +54,10 @@ dependencies:
         version: '0'
   type: :runtime
   prerelease: false
-  version_requirements: *
+  version_requirements: *70110522426360
 - !ruby/object:Gem::Dependency
   name: addressable
-  requirement: &
+  requirement: &70110522425700 !ruby/object:Gem::Requirement
     none: false
     requirements:
     - - ! '>='
@@ -65,10 +65,10 @@ dependencies:
         version: '0'
   type: :runtime
   prerelease: false
-  version_requirements: *
+  version_requirements: *70110522425700
 - !ruby/object:Gem::Dependency
   name: rspec
-  requirement: &
+  requirement: &70110522424920 !ruby/object:Gem::Requirement
     none: false
     requirements:
     - - ! '>='
@@ -76,10 +76,10 @@ dependencies:
         version: '0'
   type: :runtime
   prerelease: false
-  version_requirements: *
+  version_requirements: *70110522424920
 - !ruby/object:Gem::Dependency
   name: awesome_print
-  requirement: &
+  requirement: &70110522424300 !ruby/object:Gem::Requirement
     none: false
     requirements:
     - - ! '>='
@@ -87,10 +87,10 @@ dependencies:
         version: '0'
   type: :runtime
   prerelease: false
-  version_requirements: *
+  version_requirements: *70110522424300
 - !ruby/object:Gem::Dependency
   name: sinatra
-  requirement: &
+  requirement: &70110522421640 !ruby/object:Gem::Requirement
     none: false
     requirements:
     - - ! '>='
@@ -98,10 +98,10 @@ dependencies:
         version: '0'
   type: :runtime
   prerelease: false
-  version_requirements: *
+  version_requirements: *70110522421640
 - !ruby/object:Gem::Dependency
   name: thin
-  requirement: &
+  requirement: &70110522421140 !ruby/object:Gem::Requirement
     none: false
     requirements:
     - - ! '>='
@@ -109,7 +109,7 @@ dependencies:
         version: '0'
   type: :runtime
   prerelease: false
-  version_requirements: *
+  version_requirements: *70110522421140
 description:
 email: stewart@rockwellcottage.com
 executables: []