cobweb 0.0.5 → 0.0.6
- data/README.textile +35 -3
- metadata +3 -3
data/README.textile
CHANGED
@@ -1,9 +1,31 @@
|
|
1
1
|
|
2
|
-
h1. Cobweb v0.0.
|
2
|
+
h1. Cobweb v0.0.6
|
3
3
|
|
4
4
|
h2. Intro
|
5
|
-
|
6
|
-
|
5
|
+
|
6
|
+
CobWeb has two functions. Firstly, it is an HTTP client that allows get and head requests, returning a hash of data relating to the requested resource. The second main function is to combine this with the power of Resque to cluster the crawls, allowing you to crawl quickly.
|
7
|
+
|
8
|
+
When running on Resque, pass in a class and queue name and it will enqueue all resources to this queue for processing, passing in the hash it has generated. You then implement the perform method to process the resource for your own application.
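As a rough illustration of the "implement the perform method" step, here is a minimal sketch of a Resque-style processing class. The class name @ContentProcessor@ and the queue name are hypothetical, and the sketch only assumes the usual Resque convention of a class-level @perform@ plus an @@queue@ variable. Because Resque passes job arguments through JSON, the hash keys may arrive as strings, so the sketch accepts either form.

```ruby
# Hypothetical processing class; the name and queue are assumptions for illustration.
class ContentProcessor
  @queue = :cobweb_process_queue # hypothetical queue name

  # Resque calls perform with the job arguments, here the hash generated for a resource.
  def self.perform(content)
    # Job arguments are serialized as JSON, so accept string or symbol keys.
    url    = content["url"] || content[:url]
    status = content["status_code"] || content[:status_code]

    # Replace this with whatever processing your application needs.
    "#{status} #{url}"
  end
end
```

Calling @ContentProcessor.perform@ directly with a hand-built hash is a convenient way to exercise the processing logic without a running Resque worker.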
|
9
|
+
|
10
|
+
The data available in the returned hash are:
|
11
|
+
|
12
|
+
* :url - url of the resource requested
|
13
|
+
* :status_code - status code of the resource requested
|
14
|
+
* :mime_type - content type of the resource
|
15
|
+
* :character_set - character set of content determined from content type
|
16
|
+
* :length - length of the content returned
|
17
|
+
* :body - content of the resource
|
18
|
+
* :location - location header if returned
|
19
|
+
* :redirect_through - if you're following redirects, any redirects are stored here, detailing where you were redirected through to get to the final location
|
20
|
+
* :headers - hash of the headers returned
|
21
|
+
* :links - hash of links on the page, split into types
|
22
|
+
** :links - URLs from a tags within the resource
|
23
|
+
** :images - URLs from img tags within the resource
|
24
|
+
** :related - URLs from link tags
|
25
|
+
** :scripts - URLs from script tags
|
26
|
+
** :styles - URLs from link tags with a rel of stylesheet and from url() directives within stylesheets
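To make the shape of the hash concrete, the sketch below builds a hand-written sample using the keys listed above and counts the URLs in each link category. The sample values are invented for illustration and are not real crawler output; @SAMPLE_CONTENT@ and @link_counts@ are hypothetical names.

```ruby
# Hand-written sample shaped like the documented hash; values are made up.
SAMPLE_CONTENT = {
  :url           => "http://example.com/",
  :status_code   => 200,
  :mime_type     => "text/html",
  :character_set => "utf-8",
  :links => {
    :links   => ["http://example.com/about"],
    :images  => ["http://example.com/logo.png"],
    :related => [],
    :scripts => [],
    :styles  => ["http://example.com/site.css"]
  }
}

# Hypothetical helper: number of URLs found for each link type.
def link_counts(content)
  counts = {}
  (content[:links] || {}).each { |type, urls| counts[type] = urls.length }
  counts
end
```

A call such as @link_counts(SAMPLE_CONTENT)@ gives one count per link type, which can be handy when deciding which categories to enqueue for further crawling.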
|
27
|
+
|
28
|
+
The source for the links can be overridden; contact me for the syntax (I don't have time to put it into this documentation, but will as soon as I have time!)
|
7
29
|
|
8
30
|
h2. Installation
|
9
31
|
|
@@ -25,6 +47,8 @@ Creates a new crawler object based on a base_url
|
|
25
47
|
** :debug - enables debug output (Default: false)
|
26
48
|
** :quiet - hides default output (Default: false)
|
27
49
|
** :cache - sets the ttl for caching pages, set to nil to disable caching (Default: 300)
|
50
|
+
** :timeout - http timeout for requests (Default: 10)
|
51
|
+
** :redis_options - hash containing the initialization options for redis (e.g. {:host => "redis.mydomain.com"})
|
28
52
|
|
29
53
|
bq. crawler = CobWeb.new(:follow_redirects => false)
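The defaults listed above suggest that any option you leave out falls back to its documented value. A minimal sketch of that behaviour follows, assuming a simple hash merge; this is an illustrative pattern, not Cobweb's actual implementation, and @DEFAULT_OPTIONS@ and @effective_options@ are hypothetical names. Only the defaults visible in this section are included.

```ruby
# Defaults copied from the option list above; the merge is an assumed pattern.
DEFAULT_OPTIONS = {
  :debug   => false,
  :quiet   => false,
  :cache   => 300, # ttl for cached pages, nil disables caching
  :timeout => 10   # http timeout for requests
}

# Hypothetical helper: caller-supplied options override the defaults.
def effective_options(options = {})
  DEFAULT_OPTIONS.merge(options)
end
```

For instance, @effective_options(:timeout => 30, :redis_options => {:host => "redis.mydomain.com"})@ keeps the default cache ttl while overriding the timeout and adding the redis settings.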
|
30
54
|
|
@@ -44,6 +68,14 @@ Simple get that obey's the options supplied in new.
|
|
44
68
|
|
45
69
|
bq. crawler.get("http://www.google.com/")
|
46
70
|
|
71
|
+
h4. head(url)
|
72
|
+
|
73
|
+
Simple head request that obeys the options supplied in new.
|
74
|
+
|
75
|
+
* url - url requested
|
76
|
+
|
77
|
+
bq. crawler.head("http://www.google.com/")
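For readers unfamiliar with the difference between the two request types, the sketch below uses Ruby's standard @Net::HTTP@ request classes rather than Cobweb itself (whose internals are not shown here). It builds a HEAD and a GET request for the same URL without sending either; @build_requests@ is a hypothetical helper name.

```ruby
require "net/http"
require "uri"

# Hypothetical helper: build (but do not send) HEAD and GET requests for a URL.
def build_requests(url)
  uri = URI.parse(url)
  {
    :head => Net::HTTP::Head.new(uri.request_uri),
    :get  => Net::HTTP::Get.new(uri.request_uri)
  }
end

requests = build_requests("http://www.google.com/")
# A HEAD response carries status and headers only; a GET response also carries a body.
puts requests[:head].response_body_permitted?
puts requests[:get].response_body_permitted?
```

This is why a head request is a cheap way to check a resource's status code or content type before deciding whether to fetch it in full.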
|
78
|
+
|
47
79
|
|
48
80
|
h2. License
|
49
81
|
|
metadata
CHANGED
@@ -1,13 +1,13 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: cobweb
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
hash:
|
4
|
+
hash: 19
|
5
5
|
prerelease: false
|
6
6
|
segments:
|
7
7
|
- 0
|
8
8
|
- 0
|
9
|
-
-
|
10
|
-
version: 0.0.
|
9
|
+
- 6
|
10
|
+
version: 0.0.6
|
11
11
|
platform: ruby
|
12
12
|
authors:
|
13
13
|
- Stewart McKee
|