cobweb 0.0.5 → 0.0.6

Files changed (2)
  1. data/README.textile +35 -3
  2. metadata +3 -3
@@ -1,9 +1,31 @@
 
- h1. Cobweb v0.0.3
+ h1. Cobweb v0.0.6
 
  h2. Intro
-
- Crawler that utilises resque jobs to perform the crawl allowing clustering of crawls and redis to perform caching of responses. It currently requires redis on the same machine but will add options to specify redis server shortly.
+
+ CobWeb has two functions. Firstly, it is an HTTP client that supports GET and HEAD requests, returning a hash of data relating to the requested resource. The second main function is to combine this with the power of Resque to cluster the crawls, allowing you to crawl quickly.
+
+ When running on Resque, you pass in a class and queue name, and it will enqueue all resources to this queue for processing, passing in the hash it has generated. You then implement the perform method to process the resource for your own application.
+
+ The data available in the returned hash are:
+
+ * :url - url of the resource requested
+ * :status_code - status code of the resource requested
+ * :mime_type - content type of the resource
+ * :character_set - character set of content determined from content type
+ * :length - length of the content returned
+ * :body - content of the resource
+ * :location - location header if returned
+ * :redirect_through - if you're following redirects, any redirects are stored here detailing where you were redirected through to get to the final location
+ * :headers - hash of the headers returned
+ * :links - hash of links on the page split into types
+ ** :links - urls from a tags within the resource
+ ** :images - urls from img tags within the resource
+ ** :related - urls from link tags
+ ** :scripts - urls from script tags
+ ** :styles - urls from within link tags with rel of stylesheet and from url() directives within stylesheets
+
+ The source for the links can be overridden; contact me for the syntax (I don't have time to put it into this documentation, but will as soon as I have time!)
 
  h2. Installation
 
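As a rough illustration of the Resque workflow described in the new Intro, the sketch below shows what a processing class with a class-level perform method might look like. The class name @ContentProcessor@, the queue name, and the sample data are all hypothetical; only the hash keys come from the README. Resque serializes job arguments as JSON, so the sketch assumes the keys may arrive as strings and converts them back to symbols.

```ruby
# Hypothetical processor class; the name and queue are illustrative only
# and are not part of cobweb itself.
class ContentProcessor
  @queue = :cobweb_process_queue # assumed queue name

  # Resque invokes the class-level perform method with the job arguments.
  # Here the single argument is the hash described in the README.
  def self.perform(content)
    content = symbolize(content)
    links = content[:links] || {}
    {
      :url => content[:url],
      :ok => content[:status_code].to_i == 200,
      :html => content[:mime_type] == "text/html",
      :link_count => (links[:links] || []).size,
      :image_count => (links[:images] || []).size
    }
  end

  # Recursively converts string keys to symbols, since JSON round-tripping
  # turns symbol keys into strings.
  def self.symbolize(value)
    return value unless value.is_a?(Hash)
    value.each_with_object({}) { |(k, v), out| out[k.to_sym] = symbolize(v) }
  end
end

# Made-up sample in the shape the README describes.
sample = {
  "url" => "http://www.example.com/",
  "status_code" => 200,
  "mime_type" => "text/html",
  "links" => {
    "links" => ["http://www.example.com/about"],
    "images" => ["http://www.example.com/logo.png",
                 "http://www.example.com/banner.png"]
  }
}

summary = ContentProcessor.perform(sample)
puts summary.inspect
```

Calling perform directly with a sample hash, as above, is a convenient way to exercise the processing logic without a running Resque worker.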
@@ -25,6 +47,8 @@ Creates a new crawler object based on a base_url
  ** :debug - enables debug output (Default: false)
  ** :quiet - hides default output (Default: false)
  ** :cache - sets the ttl for caching pages, set to nil to disable caching (Default: 300)
+ ** :timeout - http timeout for requests (Default: 10)
+ ** :redis_options - hash containing the initialization options for redis (e.g. {:host => "redis.mydomain.com"})
 
  bq. crawler = CobWeb.new(:follow_redirects => false)
 
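To make the documented defaults concrete, here is a minimal sketch of how an options hash including the two new keys might combine with them. The constant @DOCUMENTED_DEFAULTS@ and the helper @effective_options@ are hypothetical names used only for illustration; this is not cobweb's own implementation, just the defaults listed in the hunk above merged with caller overrides.

```ruby
# Defaults as listed in the README hunk; the merge below is an
# illustration, not cobweb's actual option handling.
DOCUMENTED_DEFAULTS = {
  :debug => false,
  :quiet => false,
  :cache => 300,
  :timeout => 10
}.freeze

# Hypothetical helper: caller-supplied values take precedence over defaults.
def effective_options(overrides = {})
  DOCUMENTED_DEFAULTS.merge(overrides)
end

options = effective_options(
  :timeout => 30, # override the documented default of 10
  :cache => nil,  # nil disables caching according to the README
  :redis_options => { :host => "redis.mydomain.com" }
)
puts options.inspect
```

A hash like this would presumably be handed to @CobWeb.new@ in the same way as the README's own @CobWeb.new(:follow_redirects => false)@ example.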
@@ -44,6 +68,14 @@ Simple get that obey's the options supplied in new.
 
  bq. crawler.get("http://www.google.com/")
 
+ h4. head(url)
+
+ Simple head request that obeys the options supplied in new.
+
+ * url - url requested
+
+ bq. crawler.head("http://www.google.com/")
+
 
  h2. License
 
metadata CHANGED
@@ -1,13 +1,13 @@
  --- !ruby/object:Gem::Specification
  name: cobweb
  version: !ruby/object:Gem::Version
- hash: 21
+ hash: 19
  prerelease: false
  segments:
  - 0
  - 0
- - 5
- version: 0.0.5
+ - 6
+ version: 0.0.6
  platform: ruby
  authors:
  - Stewart McKee