wombat 1.0.0 → 2.0.0
- data/README.md +13 -30
- data/Rakefile +1 -1
- data/VERSION +1 -1
- data/fixtures/vcr_cassettes/follow_links.yml +2143 -0
- data/lib/wombat/crawler.rb +7 -17
- data/lib/wombat/dsl/follower.rb +19 -0
- data/lib/wombat/dsl/iterator.rb +19 -0
- data/lib/wombat/dsl/metadata.rb +27 -0
- data/lib/wombat/dsl/property.rb +27 -0
- data/lib/wombat/dsl/property_group.rb +48 -0
- data/lib/wombat/processing/node_selector.rb +12 -0
- data/lib/wombat/processing/parser.rb +48 -0
- data/lib/wombat/property/locators/base.rb +33 -0
- data/lib/wombat/property/locators/factory.rb +39 -0
- data/lib/wombat/property/locators/follow.rb +25 -0
- data/lib/wombat/property/locators/html.rb +14 -0
- data/lib/wombat/property/locators/iterator.rb +23 -0
- data/lib/wombat/property/locators/list.rb +17 -0
- data/lib/wombat/property/locators/property_group.rb +20 -0
- data/lib/wombat/property/locators/text.rb +22 -0
- data/lib/wombat.rb +8 -4
- data/spec/crawler_spec.rb +38 -48
- data/spec/dsl/property_spec.rb +12 -0
- data/spec/helpers/sample_crawler.rb +2 -15
- data/spec/integration/integration_spec.rb +61 -33
- data/spec/processing/parser_spec.rb +32 -0
- data/spec/property/locators/factory_spec.rb +18 -0
- data/spec/property/locators/follow_spec.rb +4 -0
- data/spec/property/locators/html_spec.rb +15 -0
- data/spec/property/locators/iterator_spec.rb +4 -0
- data/spec/property/locators/list_spec.rb +13 -0
- data/spec/property/locators/text_spec.rb +49 -0
- data/spec/sample_crawler_spec.rb +7 -11
- data/spec/wombat_spec.rb +13 -1
- data/wombat.gemspec +27 -16
- metadata +27 -16
- data/lib/wombat/iterator.rb +0 -38
- data/lib/wombat/metadata.rb +0 -24
- data/lib/wombat/node_selector.rb +0 -10
- data/lib/wombat/parser.rb +0 -59
- data/lib/wombat/property.rb +0 -21
- data/lib/wombat/property_container.rb +0 -70
- data/lib/wombat/property_locator.rb +0 -20
- data/spec/iterator_spec.rb +0 -52
- data/spec/metadata_spec.rb +0 -20
- data/spec/parser_spec.rb +0 -125
- data/spec/property_container_spec.rb +0 -62
- data/spec/property_locator_spec.rb +0 -75
- data/spec/property_spec.rb +0 -16
data/README.md
CHANGED
@@ -1,11 +1,12 @@
 # Wombat
 
-[![CI Build Status](https://secure.travis-ci.org/felipecsl/wombat.png?branch=master)][travis] [![Dependency Status](https://gemnasium.com/felipecsl/wombat.png?travis)][gemnasium]
+[![CI Build Status](https://secure.travis-ci.org/felipecsl/wombat.png?branch=master)][travis] [![Dependency Status](https://gemnasium.com/felipecsl/wombat.png?travis)][gemnasium] [![Code Climate](https://codeclimate.com/badge.png)][codeclimate]
 
 [travis]: http://travis-ci.org/felipecsl/wombat
 [gemnasium]: https://gemnasium.com/felipecsl/wombat
+[codeclimate]: https://codeclimate.com/github/felipecsl/wombat
 
-
+Web scraper with an elegant DSL that parses structured data from web pages.
 
 ## Usage:
 
@@ -13,20 +14,20 @@ Generic Web crawler with an elegant DSL that parses structured data from web pag
 
 Obs: Requires ruby 1.9
 
-##
+## Scraping a page:
 
 The simplest way to use Wombat is by calling ``Wombat.crawl`` and passing it a block:
 
 ```ruby
 
-# =>
+# => github_scraper.rb
 
 #coding: utf-8
 require 'wombat'
 
 Wombat.crawl do
 base_url "http://www.github.com"
-
+path "/"
 
 headline "xpath=//h1"
 
@@ -36,11 +37,11 @@ Wombat.crawl do
 e.gsub(/Explore/, "LOVE")
 end
 
-benefits do
-
-
-
-
+benefits do
+first_benefit "css=.column.leftmost h3"
+second_benefir "css=.column.leftmid h3"
+third_benefit "css=.column.rightmid h3"
+fourth_benefit "css=.column.rightmost h3"
 end
 end
 ```
@@ -62,7 +63,8 @@ end
 ```
 
 ### This is just a sneak peek of what Wombat can do. For the complete documentation, please check the [project Wiki](http://github.com/felipecsl/wombat/wiki).
-### [API Documentation](http://rubydoc.info/gems/wombat/0.
+### [API Documentation](http://rubydoc.info/gems/wombat/1.0.0/frames)
+### [Changelog](https://github.com/felipecsl/wombat/wiki/Changelog)
 
 
 ## Contributing to Wombat
@@ -81,25 +83,6 @@ end
 * Daniel Naves de Carvalho ([@danielnc](https://github.com/danielnc))
 * [@sigi](https://github.com/sigi)
 
-## Changelog
-
-### version 1.0.0
-
-* Breaking change: Metadata#format renamed to Metadata#document_format due to method name clash with [Kernel#format](http://www.ruby-doc.org/core-1.9.3/Kernel.html#method-i-format)
-
-### version 0.5.0
-
-* [Fixed a bug on malformed selectors](https://github.com/felipecsl/wombat/commit/e0f4eec20e1e2bb07a1813a1edd019933edeceaa)
-* [Fixed a bug where multiple calls to #crawl would not clean up previously iterated array results and yield repeated results](https://github.com/felipecsl/wombat/commit/40b09a5bf8b9ba08aa51b6f41f706b7c3c4e4252)
-
-### version 0.4.0
-
-* Added utility method ``Wombat.crawl`` that eliminates the need to have a ruby class instance to use Wombat. Now you can use just ``Wombat.crawl`` and start working. The class based format still works as before though.
-
-### version 0.3.1
-
-* Added the ability to provide a block to Crawler#crawl and override the default crawler properties for a one off run (thanks to @danielnc)
-
 ## Copyright
 
 Copyright (c) 2012 Felipe Lima. See LICENSE.txt for further details.
data/Rakefile
CHANGED
@@ -12,7 +12,7 @@ Jeweler::Tasks.new do |gem|
 gem.name = "wombat"
 gem.homepage = "http://github.com/felipecsl/wombat"
 gem.license = "MIT"
-gem.summary = %Q{Ruby DSL to
+gem.summary = %Q{Ruby DSL to scrape web pages}
 gem.description = %Q{Generic Web crawler with a DSL that parses structured data from web pages}
 gem.email = "felipe.lima@gmail.com"
 gem.authors = ["Felipe Lima"]
data/VERSION
CHANGED
@@ -1 +1 @@
-
+2.0.0