wombat 1.0.0 → 2.0.0

Files changed (49)
  1. data/README.md +13 -30
  2. data/Rakefile +1 -1
  3. data/VERSION +1 -1
  4. data/fixtures/vcr_cassettes/follow_links.yml +2143 -0
  5. data/lib/wombat/crawler.rb +7 -17
  6. data/lib/wombat/dsl/follower.rb +19 -0
  7. data/lib/wombat/dsl/iterator.rb +19 -0
  8. data/lib/wombat/dsl/metadata.rb +27 -0
  9. data/lib/wombat/dsl/property.rb +27 -0
  10. data/lib/wombat/dsl/property_group.rb +48 -0
  11. data/lib/wombat/processing/node_selector.rb +12 -0
  12. data/lib/wombat/processing/parser.rb +48 -0
  13. data/lib/wombat/property/locators/base.rb +33 -0
  14. data/lib/wombat/property/locators/factory.rb +39 -0
  15. data/lib/wombat/property/locators/follow.rb +25 -0
  16. data/lib/wombat/property/locators/html.rb +14 -0
  17. data/lib/wombat/property/locators/iterator.rb +23 -0
  18. data/lib/wombat/property/locators/list.rb +17 -0
  19. data/lib/wombat/property/locators/property_group.rb +20 -0
  20. data/lib/wombat/property/locators/text.rb +22 -0
  21. data/lib/wombat.rb +8 -4
  22. data/spec/crawler_spec.rb +38 -48
  23. data/spec/dsl/property_spec.rb +12 -0
  24. data/spec/helpers/sample_crawler.rb +2 -15
  25. data/spec/integration/integration_spec.rb +61 -33
  26. data/spec/processing/parser_spec.rb +32 -0
  27. data/spec/property/locators/factory_spec.rb +18 -0
  28. data/spec/property/locators/follow_spec.rb +4 -0
  29. data/spec/property/locators/html_spec.rb +15 -0
  30. data/spec/property/locators/iterator_spec.rb +4 -0
  31. data/spec/property/locators/list_spec.rb +13 -0
  32. data/spec/property/locators/text_spec.rb +49 -0
  33. data/spec/sample_crawler_spec.rb +7 -11
  34. data/spec/wombat_spec.rb +13 -1
  35. data/wombat.gemspec +27 -16
  36. metadata +27 -16
  37. data/lib/wombat/iterator.rb +0 -38
  38. data/lib/wombat/metadata.rb +0 -24
  39. data/lib/wombat/node_selector.rb +0 -10
  40. data/lib/wombat/parser.rb +0 -59
  41. data/lib/wombat/property.rb +0 -21
  42. data/lib/wombat/property_container.rb +0 -70
  43. data/lib/wombat/property_locator.rb +0 -20
  44. data/spec/iterator_spec.rb +0 -52
  45. data/spec/metadata_spec.rb +0 -20
  46. data/spec/parser_spec.rb +0 -125
  47. data/spec/property_container_spec.rb +0 -62
  48. data/spec/property_locator_spec.rb +0 -75
  49. data/spec/property_spec.rb +0 -16
data/README.md CHANGED
@@ -1,11 +1,12 @@
  # Wombat
 
- [![CI Build Status](https://secure.travis-ci.org/felipecsl/wombat.png?branch=master)][travis] [![Dependency Status](https://gemnasium.com/felipecsl/wombat.png?travis)][gemnasium]
+ [![CI Build Status](https://secure.travis-ci.org/felipecsl/wombat.png?branch=master)][travis] [![Dependency Status](https://gemnasium.com/felipecsl/wombat.png?travis)][gemnasium] [![Code Climate](https://codeclimate.com/badge.png)][codeclimate]
 
  [travis]: http://travis-ci.org/felipecsl/wombat
  [gemnasium]: https://gemnasium.com/felipecsl/wombat
+ [codeclimate]: https://codeclimate.com/github/felipecsl/wombat
 
- Generic Web crawler with an elegant DSL that parses structured data from web pages.
+ Web scraper with an elegant DSL that parses structured data from web pages.
 
  ## Usage:
 
@@ -13,20 +14,20 @@ Generic Web crawler with an elegant DSL that parses structured data from web pag
 
  Obs: Requires ruby 1.9
 
- ## Crawling a page:
+ ## Scraping a page:
 
  The simplest way to use Wombat is by calling ``Wombat.crawl`` and passing it a block:
 
  ```ruby
 
- # => github_crawler.rb
+ # => github_scraper.rb
 
  #coding: utf-8
  require 'wombat'
 
  Wombat.crawl do
  base_url "http://www.github.com"
- list_page "/"
+ path "/"
 
  headline "xpath=//h1"
 
@@ -36,11 +37,11 @@ Wombat.crawl do
  e.gsub(/Explore/, "LOVE")
  end
 
- benefits do |b|
- b.first_benefit "css=.column.leftmost h3"
- b.second_benefir "css=.column.leftmid h3"
- b.third_benefit "css=.column.rightmid h3"
- b.fourth_benefit "css=.column.rightmost h3"
+ benefits do
+ first_benefit "css=.column.leftmost h3"
+ second_benefir "css=.column.leftmid h3"
+ third_benefit "css=.column.rightmid h3"
+ fourth_benefit "css=.column.rightmost h3"
  end
  end
  ```
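The hunk above drops the block parameter from nested property groups (`benefits do |b| ... end` becomes `benefits do ... end`). A minimal sketch of how a Ruby DSL can accept both block styles follows. It assumes a hypothetical `PropertyGroup` class built on `method_missing` and `instance_eval`; it illustrates the general technique and is not taken from Wombat's source.

```ruby
# Hypothetical sketch of a block-based DSL; NOT Wombat's actual implementation.
# PropertyGroup collects name => selector pairs declared inside a block.
class PropertyGroup
  attr_reader :properties

  def initialize
    @properties = {}
  end

  # Any undefined method call is treated as a property declaration.
  # When a block is given, the call declares a nested group instead.
  def method_missing(name, selector = nil, &block)
    if block
      group = PropertyGroup.new
      if block.arity == 1
        # Old style: the group is passed in as the block argument (do |b| ... end).
        block.call(group)
      else
        # New style: the block runs with self set to the group (do ... end).
        group.instance_eval(&block)
      end
      @properties[name.to_s] = group.properties
    else
      @properties[name.to_s] = selector
    end
  end

  def respond_to_missing?(_name, _include_private = false)
    true
  end
end

# New style, as on the "+" lines of the hunk above.
root = PropertyGroup.new
root.instance_eval do
  headline "xpath=//h1"
  benefits do
    first_benefit "css=.column.leftmost h3"
  end
end

# Old style, as on the "-" lines of the hunk above.
legacy = PropertyGroup.new
legacy.benefits { |b| b.first_benefit "css=.column.leftmost h3" }

p root.properties
p legacy.properties
```

One trade-off of the `instance_eval` style in this sketch: a bare property name that matches an existing method (for example `format`, which `Kernel` already defines) is dispatched to that method and never reaches `method_missing`. That is the same kind of name clash the removed 1.0.0 changelog entry in this diff describes for `Metadata#format`.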
@@ -62,7 +63,8 @@ end
  ```
 
  ### This is just a sneak peek of what Wombat can do. For the complete documentation, please check the [project Wiki](http://github.com/felipecsl/wombat/wiki).
- ### [API Documentation](http://rubydoc.info/gems/wombat/0.5.0/frames).
+ ### [API Documentation](http://rubydoc.info/gems/wombat/1.0.0/frames)
+ ### [Changelog](https://github.com/felipecsl/wombat/wiki/Changelog)
 
 
  ## Contributing to Wombat
@@ -81,25 +83,6 @@ end
  * Daniel Naves de Carvalho ([@danielnc](https://github.com/danielnc))
  * [@sigi](https://github.com/sigi)
 
- ## Changelog
-
- ### version 1.0.0
-
- * Breaking change: Metadata#format renamed to Metadata#document_format due to method name clash with [Kernel#format](http://www.ruby-doc.org/core-1.9.3/Kernel.html#method-i-format)
-
- ### version 0.5.0
-
- * [Fixed a bug on malformed selectors](https://github.com/felipecsl/wombat/commit/e0f4eec20e1e2bb07a1813a1edd019933edeceaa)
- * [Fixed a bug where multiple calls to #crawl would not clean up previously iterated array results and yield repeated results](https://github.com/felipecsl/wombat/commit/40b09a5bf8b9ba08aa51b6f41f706b7c3c4e4252)
-
- ### version 0.4.0
-
- * Added utility method ``Wombat.crawl`` that eliminates the need to have a ruby class instance to use Wombat. Now you can use just ``Wombat.crawl`` and start working. The class based format still works as before though.
-
- ### version 0.3.1
-
- * Added the ability to provide a block to Crawler#crawl and override the default crawler properties for a one off run (thanks to @danielnc)
-
  ## Copyright
 
  Copyright (c) 2012 Felipe Lima. See LICENSE.txt for further details.
data/Rakefile CHANGED
@@ -12,7 +12,7 @@ Jeweler::Tasks.new do |gem|
  gem.name = "wombat"
  gem.homepage = "http://github.com/felipecsl/wombat"
  gem.license = "MIT"
- gem.summary = %Q{Ruby DSL to crawl web pages}
+ gem.summary = %Q{Ruby DSL to scrape web pages}
  gem.description = %Q{Generic Web crawler with a DSL that parses structured data from web pages}
  gem.email = "felipe.lima@gmail.com"
  gem.authors = ["Felipe Lima"]
data/VERSION CHANGED
@@ -1 +1 @@
- 1.0.0
+ 2.0.0