wombat 1.0.0 → 2.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (49)
  1. data/README.md +13 -30
  2. data/Rakefile +1 -1
  3. data/VERSION +1 -1
  4. data/fixtures/vcr_cassettes/follow_links.yml +2143 -0
  5. data/lib/wombat/crawler.rb +7 -17
  6. data/lib/wombat/dsl/follower.rb +19 -0
  7. data/lib/wombat/dsl/iterator.rb +19 -0
  8. data/lib/wombat/dsl/metadata.rb +27 -0
  9. data/lib/wombat/dsl/property.rb +27 -0
  10. data/lib/wombat/dsl/property_group.rb +48 -0
  11. data/lib/wombat/processing/node_selector.rb +12 -0
  12. data/lib/wombat/processing/parser.rb +48 -0
  13. data/lib/wombat/property/locators/base.rb +33 -0
  14. data/lib/wombat/property/locators/factory.rb +39 -0
  15. data/lib/wombat/property/locators/follow.rb +25 -0
  16. data/lib/wombat/property/locators/html.rb +14 -0
  17. data/lib/wombat/property/locators/iterator.rb +23 -0
  18. data/lib/wombat/property/locators/list.rb +17 -0
  19. data/lib/wombat/property/locators/property_group.rb +20 -0
  20. data/lib/wombat/property/locators/text.rb +22 -0
  21. data/lib/wombat.rb +8 -4
  22. data/spec/crawler_spec.rb +38 -48
  23. data/spec/dsl/property_spec.rb +12 -0
  24. data/spec/helpers/sample_crawler.rb +2 -15
  25. data/spec/integration/integration_spec.rb +61 -33
  26. data/spec/processing/parser_spec.rb +32 -0
  27. data/spec/property/locators/factory_spec.rb +18 -0
  28. data/spec/property/locators/follow_spec.rb +4 -0
  29. data/spec/property/locators/html_spec.rb +15 -0
  30. data/spec/property/locators/iterator_spec.rb +4 -0
  31. data/spec/property/locators/list_spec.rb +13 -0
  32. data/spec/property/locators/text_spec.rb +49 -0
  33. data/spec/sample_crawler_spec.rb +7 -11
  34. data/spec/wombat_spec.rb +13 -1
  35. data/wombat.gemspec +27 -16
  36. metadata +27 -16
  37. data/lib/wombat/iterator.rb +0 -38
  38. data/lib/wombat/metadata.rb +0 -24
  39. data/lib/wombat/node_selector.rb +0 -10
  40. data/lib/wombat/parser.rb +0 -59
  41. data/lib/wombat/property.rb +0 -21
  42. data/lib/wombat/property_container.rb +0 -70
  43. data/lib/wombat/property_locator.rb +0 -20
  44. data/spec/iterator_spec.rb +0 -52
  45. data/spec/metadata_spec.rb +0 -20
  46. data/spec/parser_spec.rb +0 -125
  47. data/spec/property_container_spec.rb +0 -62
  48. data/spec/property_locator_spec.rb +0 -75
  49. data/spec/property_spec.rb +0 -16
data/README.md CHANGED
@@ -1,11 +1,12 @@
  # Wombat
 
- [![CI Build Status](https://secure.travis-ci.org/felipecsl/wombat.png?branch=master)][travis] [![Dependency Status](https://gemnasium.com/felipecsl/wombat.png?travis)][gemnasium]
+ [![CI Build Status](https://secure.travis-ci.org/felipecsl/wombat.png?branch=master)][travis] [![Dependency Status](https://gemnasium.com/felipecsl/wombat.png?travis)][gemnasium] [![Code Climate](https://codeclimate.com/badge.png)][codeclimate]
 
  [travis]: http://travis-ci.org/felipecsl/wombat
  [gemnasium]: https://gemnasium.com/felipecsl/wombat
+ [codeclimate]: https://codeclimate.com/github/felipecsl/wombat
 
- Generic Web crawler with an elegant DSL that parses structured data from web pages.
+ Web scraper with an elegant DSL that parses structured data from web pages.
 
  ## Usage:
 
@@ -13,20 +14,20 @@ Generic Web crawler with an elegant DSL that parses structured data from web pag
 
  Obs: Requires ruby 1.9
 
- ## Crawling a page:
+ ## Scraping a page:
 
  The simplest way to use Wombat is by calling ``Wombat.crawl`` and passing it a block:
 
  ```ruby
 
- # => github_crawler.rb
+ # => github_scraper.rb
 
  #coding: utf-8
  require 'wombat'
 
  Wombat.crawl do
  base_url "http://www.github.com"
- list_page "/"
+ path "/"
 
  headline "xpath=//h1"
 
@@ -36,11 +37,11 @@ Wombat.crawl do
  e.gsub(/Explore/, "LOVE")
  end
 
- benefits do |b|
- b.first_benefit "css=.column.leftmost h3"
- b.second_benefir "css=.column.leftmid h3"
- b.third_benefit "css=.column.rightmid h3"
- b.fourth_benefit "css=.column.rightmost h3"
+ benefits do
+ first_benefit "css=.column.leftmost h3"
+ second_benefir "css=.column.leftmid h3"
+ third_benefit "css=.column.rightmid h3"
+ fourth_benefit "css=.column.rightmost h3"
  end
  end
  ```
@@ -62,7 +63,8 @@ end
  ```
 
  ### This is just a sneak peek of what Wombat can do. For the complete documentation, please check the [project Wiki](http://github.com/felipecsl/wombat/wiki).
- ### [API Documentation](http://rubydoc.info/gems/wombat/0.5.0/frames).
+ ### [API Documentation](http://rubydoc.info/gems/wombat/1.0.0/frames)
+ ### [Changelog](https://github.com/felipecsl/wombat/wiki/Changelog)
 
 
  ## Contributing to Wombat
@@ -81,25 +83,6 @@ end
  * Daniel Naves de Carvalho ([@danielnc](https://github.com/danielnc))
  * [@sigi](https://github.com/sigi)
 
- ## Changelog
-
- ### version 1.0.0
-
- * Breaking change: Metadata#format renamed to Metadata#document_format due to method name clash with [Kernel#format](http://www.ruby-doc.org/core-1.9.3/Kernel.html#method-i-format)
-
- ### version 0.5.0
-
- * [Fixed a bug on malformed selectors](https://github.com/felipecsl/wombat/commit/e0f4eec20e1e2bb07a1813a1edd019933edeceaa)
- * [Fixed a bug where multiple calls to #crawl would not clean up previously iterated array results and yield repeated results](https://github.com/felipecsl/wombat/commit/40b09a5bf8b9ba08aa51b6f41f706b7c3c4e4252)
-
- ### version 0.4.0
-
- * Added utility method ``Wombat.crawl`` that eliminates the need to have a ruby class instance to use Wombat. Now you can use just ``Wombat.crawl`` and start working. The class based format still works as before though.
-
- ### version 0.3.1
-
- * Added the ability to provide a block to Crawler#crawl and override the default crawler properties for a one off run (thanks to @danielnc)
-
  ## Copyright
 
  Copyright (c) 2012 Felipe Lima. See LICENSE.txt for further details.
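Two README changes above are consistent with a DSL whose blocks run via `instance_eval` and whose property names are caught by `method_missing`: the `benefits` group no longer takes a `|b|` block parameter, and the removed 1.0.0 changelog entry renames `Metadata#format` to `Metadata#document_format` because of a clash with `Kernel#format`. The sketch below is a minimal, hypothetical illustration of that pattern, under that assumption only. `PropertyGroupSketch` and `build_properties` are invented names, not Wombat's classes, and Wombat's real implementation may differ.

```ruby
# Hypothetical sketch, NOT Wombat's source: a tiny instance_eval/method_missing DSL.
class PropertyGroupSketch
  attr_reader :properties

  def initialize
    @properties = {}
  end

  # Unknown calls become properties:
  #   name "selector"  -> stores the selector string
  #   name do ... end  -> nested group; the block runs via instance_eval,
  #                       so it needs no |b| parameter
  def method_missing(name, *args, &block)
    if block && args.empty?
      group = PropertyGroupSketch.new
      group.instance_eval(&block)
      @properties[name] = group.properties
    elsif args.size == 1
      @properties[name] = args.first
    else
      super
    end
  end
end

# Hypothetical entry point: evaluates a block against a fresh group, returns the hash.
def build_properties(&block)
  group = PropertyGroupSketch.new
  group.instance_eval(&block)
  group.properties
end

swallowed = nil
properties = build_properties do
  path "/"
  headline "xpath=//h1"
  benefits do
    first_benefit "css=.column.leftmost h3"
  end
  # Kernel#format is a private method every Object inherits, so this call is
  # dispatched to it (returning "html") and never reaches method_missing.
  swallowed = format "html"
end

p properties # no :format key is recorded
p swallowed  # the string returned by Kernel#format
```

Because `format` resolves to the inherited private `Kernel#format` before Ruby falls back to `method_missing`, the property is silently dropped, which is the kind of clash a rename such as `document_format` avoids. Other private `Kernel` methods, for example `select` or `open`, would collide the same way. Basing the DSL object on `BasicObject`, which does not include `Kernel`, is a common alternative way to sidestep such collisions.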
data/Rakefile CHANGED
@@ -12,7 +12,7 @@ Jeweler::Tasks.new do |gem|
  gem.name = "wombat"
  gem.homepage = "http://github.com/felipecsl/wombat"
  gem.license = "MIT"
- gem.summary = %Q{Ruby DSL to crawl web pages}
+ gem.summary = %Q{Ruby DSL to scrape web pages}
  gem.description = %Q{Generic Web crawler with a DSL that parses structured data from web pages}
  gem.email = "felipe.lima@gmail.com"
  gem.authors = ["Felipe Lima"]
data/VERSION CHANGED
@@ -1 +1 @@
- 1.0.0
+ 2.0.0