scrubber-scrubyt 0.4.12 → 0.4.13

data/README CHANGED
@@ -1,17 +1,29 @@
  = scRUBYt! - Hpricot and Mechanize (or FireWatir) on steroids
 
- A simple to learn and use, yet very powerful web extraction framework written in Ruby. Navigate through the Web, Extract, query, transform and save relevant data from the Web page of your interest by the concise and easy to use DSL.
+ A simple to learn and use, yet very powerful web extraction framework written in Ruby. Navigate through the Web;
+ extract, query, transform and save relevant data from the Web page of your interest with the concise and easy to use DSL.
 
- Do you think that Mechanize and Hpricot are powerful libraries? You're right, they are, indeed - hats off to their authors: without these libs scRUBYt! could not exist now! I have been wondering whether their functionality could be still enhanced further - so I took these two powerful ingredients, threw in a handful of smart heuristics, wrapped them around with a chunky DSL coating and sprinkled the whole stuff with a lots of convention over configuration(tm) goodies - and ... enter scRUBYt! and decide it yourself.
+
+ Do you think that Mechanize and Hpricot are powerful libraries? You're right, they are, indeed - hats off to their
+ authors: without these libs scRUBYt! could not exist! I have been wondering whether their functionality could be
+ enhanced still further - so I took these two powerful ingredients, threw in a handful of smart heuristics, wrapped them
+ around with a chunky DSL coating and sprinkled the whole thing with lots of convention over configuration(tm) goodies
+ - and ... enter scRUBYt! and decide for yourself.
 
  = Wait... why do we need one more web-scraping toolkit?
 
  After all, we have HPricot, and Rubyful-soup, and Mechanize, and scrAPI, and ARIEL and scrapes and ...
- Well, because scRUBYt! is different. It has an entirely different philosophy, underlying techniques, theoretical background, use cases, todo list, real-life scenarios etc. - shortly it should be used in different situations with different requirements than the previosly mentioned ones.
+ Well, because scRUBYt! is different. It has an entirely different philosophy, underlying techniques, theoretical
+ background, use cases, todo list, real-life scenarios etc. - in short, it should be used in different situations, with
+ different requirements, than the previously mentioned ones.
 
- If you need something quick and/or would like to have maximal control over the scraping process, I recommend HPricot. Mechanize shines when it comes to interaction with Web pages. Since scRUBYt! is operating based on XPaths, sometimes you will chose scrAPI because CSS selectors will better suit your needs. The list goes on and on, boiling down to the good old mantra: use the right tool for the right job!
+ If you need something quick and/or would like to have maximal control over the scraping process, I recommend HPricot.
+ Mechanize shines when it comes to interaction with Web pages. Since scRUBYt! operates on XPaths, sometimes you
+ will choose scrAPI because CSS selectors will better suit your needs. The list goes on and on, boiling down to the good
+ old mantra: use the right tool for the right job!
 
- I hope there will be also times when you will want to experiment with Pandora's box and reach after the power of scRUBYt! :-)
+ I hope there will also be times when you will want to experiment with Pandora's box and reach for the power of
+ scRUBYt! :-)
 
  = Sounds fine - show me an example!
 
@@ -50,21 +62,30 @@ output:
  <!-- another 200+ results -->
  <tt></root></tt>
 
- This was a relatively beginner-level example (scRUBYt knows a lot more than this and there are much complicated extractors than the above one) - yet it did a lot of things automagically. First of all,
+ This was a relatively beginner-level example (scRUBYt knows a lot more than this, and there are much more complicated
+ extractors than the above one) - yet it did a lot of things automagically. First of all,
  it automatically loaded the page of interest (by going to ebay.com, automatically searching for ipods
  and narrowing down the results by clicking on 'Apple iPod'), then it extracted *all* the items that
- looked like the specified example (which btw described also how the output structure should look like) - on the first 5 result pages. Not so bad for about 10 lines of code, eh?
+ looked like the specified example (which, btw, also described what the output structure should look like) - on the first 5
+ result pages. Not so bad for about 10 lines of code, eh?
 
  = OK, OK, I believe you, what should I do?
 
- You can find everything you will need at these addresses (or if not, I doubt you will find it elsewhere...). See the next section about installation, and after installing be sure to check out these URLs:
+ You can find everything you will need at these addresses (and if not, I doubt you will find it elsewhere...). See the
+ next section about installation, and after installing be sure to check out these URLs:
 
- * <a href='http://www.rubyrailways.com'>rubyrailways.com</a> - for some theory; if you would like to take a sneak peek at web scraping in general and/or you would like to understand what's going on under the hood, check out <a href='http://www.rubyrailways.com/data-extraction-for-web-20-screen-scraping-in-rubyrails'>this article about web-scraping</a>!
+ * <a href='http://www.rubyrailways.com'>rubyrailways.com</a> - for some theory; if you would like to take a sneak peek
+ at web scraping in general and/or you would like to understand what's going on under the hood, check out <a
+ href='http://www.rubyrailways.com/data-extraction-for-web-20-screen-scraping-in-rubyrails'>this article about
+ web-scraping</a>!
  * <a href='http://scrubyt.org'>http://scrubyt.org</a> - your source of tutorials, howtos, news etc.
  * <a href='http://scrubyt.rubyforge.org'>scrubyt.rubyforge.org</a> - for an up-to-date, online Rdoc
- * <a href='http://projects.rubyforge.org/scrubyt'>projects.rubyforge.org/scrubyt</a> - for developer info, including open and closed bugs, files etc.
- * projects.rubyforge.org/scrubyt/files... - fair amount (and still growing with every release) of examples, showcasing the features of scRUBYt!
- * planned: public extractor repository - hopefully (after people realize how great this package is :-)) scRUBYt! will have a community, and people will upload their extractors for whatever reason
+ * <a href='http://projects.rubyforge.org/scrubyt'>projects.rubyforge.org/scrubyt</a> - for developer info, including
+ open and closed bugs, files etc.
+ * projects.rubyforge.org/scrubyt/files... - a fair amount (and still growing with every release) of examples, showcasing
+ the features of scRUBYt!
+ * planned: public extractor repository - hopefully (after people realize how great this package is :-)) scRUBYt! will
+ have a community, and people will upload their extractors for whatever reason
 
  If you still can't find something here, drop a mail to the guys at scrubyt@/NO-SPAM/scrubyt.org!
 
@@ -123,8 +123,7 @@ module Scrubyt
  @@original_host_name ||= @@host_name
  end #end of method store_host_name
 
- def self.parse_and_set_proxy(proxy)
- proxy = proxy[:proxy]
+ def self.parse_and_set_proxy(proxy)
  if proxy.downcase == 'localhost'
  @@host = 'localhost'
  @@port = proxy.split(':').last
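The hunk above removes the `proxy = proxy[:proxy]` unwrapping, so the method now receives the proxy string directly and splits it on `:` to pick out the port. A minimal standalone sketch of that host:port split (the `parse_proxy` helper name is hypothetical, not scRUBYt!'s actual API):

```ruby
# Hypothetical helper mirroring the host:port split used in
# parse_and_set_proxy; illustrative only, not part of scRUBYt!.
def parse_proxy(proxy)
  host, port = proxy.split(':')    # 'localhost:8080' -> ['localhost', '8080']
  [host, port ? port.to_i : nil]   # no port given -> nil
end

parse_proxy('localhost:8080')  # => ["localhost", 8080]
parse_proxy('example.com')     # => ["example.com", nil]
```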
data/lib/scrubyt.rb CHANGED
@@ -1,5 +1,7 @@
- $KCODE = "u"
- require "jcode"
+ if RUBY_VERSION < '1.9'
+ $KCODE = "u"
+ require "jcode"
+ end
 
  #ruby core
  require "open-uri"
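The change above wraps the 1.8-era UTF-8 setup in a version guard: `$KCODE` and the `jcode` library are Ruby 1.8 features, while 1.9+ strings carry their own encoding, so the setup must be skipped there. A sketch of the same guard pattern:

```ruby
# On Ruby 1.8 the interpreter needs $KCODE and jcode for multibyte
# (UTF-8) string handling; on 1.9+ this whole block is skipped,
# because strings are encoding-aware.
if RUBY_VERSION < '1.9'
  $KCODE = "u"     # treat strings as UTF-8 (effective on 1.8 only)
  require "jcode"  # multibyte String helpers (1.8 only)
end
```

Plain string comparison is enough here because Ruby's released versions never crossed a two-digit minor in the 1.x line, so `'1.8.7' < '1.9'` and `'2.0.0' >= '1.9'` both compare correctly.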
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: scrubber-scrubyt
  version: !ruby/object:Gem::Version
- version: 0.4.12
+ version: 0.4.13
  platform: ruby
  authors:
  - Peter Szinek