andyverprauskus-scrubyt 0.5.1

Files changed (45)
  1. data/CHANGELOG +355 -0
  2. data/COPYING +340 -0
  3. data/README.rdoc +121 -0
  4. data/Rakefile +101 -0
  5. data/lib/scrubyt.rb +53 -0
  6. data/lib/scrubyt/core/navigation/agents/firewatir.rb +318 -0
  7. data/lib/scrubyt/core/navigation/agents/mechanize.rb +312 -0
  8. data/lib/scrubyt/core/navigation/fetch_action.rb +63 -0
  9. data/lib/scrubyt/core/navigation/navigation_actions.rb +107 -0
  10. data/lib/scrubyt/core/scraping/compound_example.rb +30 -0
  11. data/lib/scrubyt/core/scraping/constraint.rb +169 -0
  12. data/lib/scrubyt/core/scraping/constraint_adder.rb +49 -0
  13. data/lib/scrubyt/core/scraping/filters/attribute_filter.rb +14 -0
  14. data/lib/scrubyt/core/scraping/filters/base_filter.rb +112 -0
  15. data/lib/scrubyt/core/scraping/filters/constant_filter.rb +9 -0
  16. data/lib/scrubyt/core/scraping/filters/detail_page_filter.rb +37 -0
  17. data/lib/scrubyt/core/scraping/filters/download_filter.rb +64 -0
  18. data/lib/scrubyt/core/scraping/filters/html_subtree_filter.rb +9 -0
  19. data/lib/scrubyt/core/scraping/filters/regexp_filter.rb +13 -0
  20. data/lib/scrubyt/core/scraping/filters/script_filter.rb +11 -0
  21. data/lib/scrubyt/core/scraping/filters/text_filter.rb +34 -0
  22. data/lib/scrubyt/core/scraping/filters/tree_filter.rb +138 -0
  23. data/lib/scrubyt/core/scraping/pattern.rb +359 -0
  24. data/lib/scrubyt/core/scraping/pre_filter_document.rb +14 -0
  25. data/lib/scrubyt/core/scraping/result_indexer.rb +90 -0
  26. data/lib/scrubyt/core/shared/extractor.rb +183 -0
  27. data/lib/scrubyt/logging.rb +154 -0
  28. data/lib/scrubyt/output/post_processor.rb +139 -0
  29. data/lib/scrubyt/output/result.rb +44 -0
  30. data/lib/scrubyt/output/result_dumper.rb +154 -0
  31. data/lib/scrubyt/output/result_node.rb +145 -0
  32. data/lib/scrubyt/output/scrubyt_result.rb +42 -0
  33. data/lib/scrubyt/utils/compound_example_lookup.rb +50 -0
  34. data/lib/scrubyt/utils/ruby_extensions.rb +85 -0
  35. data/lib/scrubyt/utils/shared_utils.rb +58 -0
  36. data/lib/scrubyt/utils/simple_example_lookup.rb +40 -0
  37. data/lib/scrubyt/utils/xpathutils.rb +202 -0
  38. data/test/blackbox_test.rb +60 -0
  39. data/test/blackbox_tests/basic/multi_root.rb +6 -0
  40. data/test/blackbox_tests/basic/simple.rb +5 -0
  41. data/test/blackbox_tests/detail_page/one_detail_page.rb +9 -0
  42. data/test/blackbox_tests/detail_page/two_detail_pages.rb +9 -0
  43. data/test/blackbox_tests/next_page/next_page_link.rb +7 -0
  44. data/test/blackbox_tests/next_page/page_list_links.rb +7 -0
  45. metadata +120 -0
data/README.rdoc ADDED
@@ -0,0 +1,121 @@
+ = scRUBYt! - Hpricot and Mechanize (or FireWatir) on steroids
+
+ A simple to learn and use, yet very powerful web extraction framework written in Ruby. Navigate through the Web,
+ then extract, query, transform and save relevant data from the Web page of your interest with a concise and easy to use DSL.
+
+
+ Do you think that Mechanize and Hpricot are powerful libraries? You're right, they are, indeed - hats off to their
+ authors: without these libs scRUBYt! could not exist now! I have been wondering whether their functionality could be
+ enhanced still further - so I took these two powerful ingredients, threw in a handful of smart heuristics, wrapped them
+ around with a chunky DSL coating and sprinkled the whole thing with lots of convention over configuration(tm) goodies
+ - and ... enter scRUBYt! and decide for yourself.
+
+ = Wait... why do we need one more web-scraping toolkit?
+
+ After all, we have HPricot, and Rubyful-soup, and Mechanize, and scrAPI, and ARIEL and scrapes and ...
+ Well, because scRUBYt! is different. It has an entirely different philosophy, underlying techniques, theoretical
+ background, use cases, todo list, real-life scenarios etc. - in short, it should be used in different situations with
+ different requirements than the previously mentioned ones.
+
+ If you need something quick and/or would like to have maximal control over the scraping process, I recommend HPricot.
+ Mechanize shines when it comes to interaction with Web pages. Since scRUBYt! operates on XPaths, sometimes you
+ will choose scrAPI because CSS selectors will better suit your needs. The list goes on and on, boiling down to the good
+ old mantra: use the right tool for the right job!
+
+ I hope there will also be times when you will want to experiment with Pandora's box and reach for the power of
+ scRUBYt! :-)
+
+ = Sounds fine - show me an example!
+
+ Let's apply the "show, don't tell" principle. Okay, here we go:
+
+ <tt>ebay_data = Scrubyt::Extractor.define do</tt>
+
+   fetch 'http://www.ebay.com/'
+   fill_textfield 'satitle', 'ipod'
+   submit
+   click_link 'Apple iPod'
+
+   record do
+     item_name 'APPLE NEW IPOD MINI 6GB MP3 PLAYER SILVER'
+     price '$71.99'
+   end
+   next_page 'Next >', :limit => 5
+
+ <tt>end</tt>
+
+ output:
+
+ <tt><root></tt>
+   <record>
+     <item_name>APPLE IPOD NANO 4GB - PINK - MP3 PLAYER</item_name>
+     <price>$149.95</price>
+   </record>
+   <record>
+     <item_name>APPLE IPOD 30GB BLACK VIDEO/PHOTO/MP3 PLAYER</item_name>
+     <price>$172.50</price>
+   </record>
+   <record>
+     <item_name>NEW APPLE IPOD NANO 4GB PINK MP3 PLAYER</item_name>
+     <price>$171.06</price>
+   </record>
+   <!-- another 200+ results -->
+ <tt></root></tt>
+
+ This was a relatively beginner-level example (scRUBYt! knows a lot more than this and there are much more complicated
+ extractors than the above one) - yet it did a lot of things automagically. First of all,
+ it automatically loaded the page of interest (by going to ebay.com, automatically searching for ipods
+ and narrowing down the results by clicking on 'Apple iPod'), then it extracted *all* the items that
+ looked like the specified example (which, by the way, also described what the output structure should look like) - on the first 5
+ result pages. Not so bad for about 10 lines of code, eh?
+
+ = OK, OK, I believe you, what should I do?
+
+ You can find everything you will need at these addresses (or if not, I doubt you will find it elsewhere...). See the
+ next section about installation, and after installing be sure to check out these URLs:
+
+ * <a href='http://www.rubyrailways.com'>rubyrailways.com</a> - for some theory; if you would like to take a sneak peek
+   at web scraping in general and/or you would like to understand what's going on under the hood, check out <a
+   href='http://www.rubyrailways.com/data-extraction-for-web-20-screen-scraping-in-rubyrails'>this article about
+   web-scraping</a>!
+ * <a href='http://scrubyt.org'>http://scrubyt.org</a> - your source of tutorials, howtos, news etc.
+ * <a href='http://scrubyt.rubyforge.org'>scrubyt.rubyforge.org</a> - for an up-to-date, online RDoc
+ * <a href='http://projects.rubyforge.org/scrubyt'>projects.rubyforge.org/scrubyt</a> - for developer info, including
+   open and closed bugs, files etc.
+ * projects.rubyforge.org/scrubyt/files... - a fair amount (and still growing with every release) of examples, showcasing
+   the features of scRUBYt!
+ * planned: public extractor repository - hopefully (after people realize how great this package is :-)) scRUBYt! will
+   have a community, and people will upload their extractors for whatever reason
+
+ If you still can't find something here, drop a mail to the guys at scrubyt@/NO-SPAM/scrubyt.org!
+
+ = How to install
+
+ scRUBYt! requires these packages to be installed:
+
+ * Ruby 1.8.4
+ * Hpricot 0.5
+ * Mechanize 0.6.3
+
+ I assume you have Ruby and RubyGems installed. To install WWW::Mechanize 0.6.3 or higher, just run
+
+ <tt>sudo gem install mechanize</tt>
+
+ Hpricot 0.5 is just hot off the frying pan - perfect timing, _why! - install it with
+
+ <tt>sudo gem install hpricot</tt>
+
+ Once all the dependencies (Mechanize and Hpricot) are up and running, you can install scRUBYt! with
+
+ <tt>sudo gem install scrubyt</tt>
+
+ If you encounter any problems, drop a mail to the guys at scrubyt@/NO-SPAM/scrubyt.org!
+
+ = Author
+
+ Copyright (c) 2006 by Peter Szinek (peter@/NO-SPAM/rubyrailways.com)
+
+ = Copyright
+
+ This library is distributed under the GPL. Please see the LICENSE file.
+
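A quick way to sanity-check the installation steps from the README above is to require the library from a throwaway script. This is only a minimal sketch (the file name check_install.rb is made up for illustration); if the three gems are in place it should print the confirmation line without raising a LoadError:

    # check_install.rb - hypothetical helper script, not part of the gem
    require 'rubygems'
    require 'scrubyt'    # loads hpricot and mechanize along the way
    puts 'scRUBYt! and its dependencies loaded fine'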
data/Rakefile ADDED
@@ -0,0 +1,101 @@
+ require 'rake/rdoctask'
+ require 'rake/testtask'
+ require 'rake/gempackagetask'
+ require 'rake/packagetask'
+
+ ###################################################
+ # Dependencies
+ ###################################################
+
+ task "default" => ["test_all"]
+ task "generate_rdoc" => ["cleanup_readme"]
+ task "cleanup_readme" => ["rdoc"]
+
+ ###################################################
+ # Gem specification
+ ###################################################
+
+ gem_spec = Gem::Specification.new do |s|
+   s.name = 'scrubyt'
+   s.version = '0.4.30'
+   s.summary = 'A powerful Web-scraping framework built on Mechanize and Hpricot (and FireWatir)'
+   s.description = %{scRUBYt! is an easy to learn and use, yet powerful and effective web scraping framework. Its most interesting part is a Web-scraping DSL built on HPricot and WWW::Mechanize, which allows you to navigate to the page of interest, then extract and query data records with a few lines of code. It is hard to describe scRUBYt! in a few sentences - you have to see it for yourself!}
+   # Files containing Test::Unit test cases.
+   s.test_files = FileList['test/unittests/**/*']
+   # List of other files to be included.
+   s.files = FileList['COPYING', 'README.rdoc', 'CHANGELOG', 'Rakefile', 'lib/**/*.rb']
+   s.author = 'Peter Szinek'
+   s.email = 'peter@rubyrailways.com'
+   s.homepage = 'http://www.scrubyt.org'
+   s.add_dependency('hpricot', '>= 0.5')
+   s.add_dependency('mechanize', '>= 0.6.3')
+   s.has_rdoc = 'true'
+ end
+
+ ###################################################
+ # Tasks
+ ###################################################
+
+ Rake::RDocTask.new do |generate_rdoc|
+   files = ['lib/**/*.rb', 'README.rdoc', 'CHANGELOG']
+   generate_rdoc.rdoc_files.add(files)
+   generate_rdoc.main = "README.rdoc" # page to start on
+   generate_rdoc.title = "Scrubyt Documentation"
+   generate_rdoc.template = "resources/allison/allison.rb"
+   generate_rdoc.rdoc_dir = 'doc' # rdoc output folder
+   generate_rdoc.options << '--line-numbers' << '--inline-source'
+ end
+
+ Rake::TestTask.new(:test_all) do |task|
+   task.pattern = 'test/*_test.rb'
+ end
+
+ Rake::TestTask.new(:test_blackbox) do |task|
+   task.test_files = ['test/blackbox_test.rb']
+ end
+
+ task "test_specific" do
+   ruby "test/blackbox_test.rb #{ARGV[1]}"
+ end
+
+ Rake::TestTask.new(:test_non_blackbox) do |task|
+   task.test_files = FileList['test/*_test.rb'] - ['test/blackbox_test.rb']
+ end
+
+ task "rcov" do
+   sh 'rcov --xrefs test/*.rb'
+   puts 'Report done.'
+ end
+
+ task "cleanup_readme" do
+   puts "Cleaning up README..."
+   readme_in = open('./doc/files/README.html')
+   content = readme_in.read
+   content.sub!('<h1 id="item_name">File: README</h1>','')
+   content.sub!('<h1>Description</h1>','')
+   readme_in.close
+   open('./doc/files/README.html', 'w') {|f| f.write(content)}
+   # OK, this is ugly as hell and as non-DRY as possible, but
+   # I don't have time to deal with it right now
+   puts "Cleaning up CHANGELOG..."
+   readme_in = open('./doc/files/CHANGELOG.html')
+   content = readme_in.read
+   content.sub!('<h1 id="item_name">File: CHANGELOG</h1>','')
+   content.sub!('<h1>Description</h1>','')
+   readme_in.close
+   open('./doc/files/CHANGELOG.html', 'w') {|f| f.write(content)}
+ end
+
+ task "generate_rdoc" do
+ end
+
+ Rake::GemPackageTask.new(gem_spec) do |pkg|
+   pkg.need_zip = false
+   pkg.need_tar = false
+ end
+
+ #Rake::PackageTask.new('scrubyt-examples', '0.4.03') do |pkg|
+ #  pkg.need_zip = true
+ #  pkg.need_tar = true
+ #  pkg.package_files.include("examples/**/*")
+ #end
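For orientation, the tasks defined above are normally driven from the project root roughly as follows (a hedged sketch; the name passed to test_specific is just a placeholder for one of the blackbox tests):

    # rake                      - default task, i.e. test_all
    # rake test_all             - every test matching test/*_test.rb
    # rake test_blackbox        - only test/blackbox_test.rb
    # rake test_specific NAME   - forwards ARGV[1] to test/blackbox_test.rb
    # rake test_non_blackbox    - everything except the blackbox tests
    # rake rcov                 - coverage report via rcov
    # rake generate_rdoc        - rdoc, then the cleanup_readme post-processing
    # rake package              - builds the gem described by gem_spec above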
data/lib/scrubyt.rb ADDED
@@ -0,0 +1,53 @@
+ if RUBY_VERSION < '1.9'
+   $KCODE = "u"
+   require "jcode"
+ end
+
+ #ruby core
+ require "open-uri"
+ require "erb"
+
+ #gems
+ require "rexml/text"
+ require "rubygems"
+ require "mechanize"
+ require "hpricot"
+
+ #scrubyt
+ require "#{File.dirname(__FILE__)}/scrubyt/logging"
+ require "#{File.dirname(__FILE__)}/scrubyt/utils/ruby_extensions.rb"
+ require "#{File.dirname(__FILE__)}/scrubyt/utils/xpathutils.rb"
+ require "#{File.dirname(__FILE__)}/scrubyt/utils/shared_utils.rb"
+ require "#{File.dirname(__FILE__)}/scrubyt/utils/simple_example_lookup.rb"
+ require "#{File.dirname(__FILE__)}/scrubyt/utils/compound_example_lookup.rb"
+ require "#{File.dirname(__FILE__)}/scrubyt/core/scraping/constraint_adder.rb"
+ require "#{File.dirname(__FILE__)}/scrubyt/core/scraping/constraint.rb"
+ require "#{File.dirname(__FILE__)}/scrubyt/core/scraping/result_indexer.rb"
+ require "#{File.dirname(__FILE__)}/scrubyt/core/scraping/pre_filter_document.rb"
+ require "#{File.dirname(__FILE__)}/scrubyt/core/scraping/compound_example.rb"
+ require "#{File.dirname(__FILE__)}/scrubyt/output/result_node.rb"
+ require "#{File.dirname(__FILE__)}/scrubyt/output/scrubyt_result.rb"
+ require "#{File.dirname(__FILE__)}/scrubyt/core/navigation/agents/mechanize.rb"
+
+ # -- Making Firewatir optional --
+ begin
+   require "#{File.dirname(__FILE__)}/scrubyt/core/navigation/agents/firewatir.rb"
+ rescue LoadError
+   puts "The gem firewatir is not installed, you'll be able to use Mechanize as the agent only"
+ end
+ # --
+
+ require "#{File.dirname(__FILE__)}/scrubyt/core/navigation/navigation_actions.rb"
+ require "#{File.dirname(__FILE__)}/scrubyt/core/navigation/fetch_action.rb"
+ require "#{File.dirname(__FILE__)}/scrubyt/core/shared/extractor.rb"
+ require "#{File.dirname(__FILE__)}/scrubyt/core/scraping/filters/base_filter.rb"
+ require "#{File.dirname(__FILE__)}/scrubyt/core/scraping/filters/attribute_filter.rb"
+ require "#{File.dirname(__FILE__)}/scrubyt/core/scraping/filters/constant_filter.rb"
+ require "#{File.dirname(__FILE__)}/scrubyt/core/scraping/filters/script_filter.rb"
+ require "#{File.dirname(__FILE__)}/scrubyt/core/scraping/filters/text_filter.rb"
+ require "#{File.dirname(__FILE__)}/scrubyt/core/scraping/filters/detail_page_filter.rb"
+ require "#{File.dirname(__FILE__)}/scrubyt/core/scraping/filters/download_filter.rb"
+ require "#{File.dirname(__FILE__)}/scrubyt/core/scraping/filters/html_subtree_filter.rb"
+ require "#{File.dirname(__FILE__)}/scrubyt/core/scraping/filters/regexp_filter.rb"
+ require "#{File.dirname(__FILE__)}/scrubyt/core/scraping/filters/tree_filter.rb"
+ require "#{File.dirname(__FILE__)}/scrubyt/core/scraping/pattern.rb"
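The begin/rescue around the firewatir require above is what makes the Firefox-driven agent optional. Stripped of the scRUBYt!-specific paths, the pattern looks like this (a generic sketch, not code taken from the gem):

    # Generic optional-dependency pattern, as used above for firewatir
    begin
      require 'firewatir'        # only succeeds if the user installed the gem
      HAVE_FIREWATIR = true
    rescue LoadError
      HAVE_FIREWATIR = false     # fall back to the Mechanize-based agent
    end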
data/lib/scrubyt/core/navigation/agents/firewatir.rb ADDED
@@ -0,0 +1,318 @@
+ require 'firewatir'
+
+ module Scrubyt
+   ##
+   #=<tt>Fetching pages (and related functionality)</tt>
+   #
+   #Since a lot of things are happening during (and before)
+   #the fetching of a document, I decided to move fetching-related
+   #functionality out to a separate class - so if you are looking for anything
+   #which is loading a document (even by submitting a form or clicking a link)
+   #and related things like setting a proxy etc., you should find it here.
+   module Navigation
+     module Firewatir
+
+       def self.included(base)
+         base.module_eval do
+           @@agent = FireWatir::Firefox.new unless defined? @@agent
+           @@current_doc_url = nil
+           @@current_doc_protocol = nil
+           @@base_dir = nil
+           @@host_name = nil
+           @@history = []
+           @@current_form = nil
+           @@current_frame = nil
+
+           ##
+           #Action to fetch a document (either a file or a http address)
+           #
+           #*parameters*
+           #
+           #_doc_url_ - the url or file name to fetch
+           def self.fetch(doc_url, *args)
+             #Refactor this crap!!! with option_accessor stuff
+             if args.size > 0
+               mechanize_doc = args[0][:mechanize_doc]
+               resolve = args[0][:resolve]
+               basic_auth = args[0][:basic_auth]
+               #Refactor this whole stuff as well!!! It looks awful...
+               parse_and_set_basic_auth(basic_auth) if basic_auth
+             else
+               mechanize_doc = nil
+               resolve = :full
+             end
+
+             @@current_doc_url = doc_url
+             @@current_doc_protocol = determine_protocol
+             if mechanize_doc.nil?
+               handle_relative_path(doc_url) unless @@current_doc_protocol == 'xpath'
+               handle_relative_url(doc_url, resolve)
+               Scrubyt.log :ACTION, "fetching document: #{@@current_doc_url}"
+               case @@current_doc_protocol
+               when 'file'
+                 @@agent.goto("file://" + @@current_doc_url)
+               else
+                 @@agent.goto(@@current_doc_url)
+               end
+               @@mechanize_doc = "<html>#{@@agent.html}</html>"
+             else
+               @@mechanize_doc = mechanize_doc
+             end
+             @@hpricot_doc = Hpricot(PreFilterDocument.br_to_newline(@@mechanize_doc))
+             store_host_name(@@agent.url) # in case we're on a new host
+           end
+
+           def self.use_current_page
+             @@mechanize_doc = "<html>#{@@agent.html}</html>"
+             @@hpricot_doc = Hpricot(PreFilterDocument.br_to_newline(@@mechanize_doc))
+           end
+
+           def self.frame(attribute, value)
+             if @@current_frame
+               @@current_frame.frame(attribute, value)
+             else
+               @@current_frame = @@agent.frame(attribute, value)
+             end
+           end
+
+           ##
+           #Submit the last form
+           def self.submit(current_form, sleep_time=nil, button=nil, type=nil)
+             if @@current_frame
+               #BRUTAL hack - FireWatir leaves us little choice here;
+               #this really needs a cleaner solution
+               @@current_frame.locate
+               form = Document.new(@@current_frame).all.find{|t| t.tagName=="FORM"}
+               form.submit
+             else
+               @@agent.element_by_xpath(@@current_form).submit
+             end
+
+             if sleep_time
+               sleep sleep_time
+               @@agent.wait
+             end
+
+             @@current_doc_url = @@agent.url
+             @@mechanize_doc = "<html>#{@@agent.html}</html>"
+             @@hpricot_doc = Hpricot(PreFilterDocument.br_to_newline(@@mechanize_doc))
+           end
+
+           ##
+           #Click the link specified by the text
+           def self.click_link(link_spec, index=0, wait_secs=0)
+             Scrubyt.log :ACTION, "Clicking link specified by: %p" % link_spec
+             if link_spec.is_a?(Hash)
+               elem = XPathUtils.generate_XPath(CompoundExampleLookup.find_node_from_compund_example(@@hpricot_doc, link_spec, false, index), nil, true)
+               result_page = @@agent.element_by_xpath(elem).click
+             else
+               @@agent.link(:innerHTML, Regexp.escape(link_spec)).click
+             end
+             sleep(wait_secs) if wait_secs > 0
+             @@agent.wait
+
+             # evaluate the results
+             extractor.evaluate_extractor
+
+             @@current_doc_url = @@agent.url
+             @@mechanize_doc = "<html>#{@@agent.html}</html>"
+             @@hpricot_doc = Hpricot(PreFilterDocument.br_to_newline(@@mechanize_doc))
+             Scrubyt.log :ACTION, "Fetching #{@@current_doc_url}"
+           end
+
+           def self.click_by_xpath_if_exists(xpath, wait_secs=0)
+             begin
+               result_page = @@agent.element_by_xpath(xpath).click
+               sleep(wait_secs) if wait_secs > 0
+               @@agent.wait
+
+               extractor.evaluate_extractor
+
+               @@current_doc_url = @@agent.url
+               @@mechanize_doc = "<html>#{@@agent.html}</html>"
+               @@hpricot_doc = Hpricot(PreFilterDocument.br_to_newline(@@mechanize_doc))
+               Scrubyt.log :ACTION, "Fetching #{@@current_doc_url}"
+             rescue Watir::Exception::UnknownObjectException
+               Scrubyt.log :INFO, "XPath #{xpath} doesn't exist in this document"
+             end
+           end
+
+           def self.click_by_xpath_without_evaluate(xpath, wait_secs=0)
+             Scrubyt.log :ACTION, "Clicking by XPath : %p" % xpath
+             @@agent.element_by_xpath(xpath).click
+             Scrubyt.log :INFO, "sleeping #{wait_secs}..."
+             sleep(wait_secs) if wait_secs > 0
+             @@agent.wait
+
+             # does not call evaluate_extractor
+             #extractor.evaluate_extractor
+
+             @@current_doc_url = @@agent.url
+             @@mechanize_doc = "<html>#{@@agent.html}</html>"
+             @@hpricot_doc = Hpricot(PreFilterDocument.br_to_newline(@@mechanize_doc))
+             Scrubyt.log :ACTION, "Fetching #{@@current_doc_url}"
+           end
+
+
+           def self.click_by_xpath(xpath, wait_secs=0)
+             Scrubyt.log :ACTION, "Clicking by XPath : %p" % xpath
+             @@agent.element_by_xpath(xpath).click
+             Scrubyt.log :INFO, "sleeping #{wait_secs}..."
+             sleep(wait_secs) if wait_secs > 0
+             @@agent.wait
+
+             # evaluate the results
+             extractor.evaluate_extractor
+
+             @@current_doc_url = @@agent.url
+             @@mechanize_doc = "<html>#{@@agent.html}</html>"
+             @@hpricot_doc = Hpricot(PreFilterDocument.br_to_newline(@@mechanize_doc))
+             Scrubyt.log :ACTION, "Fetching #{@@current_doc_url}"
+           end
+
+           def self.click_image_map(index=0)
+             Scrubyt.log :ACTION, "Clicking image map at index: %p" % index
+             uri = @@mechanize_doc.search("//area")[index]['href']
+             result_page = @@agent.get(uri)
+             @@current_doc_url = result_page.uri.to_s
+             Scrubyt.log :ACTION, "Fetching #{@@current_doc_url}"
+             fetch(@@current_doc_url, :mechanize_doc => result_page)
+           end
+
+           def self.store_host_name(doc_url)
+             @@host_name = doc_url.match(/.*\..*?\//)[0] if doc_url.match(/.*\..*?\//)
+             @@original_host_name ||= @@host_name
+           end #end of method store_host_name
+
+           def self.determine_protocol
+             old_protocol = @@current_doc_protocol
+             new_protocol = case @@current_doc_url
+                            when /^\/\//
+                              'xpath'
+                            when /^https/
+                              'https'
+                            when /^http/
+                              'http'
+                            when /^www\./
+                              'http'
+                            else
+                              'file'
+                            end
+             return 'http' if ((old_protocol == 'http') && new_protocol == 'file')
+             return 'https' if ((old_protocol == 'https') && new_protocol == 'file')
+             new_protocol
+           end
+
+           def self.parse_and_set_basic_auth(basic_auth)
+             login, pass = basic_auth.split('@')
+             Scrubyt.log :ACTION, "Basic authentication: login=<#{login}>, pass=<#{pass}>"
+             @@agent.basic_auth(login, pass)
+           end
+
+           def self.handle_relative_path(doc_url)
+             if @@base_dir == nil || doc_url[0..0] == "/"
+               @@base_dir = doc_url.scan(/.+\//)[0] if @@current_doc_protocol == 'file'
+             else
+               @@current_doc_url = ((@@base_dir + doc_url) if doc_url !~ /#{@@base_dir}/)
+             end
+           end
+
+           def self.handle_relative_url(doc_url, resolve)
+             return if doc_url =~ /^(http:|javascript:)/
+             if doc_url !~ /^\//
+               first_char = doc_url[0..0]
+               doc_url = ( first_char == '?' ? '' : '/' ) + doc_url
+               if first_char == '?' #This is an ugly hack... really have to throw this out and go with Mechanize's
+                 current_uri = @@mechanize_doc.uri.to_s
+                 current_uri = @@agent.history.first.uri.to_s if current_uri =~ /\/popup\//
+                 if (current_uri.include? '?')
+                   current_uri = current_uri.scan(/.+\//)[0]
+                 else
+                   current_uri += '/' unless current_uri[-1..-1] == '/'
+                 end
+                 @@current_doc_url = current_uri + doc_url
+                 return
+               end
+             end
+             case resolve
+             when :full
+               @@current_doc_url = (@@host_name + doc_url) if ( @@host_name != nil && (doc_url !~ /#{@@host_name}/))
+               @@current_doc_url = @@current_doc_url.split('/').uniq.join('/')
+             when :host
+               base_host_name = (@@host_name.count("/") == 2 ? @@host_name : @@host_name.scan(/(http.+?\/\/.+?)\//)[0][0])
+               @@current_doc_url = base_host_name + doc_url
+             else
+               #custom resolving
+               @@current_doc_url = resolve + doc_url
+             end
+           end
+
+           def self.fill_textfield(textfield_name, query_string, wait_secs, useValue)
+             @@current_form = "//input[@name='#{textfield_name}']/ancestor::form"
+             target = @@current_frame || @@agent
+             if useValue
+               target.text_field(:name, textfield_name).value = query_string
+             else
+               target.text_field(:name, textfield_name).set(query_string)
+             end
+             sleep(wait_secs) if wait_secs > 0
+             @@mechanize_doc = "<html>#{@@agent.html}</html>"
+             @@hpricot_doc = Hpricot(PreFilterDocument.br_to_newline(@@mechanize_doc))
+
+           end
+
+           ##
+           #Action to fill a textarea with text
+           def self.fill_textarea(textarea_name, text)
+             @@current_form = "//input[@name='#{textarea_name}']/ancestor::form"
+             @@agent.text_field(:name, textarea_name).set(text)
+           end
+
+           ##
+           #Action for selecting an option from a dropdown box
+           def self.select_option(selectlist_name, option)
+             if selectlist_name.is_a? Hash
+               select_args = selectlist_name
+               unless select_args.size == 1
+                 raise "select_option only supports using a name or a hash with a single pair, e.g.: {:id => \"foo1\"}"
+               end
+               @@current_form = "//select[@#{select_args.keys.first}='#{select_args.values.first}']/ancestor::form"
+             else
+               @@current_form = "//select[@name='#{selectlist_name}']/ancestor::form"
+             end
+             list = @@agent.select_list(:name, selectlist_name)
+             #STDOUT.puts "list = #{list.inspect}"
+             begin
+               error = list.select(option)
+             rescue
+               list.select_value(option) || error
+             end
+           end
+
+           def self.check_checkbox(checkbox_name)
+             @@current_form = "//input[@name='#{checkbox_name}']/ancestor::form"
+             @@agent.checkbox(:name, checkbox_name).set(true)
+           end
+
+           def self.check_radiobutton(checkbox_name, index=0)
+             @@current_form = "//input[@name='#{checkbox_name}']/ancestor::form"
+             @@agent.elements_by_xpath("//input[@name='#{checkbox_name}']")[index].set
+           end
+
+           def self.click_image_map(index=0)
+             raise 'NotImplemented'
+           end
+
+           def self.wait(time=1)
+             sleep(time)
+             @@agent.wait
+           end
+
+           def self.close_firefox
+             @@agent.close
+           end
+         end
+       end
+     end
+   end
+ end
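The class methods above (fetch, fill_textfield, submit, click_link and friends) are the FireWatir-backed implementations of the navigation steps used in extractors such as the README example. A rough sketch of how they are reached from the DSL - the :agent option value and the site/field names are assumptions for illustration, not taken from this diff:

    require 'rubygems'
    require 'scrubyt'

    data = Scrubyt::Extractor.define :agent => :firefox do
      fetch          'http://www.example.com/search'  # Firewatir.fetch drives Firefox to the page
      fill_textfield 'q', 'ipod'                      # Firewatir.fill_textfield, also remembers the form
      submit                                          # Firewatir.submit uses the remembered form XPath
      record do
        item_name 'Example item title'                # pattern example, as in the README
      end
    end

    puts data.to_xml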