rubyretriever 1.2.2 → 1.2.3

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: cc417c965402019ab69f33f3a1dfea8b061f6f8c
- data.tar.gz: d70ddf42a2d7a845239119ba7563f51bd11d1fb1
+ metadata.gz: 3bb32aa2e9c8317d2f3cb13572e2cdecb1da24a9
+ data.tar.gz: 732e5610104345efed80651929cb9a050e01d9be
  SHA512:
- metadata.gz: 7e02bd67a9b355c7e23423bfd4e8a4b045d6f30be94ffa8b3fedec9d9a58385d29c134d7c429221b157fed75a59351d9584b066ff2bf27d6b940ac4195e4fda9
- data.tar.gz: 21a3318dfe7eb85bddc6a07dfe9dfccbcd7903170427aea87d346dce36b5bda7c9cc54b64951d5835b5efd8dd3ad5ce44529a8881fbba93d9a78ff2de2289eaa
+ metadata.gz: 3d4e109785452db3906dc7b66158846cda24e4c3e1b942f600918338e141d6a337f1f9b3087b94b2561c64095fcdc2f2fb439d29b73574a2ddae501a8f0d965b
+ data.tar.gz: 2e0befea22dfc2bc689d15ad3c33efaf015f7b1ee5c53a322cccca7f6394a4def445e362585600e1934f770377ce193c5a762caa3639ccda480f7c481ce64d64
@@ -91,7 +91,7 @@ module Retriever
  @sitemap = options['sitemap']
  @seo = options['seo']
  @autodown = options['autodown']
- @file_re = Regexp.new(".#{@fileharvest}\z").freeze if @fileharvest
+ @file_re = Regexp.new(/.#{@fileharvest}\z/).freeze if @fileharvest
  end
  
  def setup_bloom_filter
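
The `@file_re` change in the hunk above matters because of how Ruby handles escapes. In a double-quoted string, `\z` is not a recognized escape, so the backslash is dropped and the pattern ends in a literal `z`. In a regex literal, `\z` is kept as the end-of-string anchor. The following is a minimal sketch of that difference; `ext` stands in for `@fileharvest`, and the URLs are made-up examples, not taken from the gem.

```ruby
# Hypothetical stand-in for @fileharvest (assumption, not from the gem).
ext = 'pdf'

# 1.2.2 form: "\z" inside a double-quoted string collapses to a plain "z".
old_re = Regexp.new(".#{ext}\z")

# 1.2.3 form: inside a regex literal, \z stays an end-of-string anchor.
new_re = Regexp.new(/.#{ext}\z/)

puts old_re.source # .pdfz
puts new_re.source # .pdf\z

# Made-up URLs for illustration only.
pdf_url  = 'http://example.com/report.pdf'
html_url = 'http://example.com/report.pdf.html'

puts old_re.match?(pdf_url)  # false: the old pattern wants a trailing "z"
puts new_re.match?(pdf_url)  # true
puts new_re.match?(html_url) # false: "pdf" is not at the end of the string
```

Note that the leading `.` is an unescaped wildcard in both versions, so it matches any single character before the extension, not only a literal dot.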
@@ -6,7 +6,7 @@ module Retriever
  def initialize(url, options)
  super
  temp_file_collection = @page_one.parse_files(@page_one.parse_internal)
- @data.concat(tempFileCollection) if temp_file_collection.size > 0
+ @data.concat(temp_file_collection) if temp_file_collection.size > 0
  lg("#{@data.size} new files found")
  
  async_crawl_and_collect
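
The `tempFileCollection` rename in the hunk above fixes a reference to a name that is never defined. Because Ruby evaluates the condition of a modifier `if` first, the misspelling would only raise when the first page actually yielded files. The sketch below reproduces that behavior in isolation; `data` stands in for `@data`, and the URL is a made-up example.

```ruby
# Stand-in for @data (assumption for illustration).
data = []

# Empty collection: the condition is false, so the misspelled name is never evaluated.
temp_file_collection = []
data.concat(tempFileCollection) if temp_file_collection.size > 0

# Non-empty collection: the condition is true and the undefined name raises.
temp_file_collection = ['http://example.com/a.pdf']
error = begin
  data.concat(tempFileCollection) if temp_file_collection.size > 0
  nil
rescue NameError => e
  e
end
puts error.class # NameError

# 1.2.3 spelling: the collected files are appended as intended.
data.concat(temp_file_collection) if temp_file_collection.size > 0
puts data.size # 1
```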
@@ -1,4 +1,4 @@
  #
  module Retriever
- VERSION = '1.2.2'
+ VERSION = '1.2.3'
  end
data/readme.md CHANGED
@@ -6,7 +6,7 @@ By Joe Norton
  
  RubyRetriever is a Web Crawler, Site Mapper, File Harvester & Autodownloader.
  
- RubyRetriever (RR) uses asynchronous HTTP requests, thanks to [Eventmachine](https://github.com/eventmachine/eventmachine) & [Synchrony](https://github.com/igrigorik/em-synchrony), to crawl webpages *very quickly*. Another neat thing about RR, is it uses a ruby implementation of the [bloomfilter](https://github.com/igrigorik/bloomfilter-rb) in order to keep track of page's it has already crawled.
+ RubyRetriever (RR) uses asynchronous HTTP requests, thanks to [Eventmachine](https://github.com/eventmachine/eventmachine) & [Synchrony](https://github.com/igrigorik/em-synchrony), to crawl webpages *very quickly*. Another neat thing about RR, is it uses a ruby implementation of the [bloomfilter](https://github.com/igrigorik/bloomfilter-rb) in order to keep track of pages it has already crawled.
  
  **v1.0 Update (6/07/2014)** - Includes major code changes, a lot of bug fixes. Much better in dealing with redirects, and issues with the host changing, etc. Also, added the SEO mode -- which grabs a number of key SEO components from every page on a site. Lastly, this update was so extensive that I could not ensure backward compatibility -- and thus, this was update 1.0!
  mission
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: rubyretriever
  version: !ruby/object:Gem::Version
- version: 1.2.2
+ version: 1.2.3
  platform: ruby
  authors:
  - Joe Norton