RubyGems - rubyretriever - Versions diffs - 0.0.11 → 0.0.12 - Mend

rubyretriever 0.0.11 → 0.0.12

Files changed (5)

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz: ffb93b0faa77d73f014f67be6dbb6320233a5497
-  data.tar.gz: 920547b074b92a01b164e2f27130010773a55e0b
+  metadata.gz: efc429906131b363741d6560e37cb095f905b48e
+  data.tar.gz: 85f320d55600f007315941b6c3213c8f04b70515
 SHA512:
-  metadata.gz: b3c36ff313a381ec3d1950abf1c148faed90aa99a0658741ab4533f15d6b6afd2e6dc95caa0be5afd231125099126c24aeee36b3a873c33d8a81c6f42dace510
-  data.tar.gz: 31bb5aa05f6354f083fae15b3351059d28073c449125081a2c043b7087340d48294541a11951cb2f13f67ec9b8944a029071bcf24e1421b358f64ad458a31d85
+  metadata.gz: 1cdeb51c607ee23b662128ae7b1071085314c9c04626fdfaf708ef9be7224e1bd83652e9bffb64175da480f7830af223a6e8a2a846cb429af3a4c58a71472941
+  data.tar.gz: 437ee738e18d69600897512e0dd047da23166b2c59ad5f70ae8336532ecfa73e85399e9092b5d7b11895ddc23cffd46dd9465c2c6859bbf0890c80a32e15218b
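
The checksums above cover the `metadata.gz` and `data.tar.gz` members of the `.gem` archive. As a minimal sketch of how such values could be checked locally, the snippet below compares a file's SHA1 and SHA512 digests against expected hex strings. The helper name `checksums_match?` and the local file path are assumptions for illustration, not part of rubyretriever or RubyGems.

```ruby
require 'digest/sha1'
require 'digest/sha2'

# Hypothetical helper (not part of rubyretriever or RubyGems):
# returns true only when both digests of the file match the expected values.
def checksums_match?(path, sha1:, sha512:)
  Digest::SHA1.file(path).hexdigest == sha1 &&
    Digest::SHA512.file(path).hexdigest == sha512
end

# Example call, assuming data.tar.gz from the 0.0.12 gem has been
# extracted into the current directory (the path is an assumption).
if File.exist?('data.tar.gz')
  ok = checksums_match?(
    'data.tar.gz',
    sha1: '85f320d55600f007315941b6c3213c8f04b70515',
    sha512: '437ee738e18d69600897512e0dd047da23166b2c59ad5f70ae8336532ecfa73e' \
            '85399e9092b5d7b11895ddc23cffd46dd9465c2c6859bbf0890c80a32e15218b'
  )
  puts(ok ? 'checksums match' : 'checksum mismatch')
end
```

Requiring both digests to match mirrors the fact that `checksums.yaml` records both algorithms for each member.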

data/lib/retriever.rb CHANGED

@@ -10,7 +10,6 @@ require 'em-synchrony/fiber_iterator'
 require 'ruby-progressbar'
 require 'open-uri'
 require 'optparse'
-require 'uri'
 require 'csv'
 require 'bloomfilter-rb'

data/lib/retriever/version.rb CHANGED

@@ -1,3 +1,3 @@
 module Retriever
-  VERSION = '0.0.11'
+  VERSION = '0.0.12'
 end
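
As a small illustration of how the bumped version string orders against the previous one, the sketch below uses `Gem::Version` from RubyGems. The variable names are illustrative assumptions; only the two version strings come from the diff.

```ruby
require 'rubygems'

# The two version strings taken from the diff above.
old_version = Gem::Version.new('0.0.11')
new_version = Gem::Version.new('0.0.12')

# Gem::Version compares numeric segments, not raw strings,
# so 0.0.12 sorts after 0.0.11 (and after 0.0.9 as well).
puts new_version > old_version
puts new_version > Gem::Version.new('0.0.9')
```

Comparing through `Gem::Version` rather than plain strings avoids the lexical trap where `'0.0.9'` would sort after `'0.0.12'`.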

data/readme.md CHANGED

@@ -1,37 +1,23 @@
-RubyRetriever  [![Gem Version](https://badge.fury.io/rb/rubyretriever.svg)](http://badge.fury.io/rb/rubyretriever)
+[RubyRetriever] (http://www.softwarebyjoe.com/rubyretriever/)  [![Gem Version](https://badge.fury.io/rb/rubyretriever.svg)](http://badge.fury.io/rb/rubyretriever)
 ==============
-Now an official RubyGem!
-```sh
-gem install rubyretriever
-```
-Update (5/26):
-Version 0.0.10 - fixes a bug that wouldn't allow sitemaps to write out to file correctly.
-Update (5/25):
- Version 0.0.6 - Switches to using a Bloom Filter to keep track of past 'visited pages'. I saw this in [Arachnid] (https://github.com/dchuk/Arachnid) and realized it's a much better idea for performance and implemented it immediately. Hat tip [dchuk] (https://github.com/dchuk/)
-About
-=====
+By Joe Norton
 RubyRetriever is a Web Crawler, Site Mapper, File Harvester & Autodownloader, and all around nice buddy to have around.
-Soon to add some high level scraping options.
 RubyRetriever uses aynchronous HTTP requests, thanks to eventmachine and Synchrony fibers, to crawl webpages *very quickly*.
-This is the 2nd or 3rd reincarnation of the RubyRetriever autodownloader project. It started out as a executable autodownloader, intended for malware research. From there it has morphed to become a more well-rounded web-crawler and general purpose file harvesting utility.
-RubyRetriever does NOT respect robots.txt, and RubyRetriever currently - by default - launches up to 10 parallel GET requests at once. This is a feature, do not abuse it. Use at own risk.
+RubyRetriever does NOT respect robots.txt, and RubyRetriever currently - by default - launches up to 10 parallel GET requests at once. This is a feature, do not abuse it. Use at own risk.
-HOW IT WORKS
+getting started
 -----------
+Install the gem
 ```sh
-gem install rubyretriever
-rr [MODE] [OPTIONS] Target_URL
+gem install rubyretriever
 ```
- **Site Mapper**
+ **Example: Sitemap mode**
 ```sh
 rr --sitemap --progress --limit 1000 --output cnet http://www.cnet.com
 ```
@@ -42,7 +28,7 @@ rr -s -p -l 1000 -o cnet http://www.cnet.com
 This would go to http://www.cnet.com and map it until it crawled a max of 1,000 pages, and then it would write it out to a csv named cnet.
- **File Harvesting**
+ **Example: File Harvesting mode**
 ```sh
 rr --files --ext pdf --progress --limit 1000 --output hubspot http://www.hubspot.com
 ```

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: rubyretriever
 version: !ruby/object:Gem::Version
-  version: 0.0.11
+  version: 0.0.12
 platform: ruby
 authors:
 - Joe Norton
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2014-05-25 00:00:00.000000000 Z
+date: 2014-05-26 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: em-synchrony
@@ -126,7 +126,7 @@ files:
 - readme.md
 - spec/retriever_spec.rb
 - spec/spec_helper.rb
-homepage: http://github.com/joenorton/rubyretriever
+homepage: http://www.softwarebyjoe.com/rubyretriever/
 licenses:
 - MIT
 metadata: {}