recipe_crawler 3.1.2 → 4.0.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +5 -5
- data/README.md +6 -7
- data/bin/recipe_crawler +0 -0
- data/lib/recipe_crawler.rb +2 -2
- data/lib/recipe_crawler/crawler.rb +153 -168
- data/lib/recipe_crawler/version.rb +1 -1
- data/recipe_crawler.gemspec +15 -17
- metadata +26 -12
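For users upgrading across this major version bump, the most visible break is in `RecipeCrawler::Crawler#crawl!`, which moves from a positional limit to keyword arguments (see the README and crawler.rb diffs below). A minimal before/after sketch, assuming the gem is installed:

    require 'recipe_crawler'

    r = RecipeCrawler::Crawler.new 'http://www.cuisineaz.com/recettes/pate-a-pizza-legere-55004.aspx'

    # 3.1.2: positional argument
    #   r.crawl!(10) { |recipe| puts recipe.to_hash }

    # 4.0.0: keyword arguments; interval_sleep_time throttles requests
    r.crawl!(limit: 10, interval_sleep_time: 1) { |recipe| puts recipe.to_hash }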
checksums.yaml
CHANGED

@@ -1,7 +1,7 @@
 ---
-SHA256:
-  metadata.gz:
-  data.tar.gz:
+SHA256:
+  metadata.gz: d2185c1d31c0fd91ddf2df44770a587a5fa9b4bb3b12106f0dcb1c04b4ad0f94
+  data.tar.gz: 02b21cabf006eb6f6430d2a91f6ee4879077496abd6b4234638cc5c03dff1448
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: e10ff78ee97a4e8bb830275768477cb77cc1441d0dbbcbce8008e18c79f0db85d6e97923140ee7cfb9483b09efe5b806dc2ed878d193723c0e7636a0bf0b989e
+  data.tar.gz: c947a04b528b40d5ab396bcc16d9e7ae8a5e21bc25d5295b339e97c23cb7132f3b7c543cae2f601bb5f1963503a73ba9661ba9cac31b50a307954075112558b4
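These digests cover the metadata.gz and data.tar.gz members inside the published .gem, which is a plain tar archive, so the 4.0.0 values can be re-checked locally. A sketch using only the Ruby standard library, assuming the package has been downloaded as recipe_crawler-4.0.0.gem in the working directory (the old hashes are not shown in this diff view, so only the new side is verifiable):

    require 'rubygems/package'
    require 'digest'

    # SHA256 values published in checksums.yaml above
    EXPECTED = {
      'metadata.gz' => 'd2185c1d31c0fd91ddf2df44770a587a5fa9b4bb3b12106f0dcb1c04b4ad0f94',
      'data.tar.gz' => '02b21cabf006eb6f6430d2a91f6ee4879077496abd6b4234638cc5c03dff1448'
    }.freeze

    File.open('recipe_crawler-4.0.0.gem', 'rb') do |io|
      Gem::Package::TarReader.new(io).each do |entry|
        next unless EXPECTED.key?(entry.full_name)
        actual = Digest::SHA256.hexdigest(entry.read)
        puts "#{entry.full_name}: #{actual == EXPECTED[entry.full_name] ? 'OK' : 'MISMATCH'}"
      end
    end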
data/README.md
CHANGED

@@ -1,6 +1,6 @@
 # RecipeCrawler
 
-A **web crawler** to save recipes from [marmiton.org](http://www.marmiton.org/), [750g.com](http://www.750g.com) or [cuisineaz.com](http://www.cuisineaz.com) into an **SQlite3** database.
+A **web crawler** to save recipes from [marmiton.org](http://www.marmiton.org/), [750g.com](http://www.750g.com) or [cuisineaz.com](http://www.cuisineaz.com) into an **SQlite3** database.
 
 > For the moment, it works only with [cuisineaz.com](http://www.cuisineaz.com)
 

@@ -29,7 +29,7 @@ Or install it yourself as:
 
 ### Command line
 
-Install this gem and run
+Install this gem and run
 
     $ recipe_crawler -h
     Usage: recipe_crawler [options]

@@ -60,9 +60,9 @@ Then you just need to instanciate a `RecipeCrawler::Crawler` with url of a Cuisi
     url = 'http://www.cuisineaz.com/recettes/pate-a-pizza-legere-55004.aspx'
     r = RecipeCrawler::Crawler.new url
 
-Then you just need to run the crawl with a limit number of recipe to fetch. All recipes will be saved in a *export.sqlite3* file. You can pass a block to play with `
+Then you just need to run the crawl with a limit number of recipe to fetch. All recipes will be saved in a *export.sqlite3* file. You can pass a block to play with `RecipeScraper::Recipe` objects.
 
-    r.crawl!(10) do |recipe|
+    r.crawl!(limit: 10) do |recipe|
       puts recipe.to_hash
       # will return
       # --------------

@@ -91,7 +91,6 @@ Bug reports and pull requests are welcome on GitHub at https://github.com/[USERN
 
 The gem is available as open source under the terms of the [MIT License](http://opensource.org/licenses/MIT).
 
-Author
-----------
+## Author
 
-[Rousseau Alexandre](https://github.com/madeindjs)
+[Rousseau Alexandre](https://github.com/madeindjs)
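Putting the README fragments together, a complete 4.0.0 session looks roughly like the sketch below (one caveat: the README above still says *export.sqlite3*, while the crawler.rb diff further down shows this release writing to *results.sqlite3*):

    require 'recipe_crawler'

    begin
      r = RecipeCrawler::Crawler.new 'http://www.cuisineaz.com/recettes/pate-a-pizza-legere-55004.aspx'
      r.crawl!(limit: 10) do |recipe|
        puts recipe.to_hash
      end
    rescue ArgumentError => e
      # Crawler.new in 4.0.0 raises ArgumentError when the url matches none of ALLOWED_URLS
      warn e.message
    end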
data/bin/recipe_crawler
CHANGED
File without changes
data/lib/recipe_crawler.rb
CHANGED
(2-line change not captured in this view)

data/lib/recipe_crawler/crawler.rb
CHANGED

@@ -3,175 +3,160 @@ require 'nokogiri'
 require 'open-uri'
 require 'sqlite3'
 
-
 module RecipeCrawler
-[old lines 8 to 50: content not captured in this view]
-      ingredients TEXT,
-      steps TEXT,
+  # This is the main class to crawl recipes from a given url
+  # 1. Crawler will crawl url to find others recipes urls on the website
+  # 2. it will crawl urls founded to find other url again & again
+  # 3. it will scrape urls founded to get data
+  #
+  # @attr_reader url [String] first url parsed
+  # @attr_reader host [Symbol] of url's host
+  # @attr_reader scraped_urls [Array<String>] of url's host
+  # @attr_reader crawled_urls [Array<String>] of url's host
+  # @attr_reader to_crawl_urls [Array<String>] of url's host
+  # @attr_reader recipes [Array<RecipeScraper::Recipe>] recipes fetched
+  # @attr_reader db [SQLite3::Database] Sqlite database where recipe will be saved
+  class Crawler
+    # URL than crawler can parse
+    ALLOWED_URLS = {
+      cuisineaz: 'cuisineaz.com/recettes/',
+      marmiton: 'marmiton.org/recettes/',
+      g750: '750g.com/'
+    }.freeze
+
+    attr_reader :url, :host, :scraped_urls, :crawled_urls, :to_crawl_urls, :recipes
+    attr_accessor :interval_sleep_time
+
+    #
+    # Create a Crawler
+    # @param url [String] a url a recipe to scrawl other one
+    def initialize(url)
+      @url = url
+      if url_valid?
+        @recipes = []
+        @crawled_urls = []
+        @scraped_urls = []
+        @to_crawl_urls = []
+        @to_crawl_urls << url
+        @interval_sleep_time = 0
+        @db = SQLite3::Database.new 'results.sqlite3'
+        @db.execute "CREATE TABLE IF NOT EXISTS recipes(
+          Id INTEGER PRIMARY KEY,
+          title TEXT,
+          preptime INTEGER,
+          cooktime INTEGER,
+          ingredients TEXT,
+          steps TEXT,
           image TEXT
         )"
-[old lines 55 to 151: content not captured in this view]
-  #
-  # Save recipe
-  # @param recipe [RecipeSraper::Recipe] as recipe to save
-  #
-  # @return [Boolean] as true if success
-  def save recipe
-    begin
-      @db.execute "INSERT INTO recipes (title, preptime, cooktime, ingredients, steps, image)
+      else
+        raise ArgumentError, 'This url cannot be used'
+      end
+    end
+
+    #
+    # Check if the url can be parsed and set the host
+    #
+    # @return [Boolean] true if url can be parsed
+    def url_valid?
+      ALLOWED_URLS.each do |host, url_allowed|
+        if url.include? url_allowed
+          @host = host
+          return true
+        end
+      end
+      false
+    end
+
+    # Start the crawl
+    #
+    # @param limit [Integer] the maximum number of scraped recipes
+    # @param interval_sleep_time [Integer] waiting time between scraping
+    # @yield [RecipeScraper::Recipe] as recipe scraped
+    def crawl!(limit: 2, interval_sleep_time: 0)
+      recipes_returned = 0
+
+      if @host == :cuisineaz
+
+        while !@to_crawl_urls.empty? && (limit > @recipes.count)
+          # find all link on url given (and urls of theses)
+          url = @to_crawl_urls.first
+          next if url.nil?
+
+          get_links url
+          # now scrape an url
+          recipe = scrape url
+          yield recipe if recipe && block_given?
+          sleep interval_sleep_time
+        end
+
+      else
+        raise NotImplementedError
+      end
+    end
+
+    #
+    # Scrape given url
+    # param url [String] as url to scrape
+    #
+    # @return [RecipeScraper::Recipe] as recipe scraped
+    # @return [nil] if recipe connat be fetched
+    def scrape(url)
+      recipe = RecipeScraper::Recipe.new url
+      @scraped_urls << url
+      @recipes << recipe
+      if save recipe
+        return recipe
+      else
+        raise SQLite3::Exception, 'cannot save recipe'
+      end
+    rescue OpenURI::HTTPError
+      nil
+    end
+
+    #
+    # Get recipes links from the given url
+    # @param url [String] as url to scrape
+    #
+    # @return [void]
+    def get_links(url)
+      # catch 404 error from host
+
+      doc = Nokogiri::HTML(open(url))
+      # find internal links on page
+      doc.css('#tagCloud a').each do |link|
+        link = link.attr('href')
+        # If link correspond to a recipe we add it to recipe to scraw
+        if link.include?(ALLOWED_URLS[@host]) && !@crawled_urls.include?(url)
+          @to_crawl_urls << link
+        end
+      end
+      @to_crawl_urls.delete url
+      @crawled_urls << url
+      @to_crawl_urls.uniq!
+    rescue OpenURI::HTTPError
+      @to_crawl_urls.delete url
+      warn "#{url} cannot be reached"
+    end
+
+    #
+    # Save recipe
+    # @param recipe [RecipeScraper::Recipe] as recipe to save
+    #
+    # @return [Boolean] as true if success
+    def save(recipe)
+      @db.execute "INSERT INTO recipes (title, preptime, cooktime, ingredients, steps, image)
         VALUES (:title, :preptime, :cooktime, :ingredients, :steps, :image)",
-[old lines 161 to 173: content not captured in this view]
-  end
-
-
-end
+        title: recipe.title,
+        preptime: recipe.preptime,
+        ingredients: recipe.ingredients.join("\n"),
+        steps: recipe.steps.join("\n"),
+        image: recipe.image
+
+      true
+    rescue SQLite3::Exception => e
+      puts "Exception occurred #{e}"
+      false
+    end
+  end
+end
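Since everything lands in a local SQLite file, saved rows can be read back with the same sqlite3 gem the crawler already depends on. A small sketch against the schema created in `initialize` above, assuming results.sqlite3 sits in the working directory (note that the INSERT in `save` binds no :cooktime value, so that column stays NULL in this release):

    require 'sqlite3'

    db = SQLite3::Database.new 'results.sqlite3'
    db.results_as_hash = true

    # Columns come from the CREATE TABLE IF NOT EXISTS recipes(...) statement above
    db.execute('SELECT Id, title, preptime, cooktime FROM recipes ORDER BY Id') do |row|
      puts "#{row['Id']}. #{row['title']} (preptime: #{row['preptime']}, cooktime: #{row['cooktime']})"
    end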
data/recipe_crawler.gemspec
CHANGED

@@ -1,29 +1,27 @@
-[old line 1: content not captured in this view]
-lib = File.expand_path('../lib', __FILE__)
+lib = File.expand_path('lib', __dir__)
 $LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
 require 'recipe_crawler/version'
 
 Gem::Specification.new do |spec|
-  spec.name =
+  spec.name = 'recipe_crawler'
   spec.version = RecipeCrawler::VERSION
-  spec.authors = [
-  spec.email = [
-
-  spec.summary = %q{Get all recipes from famous french cooking websites}
-  spec.description = %q{This crawler will use my personnal scraper named 'RecipeScraper' to dowload recipes data from Marmiton, 750g or cuisineaz}
-  spec.homepage = "https://github.com/madeindjs/recipe_crawler."
-  spec.license = "MIT"
+  spec.authors = ['Alexandre Rousseau']
+  spec.email = ['contact@rousseau-alexandre.fr']
 
+  spec.summary = 'Get all recipes from famous french cooking websites'
+  spec.description = "This crawler will use my personnal scraper named 'RecipeScraper' to dowload recipes data from Marmiton, 750g or cuisineaz"
+  spec.homepage = 'https://github.com/madeindjs/recipe_crawler'
+  spec.license = 'MIT'
 
   spec.files = `git ls-files -z`.split("\x0").reject { |f| f.match(%r{^(test|spec|features)/}) }
   spec.executables = ['recipe_crawler']
-  spec.require_paths = [
-
-  spec.add_dependency "recipe_scraper", '>= 2.2.0'
+  spec.require_paths = ['lib']
 
+  spec.add_dependency 'recipe_scraper', '~> 2.0'
+  spec.add_dependency 'sqlite3', '~> 1.3'
 
-  spec.add_development_dependency
-  spec.add_development_dependency
-  spec.add_development_dependency
-  spec.add_development_dependency
+  spec.add_development_dependency 'bundler', '~> 1.17'
+  spec.add_development_dependency 'rake', '~> 10.0'
+  spec.add_development_dependency 'rspec', '~> 3.0'
+  spec.add_development_dependency 'yard'
 end
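The runtime constraint on recipe_scraper changes character here: the old `>= 2.2.0` accepted any future major release, while the pessimistic `~> 2.0` pins resolution to the 2.x series (and also admits 2.0.x versions the old floor excluded). RubyGems itself can demonstrate the operator; a quick sketch:

    require 'rubygems'

    req = Gem::Requirement.new('~> 2.0')

    # '~> 2.0' means '>= 2.0 and < 3.0'
    ['2.0.0', '2.2.0', '2.9.9', '3.0.0'].each do |v|
      puts "#{v}: #{req.satisfied_by?(Gem::Version.new(v))}"
    end
    # => 2.0.0: true, 2.2.0: true, 2.9.9: true, 3.0.0: false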
metadata
CHANGED

@@ -1,43 +1,57 @@
 --- !ruby/object:Gem::Specification
 name: recipe_crawler
 version: !ruby/object:Gem::Version
-  version:
+  version: 4.0.0
 platform: ruby
 authors:
--
+- Alexandre Rousseau
 autorequire:
 bindir: bin
 cert_chain: []
-date:
+date: 2018-12-08 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: recipe_scraper
   requirement: !ruby/object:Gem::Requirement
     requirements:
-    - - "
+    - - "~>"
       - !ruby/object:Gem::Version
-        version: 2.
+        version: '2.0'
   type: :runtime
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
-    - - "
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '2.0'
+- !ruby/object:Gem::Dependency
+  name: sqlite3
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '1.3'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
       - !ruby/object:Gem::Version
-        version:
+        version: '1.3'
 - !ruby/object:Gem::Dependency
   name: bundler
   requirement: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: '1.
+        version: '1.17'
   type: :development
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: '1.
+        version: '1.17'
 - !ruby/object:Gem::Dependency
   name: rake
   requirement: !ruby/object:Gem::Requirement

@@ -83,7 +97,7 @@ dependencies:
 description: This crawler will use my personnal scraper named 'RecipeScraper' to dowload
   recipes data from Marmiton, 750g or cuisineaz
 email:
--
+- contact@rousseau-alexandre.fr
 executables:
 - recipe_crawler
 extensions: []

@@ -104,7 +118,7 @@ files:
 - lib/recipe_crawler/crawler.rb
 - lib/recipe_crawler/version.rb
 - recipe_crawler.gemspec
-homepage: https://github.com/madeindjs/recipe_crawler.
+homepage: https://github.com/madeindjs/recipe_crawler
 licenses:
 - MIT
 metadata: {}

@@ -124,7 +138,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
       version: '0'
 requirements: []
 rubyforge_project:
-rubygems_version: 2.
+rubygems_version: 2.7.8
 signing_key:
 specification_version: 4
 summary: Get all recipes from famous french cooking websites