RubyGems - google_safe_browsing - Versions diffs - 0.1.0 - Mend

google_safe_browsing 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (23)

data/MIT-LICENSE +20 -0
data/README.mkd +101 -0
data/Rakefile +26 -0
data/lib/generators/install_generator.rb +25 -0
data/lib/generators/templates/create_google_safe_browsing_tables.rb +37 -0
data/lib/google_safe_browsing.rb +58 -0
data/lib/google_safe_browsing/add_shavar.rb +5 -0
data/lib/google_safe_browsing/api_v2.rb +60 -0
data/lib/google_safe_browsing/binary_helper.rb +40 -0
data/lib/google_safe_browsing/canonicalize.rb +181 -0
data/lib/google_safe_browsing/chunk_helper.rb +77 -0
data/lib/google_safe_browsing/effective_tld_names.dat.txt +5197 -0
data/lib/google_safe_browsing/full_hash.rb +5 -0
data/lib/google_safe_browsing/google_safe_browsing_railtie.rb +19 -0
data/lib/google_safe_browsing/hash_helper.rb +29 -0
data/lib/google_safe_browsing/http_helper.rb +42 -0
data/lib/google_safe_browsing/rescheduler.rb +14 -0
data/lib/google_safe_browsing/response_helper.rb +181 -0
data/lib/google_safe_browsing/sub_shavar.rb +6 -0
data/lib/google_safe_browsing/top_level_domain.rb +50 -0
data/lib/google_safe_browsing/version.rb +3 -0
data/lib/tasks/google_safe_browsing_tasks.rake +11 -0
metadata +173 -0

data/MIT-LICENSE ADDED

@@ -0,0 +1,20 @@
+Copyright 2012 YOURNAME
+Permission is hereby granted, free of charge, to any person obtaining
+a copy of this software and associated documentation files (the
+"Software"), to deal in the Software without restriction, including
+without limitation the rights to use, copy, modify, merge, publish,
+distribute, sublicense, and/or sell copies of the Software, and to
+permit persons to whom the Software is furnished to do so, subject to
+the following conditions:
+The above copyright notice and this permission notice shall be
+included in all copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
+LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
+OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

data/README.mkd ADDED

@@ -0,0 +1,101 @@
+# Google Safe Browsing Rails 3 Plugin
+This gem allows easy Google Safe Browsing integration
+with Rails 3 apps.
+It includes:
+* a migration generator for database schema
+* method to update your lists
+* method to lookup a url
+* rake tasks to update hash database
+* Autonomous updating via Resque and Resque Scheduler
+----------------------
+##Installation
+Install the gem
+    gem install google_safe_browsing
+Or add it to your Gemfile
+    #Gemfile
+    ...
+    gem 'google_safe_browsing'
+Then, generate the migration and run it
+    $ rails generate google_safe_browsing:install
+        create db/migrate/20120227143535_create_google_safe_browsing_tables.rb
+    $ rake db:migrate
+Add your Google Safe Browsing API key to config/application.rb
+You can get a key from the [Google Safe Browsing website](http://code.google.com/apis/safebrowsing/key_signup.html)
+    #config/application.rb
+    ...
+    config.google_safe_browsing.api_key = 'MySuperAwesomeKey5124'
+## Rake Tasks
+You can run an update manually
+    $ rake google_safe_browsing:update
+Or, if you have [Resque](https://github.com/defunkt/resque) and
+[Resque Scheduler](https://github.com/bvandenbos/resque-scheduler) set up, you can
+run an update and automatically schedule another update based on the 'next polling
+interval' parameter from the API
+    $ rake google_safe_browsing:update_and_reschedule
+## Usage
+To programmatically run an update in your app
+    GoogleSafeBrowsing::APIv2.update
+Note: This can take a while, especially when first seeding your database. I wouldn't recommend
+calling this in a controller for a normal page request.
+To check a url for badness
+    GoogleSafeBrowsing::APIv2.lookup('http://bad.url.address.here.com.edu/forProfit')
+The url string parameter does not have to be in any specific format or canonicalized; the Google
+Safe Browsing gem will handle all of that for you. Please report any errors from a weirdly formatted
+url, though. I have most likely missed some cases.
+The `lookup` method returns a string ( either 'malware' or 'phishing' ) for the name of the black list
+which the url appears on, or `nil` if the url is not on Google's list.
+----------------
+### More information
+[Google Safe Browsing API Reference](http://code.google.com/apis/safebrowsing/)
+----------------
+### Inspiration
+The interface of this gem is based upon these two gems, which are
+based on Safe Browsing v1 API:
+https://github.com/koke/malware_api
+and
+https://github.com/codelux/malware_api
+------------------
+Thank you for using my gem! Please report any bugs or issues. Contributions are also always welcome!
+-- Chris Marshall
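
The README's usage section says `lookup` returns 'malware', 'phishing', or `nil`. A minimal sketch of how a caller might branch on that value follows. `describe_verdict` is a hypothetical helper, not part of the gem, and the verdict is passed in directly so the snippet runs without the gem or a database.

```ruby
# Hypothetical helper (not part of google_safe_browsing): maps the value the
# README says APIv2.lookup returns to a short human-readable message.
def describe_verdict(verdict)
  case verdict
  when 'malware'  then 'listed on the malware list'
  when 'phishing' then 'listed on the phishing list'
  when nil        then 'not listed'
  else raise ArgumentError, "unexpected verdict: #{verdict.inspect}"
  end
end

# In an app with the gem installed, the verdict would come from
#   GoogleSafeBrowsing::APIv2.lookup(url)
puts describe_verdict('phishing')
puts describe_verdict(nil)
```

Raising on an unexpected value is a design choice of this sketch; a caller could equally treat unknown list names as "not listed".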

data/Rakefile ADDED

@@ -0,0 +1,26 @@
+#!/usr/bin/env rake
+begin
+  require 'bundler/setup'
+rescue LoadError
+  puts 'You must `gem install bundler` and `bundle install` to run rake tasks'
+end
+begin
+  require 'rdoc/task'
+rescue LoadError
+  require 'rdoc/rdoc'
+  require 'rake/rdoctask'
+  RDoc::Task = Rake::RDocTask
+end
+RDoc::Task.new(:rdoc) do |rdoc|
+  rdoc.rdoc_dir = 'rdoc'
+  rdoc.title    = 'GoogleSafeBrowsing'
+  rdoc.options << '--line-numbers'
+  rdoc.rdoc_files.include('README.mkd')
+  rdoc.rdoc_files.include('lib/**/*.rb')
+end
+Bundler::GemHelper.install_tasks

data/lib/generators/install_generator.rb ADDED

@@ -0,0 +1,25 @@
+require 'rails/generators'
+require 'rails/generators/migration'
+module GoogleSafeBrowsing
+  class InstallGenerator < Rails::Generators::Base
+    include Rails::Generators::Migration
+    desc "Creates the migration for the Shavar hash and full hash tables."
+    def self.source_root
+      @source_root ||= File.join(File.dirname(__FILE__), 'templates')
+    end
+    def self.next_migration_number(path)
+      if ActiveRecord::Base.timestamped_migrations
+        Time.now.utc.strftime("%Y%m%d%H%M%S")
+      else
+        "%.3d" % (current_migration_number(path) + 1)
+      end
+    end
+    def create_migration_files
+      migration_template 'create_google_safe_browsing_tables.rb', "db/migrate/create_google_safe_browsing_tables"
+    end
+  end
+end
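
The two branches of `next_migration_number` produce differently shaped migration prefixes. The sketch below reproduces both formats in plain Ruby, without Rails. The method names are hypothetical, and the time and counter are passed in explicitly so the result is deterministic.

```ruby
# Timestamped style: the UTC time rendered as YYYYMMDDHHMMSS.
# getutc returns a new Time, leaving the caller's object untouched.
def timestamp_migration_number(time)
  time.getutc.strftime('%Y%m%d%H%M%S')
end

# Sequential style: the next integer, zero-padded to at least three digits.
def sequential_migration_number(current)
  '%.3d' % (current + 1)
end

puts timestamp_migration_number(Time.utc(2012, 2, 27, 14, 35, 35))
puts sequential_migration_number(7)
```

The first call yields the same prefix as the migration filename shown in the README's installation example.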

data/lib/generators/templates/create_google_safe_browsing_tables.rb ADDED

@@ -0,0 +1,37 @@
+class CreateGoogleSafeBrowsingTables < ActiveRecord::Migration
+  def self.up
+    create_table :gsb_full_hashes do |t|
+      t.string  :full_hash
+      t.integer :add_chunk_number
+      t.string  :list
+    end
+    add_index :gsb_full_hashes, :full_hash
+    create_table :gsb_add_shavars do |t|
+      t.string :prefix
+      t.string :host_key
+      t.integer :chunk_number, :null => false
+      t.string :list, :null => false
+    end
+    add_index :gsb_add_shavars, :host_key
+    add_index :gsb_add_shavars, [:host_key, :prefix ]
+    create_table :gsb_sub_shavars do |t|
+      t.string :prefix
+      t.string :host_key
+      t.integer :add_chunk_number
+      t.integer :chunk_number, :null => false
+      t.string :list, :null => false
+    end
+    add_index :gsb_sub_shavars, :host_key
+    add_index :gsb_sub_shavars, [:host_key, :prefix ]
+  end
+  def self.down
+    drop_table :gsb_add_shavars
+    drop_table :gsb_sub_shavars
+    drop_table :gsb_full_hashes
+  end
+end

data/lib/google_safe_browsing.rb ADDED

@@ -0,0 +1,58 @@
+require 'net/http'
+require 'open-uri'
+require 'active_record'
+require 'google_safe_browsing/google_safe_browsing_railtie' if defined?(Rails)
+require File.dirname(__FILE__) + '/google_safe_browsing/api_v2'
+require File.dirname(__FILE__) + '/google_safe_browsing/binary_helper'
+require File.dirname(__FILE__) + '/google_safe_browsing/canonicalize'
+require File.dirname(__FILE__) + '/google_safe_browsing/chunk_helper'
+require File.dirname(__FILE__) + '/google_safe_browsing/hash_helper'
+require File.dirname(__FILE__) + '/google_safe_browsing/http_helper'
+require File.dirname(__FILE__) + '/google_safe_browsing/response_helper'
+require File.dirname(__FILE__) + '/google_safe_browsing/top_level_domain'
+require File.dirname(__FILE__) + '/google_safe_browsing/add_shavar'
+require File.dirname(__FILE__) + '/google_safe_browsing/sub_shavar'
+require File.dirname(__FILE__) + '/google_safe_browsing/full_hash'
+require File.dirname(__FILE__) + '/google_safe_browsing/rescheduler'
+module GoogleSafeBrowsing
+  class Config
+    attr_accessor :client, :app_ver, :p_ver, :host, :current_lists, :api_key
+    def initialize
+      @client         = 'api'
+      @app_ver        = VERSION
+      @p_ver          = '2.2'
+      @host           = 'http://safebrowsing.clients.google.com/safebrowsing'
+      @current_lists  = [ 'googpub-phish-shavar', 'goog-malware-shavar' ]
+    end
+  end
+  def self.config
+    @@config ||= Config.new
+  end
+  def self.configure
+    yield self.config
+  end
+  def self.kick_off
+    Resque.enqueue(Rescheduler)
+  end
+  def self.friendly_list_name(list)
+    case list
+    when 'goog-malware-shavar'
+      'malware'
+    when 'googpub-phish-shavar'
+      'phishing'
+    else
+      nil
+    end
+  end
+end
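
The `config` / `configure` pair above is the common "yield a memoized settings object" pattern. The following sketch shows it in isolation. `SafeBrowsingSketch` is a stand-in module invented for illustration, it carries only a subset of the attributes, and it uses a module-level instance variable where the gem uses `@@config`.

```ruby
# Stand-in module for illustration only; not the gem's own code.
module SafeBrowsingSketch
  class Config
    attr_accessor :client, :p_ver, :api_key

    def initialize
      @client = 'api'   # defaults mirror the values in the hunk above
      @p_ver  = '2.2'
    end
  end

  # Memoized: every call returns the same Config instance.
  def self.config
    @config ||= Config.new
  end

  # Yields the shared Config so callers can set attributes in a block.
  def self.configure
    yield config
  end
end

SafeBrowsingSketch.configure do |c|
  c.api_key = 'MySuperAwesomeKey5124' # placeholder key from the README
end

puts SafeBrowsingSketch.config.api_key
puts SafeBrowsingSketch.config.p_ver
```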

data/lib/google_safe_browsing/add_shavar.rb ADDED

@@ -0,0 +1,5 @@
+module GoogleSafeBrowsing
+  class AddShavar < ActiveRecord::Base
+    set_table_name 'gsb_add_shavars'
+  end
+end

data/lib/google_safe_browsing/api_v2.rb ADDED

@@ -0,0 +1,60 @@
+module GoogleSafeBrowsing
+  class APIv2
+    def self.update
+      data_response = HttpHelper.get_data
+      to_do_array = ResponseHelper.parse_data_response(data_response.body)
+      to_do_array[:lists].each do |list|
+        to_do_array[:data_urls][list].each do |url|
+          puts "#{list} - #{url}\n"
+          ResponseHelper.receive_data('http://' + url, list)
+        end
+      end
+      to_do_array[:delay_seconds]
+    end
+    def self.lookup(url)
+      urls = Canonicalize.urls_for_lookup(url)
+      hashes = HashHelper.urls_to_hashes(urls)
+      raw_hash_array = hashes.collect{ |h| h.to_s }
+      if full = FullHash.where(:full_hash => raw_hash_array).first
+        return GoogleSafeBrowsing.friendly_list_name(full.list)
+      end
+      hits =  AddShavar.where(:prefix => hashes.map{|h| h.prefix}).collect{ |s| [ s.list, s.prefix ] }
+      safes = SubShavar.where(:prefix => hashes.map{|h| h.prefix}).collect{ |s| [ s.list, s.prefix ] }
+      reals = hits - safes
+      if reals.any?
+        full_hashes = HttpHelper.request_full_hashes(reals.collect{|r| r[1] })
+        # save hashes first
+        # cannot return early because all FullHashes need to be saved
+        hit_list = nil
+        full_hashes.each do |hash|
+          FullHash.create!(:list => hash[:list], :add_chunk_number => hash[:add_chunk_num],
+                                       :full_hash => hash[:full_hash])
+          hit_list = hash[:list] if raw_hash_array.include?(hash[:full_hash])
+        end
+        return GoogleSafeBrowsing.friendly_list_name(hit_list)
+      end
+      nil
+    end
+    def self.delay(delay_seconds)
+      puts "Google told us to wait for #{delay_seconds} seconds"
+      puts "We will wait...."
+      start_time = Time.now
+      while(start_time + delay_seconds > Time.now)
+          puts "#{(delay_seconds - (Time.now - start_time)).to_i}..."
+          sleep(10)
+      end
+      puts "Thank you for being patient"
+    end
+  end
+end
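
The `reals = hits - safes` step in `lookup` relies on Ruby's Array difference over `[list, prefix]` pairs. A small sketch with made-up prefixes shows what survives the subtraction. `unresolved_prefixes` is a hypothetical name, and no database is involved.

```ruby
# Returns the prefixes of add entries that are not cancelled by a sub entry.
# Each entry is a [list_name, prefix] pair; a sub entry cancels an add entry
# only when both the list and the prefix match.
def unresolved_prefixes(hits, safes)
  (hits - safes).collect { |pair| pair[1] }
end

hits = [
  ['goog-malware-shavar',  'aabbccdd'],
  ['googpub-phish-shavar', '11223344']
]
safes = [
  ['googpub-phish-shavar', '11223344'], # cancels the phishing hit
  ['googpub-phish-shavar', 'aabbccdd']  # same prefix, different list: no effect
]

p unresolved_prefixes(hits, safes)
```

Because the comparison is on whole pairs, a prefix that was subtracted from one list still counts as a hit on another.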

data/lib/google_safe_browsing/binary_helper.rb ADDED

@@ -0,0 +1,40 @@
+module GoogleSafeBrowsing
+  class BinaryHelper
+    def self.read_bytes_as_hex(iter, count)
+      read_bytes_from(iter, count).unpack("H#{count * 2}")[0]
+    end
+    def self.four_as_hex(string)
+      string.unpack('H8')[0]
+    end
+    def self.read_bytes_from(iter, count)
+      ret = ''
+      count.to_i.times { ret << iter.next }
+      ret
+   #rescue
+   #  puts "Tried to read past chunk iterator++++++++++++++++++++++++++++++++++++++++++++++++++++++++"
+   #  return nil
+    end
+    def self.unpack_host_key(bin)
+      bin.unpack('H8')[0]
+    end
+    def self.unpack_count(bin)
+      # COUNT is a single unsigned byte, so read it with 'C' rather than 'U' (UTF-8 codepoint)
+      bin.unpack('C')[0]
+    end
+    def self.unpack_add_chunk_num(bin)
+      bin.unpack('N')[0]
+    end
+    def self.hex_to_bin(hex)
+      [hex].pack('H*')
+    end
+  end
+end
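
The helpers above are thin wrappers around `String#unpack` and `Array#pack`. The sketch below shows what each directive yields on hand-written bytes. The method names are hypothetical, and using 'C' for a one-byte count is an assumption of this sketch about the intended read.

```ruby
# First four bytes as eight lowercase hex digits ('H8' = 8 nibbles, high first).
def first_four_as_hex(bin)
  bin.unpack('H8')[0]
end

# One unsigned 8-bit integer ('C'); assumed here for a single-byte count.
def one_byte_count(bin)
  bin.unpack('C')[0]
end

# One unsigned 32-bit integer in network (big-endian) byte order ('N').
def big_endian_uint32(bin)
  bin.unpack('N')[0]
end

# Hex string back to raw bytes; pack is an Array method, so wrap the string.
def hex_to_bytes(hex)
  [hex].pack('H*')
end

puts first_four_as_hex("\xDE\xAD\xBE\xEF".b)
puts one_byte_count("\xC8".b)
puts big_endian_uint32("\x00\x00\x04\xD2".b)
puts hex_to_bytes('deadbeef').bytesize
```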

data/lib/google_safe_browsing/canonicalize.rb ADDED

@@ -0,0 +1,181 @@
+require 'uri'
+require 'ip'
+require File.dirname(__FILE__) + '/top_level_domain.rb'
+module GoogleSafeBrowsing
+  class Canonicalize
+    PROTOCOL_DELIMITER = '://'
+    DEFAULT_PROTOCOL = 'http'
+    def self.url(raw_url)
+      #puts raw_url
+      #remove tabs, carriage returns and line feeds
+      raw_url.gsub!("\t",'')
+      raw_url.gsub!("\r",'')
+      raw_url.gsub!("\n",'')
+      cann = raw_url.clone
+      cann.gsub!(/\A\s+|\s+\Z/, '')
+      cann = remove_fragment(cann)
+      # repeatedly unescape until no more escaping
+      cann = recursively_unescape(cann)
+      # remove leading PROTOCOL
+      cann = remove_protocol(cann)
+      #split into host and path components
+      splits = split_host_path(cann)
+      cann = fix_host( splits[:host] ) + '/' + fix_path( splits[:path] )
+      # add leading protocol
+      @protocol ||= DEFAULT_PROTOCOL
+      cann = @protocol + PROTOCOL_DELIMITER + cann
+      strict_escape(cann)
+    end
+    def self.urls_for_lookup(lookup_url)
+      lookup_url = url(lookup_url)
+      lookup_url = remove_protocol(lookup_url)
+      splits = split_host_path(lookup_url)
+      host_strings = [splits[:host]]
+      host = TopLevelDomain.split_from_host(splits[:host]).last(5)
+      ( host.length - 1 ).times do
+        host_strings << host.join('.')
+        host.shift
+      end
+      host_strings.uniq!
+      path_split = splits[:path].split('?')
+      path = path_split[0]
+      params = path_split[1]
+      path_strings = [ splits[:path], '/' ]
+      if path
+        path_strings << path
+        paths_to_append = path.split('/').first(3)
+        paths_to_append.length.times do
+          path_strings << paths_to_append.join('/')
+          paths_to_append.pop
+        end
+      end
+      path_strings.map!{ |p| '/' + p + '/' }
+      path_strings.map!{ |p| p.gsub!(/\/+/, '/') }
+      path_strings.compact!
+      path_strings.uniq!
+      #puts host_strings.length
+      #puts path_strings.length
+      ( cart_prod(host_strings, path_strings) + host_strings ).uniq
+    end
+    private
+      def self.cart_prod(a_one, a_two)
+        result = []
+        a_one.each do |i|
+          a_two.each do |j|
+            result << "#{i}#{j}"
+          end
+        end
+        result
+      end
+      def self.split_host_path(cann)
+        ret= { :host => cann, :path => '' }
+        split_point = cann.index('/')
+        if split_point
+          ret[:host] = cann[0..split_point-1]
+          ret[:path] = cann[split_point+1..-1]
+        end
+        ret
+      end
+      def self.remove_fragment(string)
+        string = string[0..string.index('#')-1] if string.index('#')
+        string
+      end
+      def self.recursively_unescape(url)
+        compare_url = url.clone
+        url = URI.unescape(url)
+        while(compare_url != url)
+          compare_url = url.clone
+          url = URI.unescape(url)
+        end
+        url
+      end
+      def self.fix_host(host)
+        #puts "In Host: #{host}"
+        # remove leading and trailing dots, multiple dots to one
+        host.gsub!(/\A\.+|\.+\Z/, '')
+        host.gsub!(/\.+/, '.')
+        host.downcase!
+        host = IP::V4.new(host.to_i).to_s if host.to_i > 256
+        host
+      end
+      def self.fix_path(path)
+        #puts "In Path: #{path}"
+        #remove leading slash
+        path = path[1..-1] if path[0..0] == '/'
+        preserve_trailing_slash = ( path[-1..-1] == '/' )
+        if path.index('?')
+          first_ques = path.index('?')
+          params = path[first_ques..-1]
+          path = path[0..first_ques-1]
+        end
+        # remove multiple '/'
+        path.gsub!(/\/+/, '/')
+        new_path_array = []
+        path.split('/').each do |p|
+          new_path_array << p unless p == '.' || p == '..'
+          new_path_array.pop if p == '..'
+        end
+        path = new_path_array.join('/')
+        path += '/' if preserve_trailing_slash
+        path += params if params
+        path
+      end
+      def self.strict_escape(url)
+        url = URI.escape url
+        # unescape carat, may need other optionally escapeable chars
+        url.gsub!('%5E','^')
+        url
+      end
+      def self.remove_protocol(cann)
+        if cann.index(PROTOCOL_DELIMITER)
+          delimiting_index = cann.index(PROTOCOL_DELIMITER)
+          @protocol = cann[0..delimiting_index-1]
+          protocol_end_index = delimiting_index + PROTOCOL_DELIMITER.length
+          cann = cann[protocol_end_index..-1]
+        end
+        cann
+      end
+  end
+end
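
`recursively_unescape` above depends on `URI.unescape`, which was removed in Ruby 3.0. The sketch below shows the same repeat-until-stable idea with a hand-rolled percent decoder instead. Both method names are hypothetical, and working on raw bytes is a choice made for this sketch, not something taken from the gem.

```ruby
# Decodes every %XX escape exactly once. The input is treated as raw bytes so
# that decoded values above 0x7F cannot cause encoding conflicts.
def percent_decode_once(str)
  str.b.gsub(/%(\h\h)/) { Regexp.last_match(1).hex.chr }
end

# Repeats the single pass until the string stops changing. Each pass that
# changes anything shortens the string, so the loop always terminates.
def fully_unescape(str)
  current = str.b
  loop do
    decoded = percent_decode_once(current)
    return decoded if decoded == current
    current = decoded
  end
end

puts fully_unescape('http://host/%25%32%35')
puts fully_unescape('http://host/%2541')
```

The first input decodes in two passes: `%25%32%35` becomes `%25`, which becomes a lone `%` that no longer matches the escape pattern.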