rangefinder 0.0.1

@@ -0,0 +1,15 @@
1
+ ---
2
+ !binary "U0hBMQ==":
3
+ metadata.gz: !binary |-
4
+ ODRiODljY2Q5MmFhZjBjYzg1ZjI5Y2YxMDgyZmY1YWZiOTk0MTA4YQ==
5
+ data.tar.gz: !binary |-
6
+ YTgwODAzMjA3ODNjNTZmYjZkYWNhMGQ3YjM4MTdjYmMzNzFmN2RmMg==
7
+ SHA512:
8
+ metadata.gz: !binary |-
9
+ NDk5NWUwNWI0NGMyMzRlMTcwYWI5OGQ0N2M5NGNiMzU2MTc5ZDM1NmFkMTU2
10
+ NWRmZTRiYWMwOTVlZmFjZTc2OGQyZDY5ODdjMzk2ODk0Yjg5MjAyYjQ4YWQ2
11
+ MzJiY2ZhOGY4NDkxNGI2NjU1NmIzZTg0YWVkZmJiMWUxYzU5NWQ=
12
+ data.tar.gz: !binary |-
13
+ ZDFmMmE3NWM2YTMzNDNhNjU0ZTdjYWU2YTRmNjZjYmZmMGQ5ZTllYzY2YTdl
14
+ ZWJiNGY3ZjZlOTM1YmU1ZmYzYzUxZTVjYzE2MGE2OWY0OTA0MGM4YjljZTFi
15
+ YWNlZWM2NjA0MzA1MjBmYjlhNTMwNDE1OWMzYjhlYzcyMGJhMTI=
@@ -0,0 +1,17 @@
1
+ *.gem
2
+ *.rbc
3
+ .bundle
4
+ .config
5
+ .yardoc
6
+ Gemfile.lock
7
+ InstalledFiles
8
+ _yardoc
9
+ coverage
10
+ doc/
11
+ lib/bundler/man
12
+ pkg
13
+ rdoc
14
+ spec/reports
15
+ test/tmp
16
+ test/version_tmp
17
+ tmp
data/.rspec ADDED
@@ -0,0 +1,2 @@
1
+ --format documentation
2
+ --color
@@ -0,0 +1,3 @@
1
+ language: ruby
2
+ rvm:
3
+ - 1.9.3
@@ -0,0 +1,3 @@
1
+ 0.0.1 / 2014-01-10
2
+
3
+ Initial release!
data/Gemfile ADDED
@@ -0,0 +1,4 @@
1
+ source 'https://rubygems.org'
2
+
3
+ # Specify your gem's dependencies in rangefinder.gemspec
4
+ gemspec
@@ -0,0 +1,22 @@
1
+ Copyright (c) 2014 Seamus Abshere
2
+
3
+ MIT License
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining
6
+ a copy of this software and associated documentation files (the
7
+ "Software"), to deal in the Software without restriction, including
8
+ without limitation the rights to use, copy, modify, merge, publish,
9
+ distribute, sublicense, and/or sell copies of the Software, and to
10
+ permit persons to whom the Software is furnished to do so, subject to
11
+ the following conditions:
12
+
13
+ The above copyright notice and this permission notice shall be
14
+ included in all copies or substantial portions of the Software.
15
+
16
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
17
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
18
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
19
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
20
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
21
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
22
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
@@ -0,0 +1,71 @@
1
+ # Rangefinder
2
+
3
+ Helps you find ranges of IDs, like when you're scraping a website and you need to guess IDs.
4
+
5
+ You tell it what a valid ID is and it looks for ranges of consecutive valid IDs. It assumes that each probe is expensive.
6
+
7
+ ## Installation
8
+
9
+ Add this line to your application's Gemfile:
10
+
11
+ gem 'rangefinder'
12
+
13
+ And then execute:
14
+
15
+ $ bundle
16
+
17
+ Or install it yourself as:
18
+
19
+ $ gem install rangefinder
20
+
21
+ ## Usage
22
+
23
+ Let's say you're scraping a website but you have to guess the IDs. What you **don't** know is that all valid IDs are in the ranges `100..11_000` and `100_000..110_000`. You pass a "probe" block that returns true if an ID is valid:
24
+
25
+ ranges = Rangefinder.new.probe do |possible_id|
26
+ # your probe code here. for example:
27
+ response = http.get "http://example.com/items", id: possible_id
28
+ response.status == 200
29
+ end
30
+
31
+ You get back ranges where we think there are valid IDs. In this case, pretty good! (See Goals below)
32
+
33
+ >> ranges
34
+ => [ 0..12_200, 99_455..111_600 ]
35
+
36
+ Now you can scrape them one by one:
37
+
38
+ ranges.each do |range|
39
+ range.each do |id|
40
+ # scrape this ID
41
+ end
42
+ end
43
+
44
+ ### Please do cache
45
+
46
+ It's nice when your probe block makes a call that is cached somehow. That way when you go back and use the ranges, you're not hitting all those URLs over again.
47
+
48
+ ### Goals
49
+
50
+ By default
51
+
52
+ 1. Detect at least 90% of valid IDs in 1000-long ranges with up to 90% intra-range sparsity
53
+ 1. Tolerate gaps of 100,000
54
+ 1. Probe no more than 5% of the range
55
+
56
+ Maybe
57
+
58
+ 1. Don't overestimate valid ranges more than X
59
+
60
+ ### Wishlist
61
+
62
+ 1. Accept a known ID as the basis for smarter probing
63
+ 1. Internally, calculate density and use that to choose `min_range` and `samp`
64
+
65
+ ## Contributing
66
+
67
+ 1. Fork it ( http://github.com/<my-github-username>/rangefinder/fork )
68
+ 2. Create your feature branch (`git checkout -b my-new-feature`)
69
+ 3. Commit your changes (`git commit -am 'Add some feature'`)
70
+ 4. Push to the branch (`git push origin my-new-feature`)
71
+ 5. Create new Pull Request
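The caching advice in the README's "Please do cache" section can be sketched as below. This is only an illustration under stated assumptions: `expensive_valid` is a hypothetical stand-in for the real HTTP check, and the valid range `100..200` is made up. The lambda has the same shape as the block passed to `probe` (takes an ID, returns true or false). It uses `Hash#fetch` with a block rather than `||=`, because `||=` would re-run the check every time a cached result is `false`.

```ruby
# Counts how often the "expensive" check really runs (illustration only).
fetch_count = Hash.new(0)

# Hypothetical stand-in for an HTTP request; not part of rangefinder.
expensive_valid = lambda do |id|
  fetch_count[id] += 1
  (100..200).cover?(id)
end

cache = {}

# Same shape as a probe block: takes an ID, returns true or false.
# fetch-with-block caches false results too, which `||=` would not.
cached_probe = lambda do |id|
  cache.fetch(id) { cache[id] = expensive_valid.call(id) }
end

cached_probe.call(150) # runs the expensive check
cached_probe.call(150) # served from the cache
cached_probe.call(5)   # runs the expensive check, caches false
cached_probe.call(5)   # served from the cache

puts "checks for 150: #{fetch_count[150]}, checks for 5: #{fetch_count[5]}"
```

When the ranges are later scraped ID by ID, the same cache can be consulted first so that IDs already probed are not requested again.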
@@ -0,0 +1,6 @@
1
+ require "bundler/gem_tasks"
2
+ require "rspec/core/rake_task"
3
+
4
+ RSpec::Core::RakeTask.new(:spec)
5
+
6
+ task :default => :spec
@@ -0,0 +1,58 @@
1
+ require "rangefinder/version"
2
+ require 'rangefinder/memo'
3
+
4
+ require 'ranges_merger'
5
+
6
+ class Rangefinder
7
+ MAX = 2**32 - 1
8
+ MAX_GAP = 1e5
9
+ INIT_SAMP = 0.01
10
+ MAX_SAMP = 0.1
11
+
12
+ def probe(options = {}, &blk)
13
+ probe_with_hits_and_misses(options, &blk).first # return only the merged ranges, as the README shows
14
+ end
15
+
16
+ def probe_with_hits_and_misses(options = {}, &blk)
17
+ memo = Memo.new
18
+ _probe(memo, options, &blk)
19
+ [ ::RangesMerger.merge(memo.ranges), memo.hits, memo.misses ]
20
+ end
21
+
22
+ private
23
+
24
+ def _probe(memo, options = {}, &blk)
25
+ first = [options.fetch(:first, 0), 0].max.round
26
+ last = [options.fetch(:last, MAX), MAX].min.round
27
+ max_gap = options.fetch(:max_gap, MAX_GAP)
28
+ samp = options.fetch(:samp, INIT_SAMP)
29
+ if samp >= MAX_SAMP
30
+ memo.ranges << (first..last)
31
+ else
32
+ min_range = (10 ** (2 - Math.log(samp, 10))).round
33
+ anything = false
34
+ first_good = nil
35
+ i = first
36
+ last_good = first
37
+ begin
38
+ if blk.call(i)
39
+ memo.hit!
40
+ anything = true
41
+ first_good ||= i
42
+ last_good = i
43
+ else
44
+ memo.miss!
45
+ end
46
+ gap = i - last_good
47
+ if first_good and gap > min_range
48
+ _probe memo, {first: first_good-min_range, last: last_good+min_range, samp: samp*3}, &blk
49
+ first_good = nil
50
+ last_good = i
51
+ gap = 0
52
+ end
53
+ samp1 = gap > Math::E ? samp * Math.log(gap) : samp
54
+ i += [(rand(100) * (1 - samp1)).round, 1].max # always advance so the same ID is never probed twice in a row
55
+ end until i >= last or (gap > max_gap and anything) # sorry for mixed metaphor
56
+ end
57
+ end
58
+ end
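To make the recursion in `_probe` easier to follow, here is a small sketch of how `samp` and `min_range` evolve from one recursion level to the next. It reuses the constants and the `min_range` expression from the file above; the helper name `min_range_for` and the `schedule` array are made up for this illustration. Algebraically, `10 ** (2 - log10(samp))` is just `100 / samp`.

```ruby
# Constants copied from lib/rangefinder.rb.
INIT_SAMP = 0.01
MAX_SAMP  = 0.1

# Same expression as in _probe; equivalent to 100 / samp.
def min_range_for(samp)
  (10 ** (2 - Math.log(samp, 10))).round
end

samp = INIT_SAMP
schedule = []
# _probe stops recursing (and records the whole range) once samp >= MAX_SAMP.
while samp < MAX_SAMP
  schedule << [samp.round(2), min_range_for(samp)]
  samp *= 3 # each recursive call triples the sampling rate
end

schedule.each { |s, r| puts "samp=#{s} min_range=#{r}" }
```

With the defaults this gives three levels (0.01, 0.03, 0.09) with `min_range` values of 10000, 3333 and 1111. The next level would have `samp` of about 0.27, which is past `MAX_SAMP`, so the candidate range is accepted as-is.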
@@ -0,0 +1,19 @@
1
+ class Rangefinder
2
+ class Memo
3
+ attr_reader :ranges
4
+ attr_reader :hits
5
+ attr_reader :misses
6
+ def initialize
7
+ @ranges = []
8
+ @hits = 0
9
+ @misses = 0
10
+ @mutex = Mutex.new
11
+ end
12
+ def hit!
13
+ @mutex.synchronize { @hits += 1 }
14
+ end
15
+ def miss!
16
+ @mutex.synchronize { @misses += 1 }
17
+ end
18
+ end
19
+ end
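The mutex in `Memo` suggests the counters are meant to be safe to update from several threads. The sketch below exercises that, assuming a probe block might be run concurrently; nothing in the code shown in this diff spawns threads itself. The class body is copied from the file above so the example runs on its own, and the thread and iteration counts are arbitrary.

```ruby
class Rangefinder
  # Copy of lib/rangefinder/memo.rb so this example is self-contained.
  class Memo
    attr_reader :ranges, :hits, :misses

    def initialize
      @ranges = []
      @hits = 0
      @misses = 0
      @mutex = Mutex.new
    end

    def hit!
      @mutex.synchronize { @hits += 1 }
    end

    def miss!
      @mutex.synchronize { @misses += 1 }
    end
  end
end

memo = Rangefinder::Memo.new

# Four threads, each recording 500 hits and 500 misses.
threads = 4.times.map do
  Thread.new do
    1000.times { |i| i.even? ? memo.hit! : memo.miss! }
  end
end
threads.each(&:join)

puts "hits=#{memo.hits} misses=#{memo.misses}"
```

Note that only `hit!` and `miss!` go through the mutex; appending to `memo.ranges` does not, so concurrent writers to `ranges` would need their own synchronization.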
@@ -0,0 +1,3 @@
1
+ class Rangefinder
2
+ VERSION = "0.0.1"
3
+ end
@@ -0,0 +1,27 @@
1
+ # coding: utf-8
2
+ lib = File.expand_path('../lib', __FILE__)
3
+ $LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
4
+ require 'rangefinder/version'
5
+
6
+ Gem::Specification.new do |spec|
7
+ spec.name = "rangefinder"
8
+ spec.version = Rangefinder::VERSION
9
+ spec.authors = ["Seamus Abshere"]
10
+ spec.email = ["seamus@abshere.net"]
11
+ spec.summary = %q{Helps you find ranges of IDs, like when you're scraping a website and you need to guess IDs.}
12
+ spec.description = %q{Helps you find ranges of IDs, like when you're scraping a website and you need to guess IDs. You tell it what a valid ID is and it looks for ranges of consecutive valid IDs. It assumes that each probe is expensive.}
13
+ spec.homepage = "https://github.com/seamusabshere/rangefinder"
14
+ spec.license = "MIT"
15
+
16
+ spec.files = `git ls-files`.split($/)
17
+ spec.executables = spec.files.grep(%r{^bin/}) { |f| File.basename(f) }
18
+ spec.test_files = spec.files.grep(%r{^(test|spec|features)/})
19
+ spec.require_paths = ["lib"]
20
+
21
+ spec.add_runtime_dependency 'ranges_merger'
22
+
23
+ spec.add_development_dependency "bundler", "~> 1.5"
24
+ spec.add_development_dependency "rake"
25
+ spec.add_development_dependency "rspec"
26
+ spec.add_development_dependency "pry"
27
+ end
@@ -0,0 +1,70 @@
1
+ require 'spec_helper'
2
+
3
+ # https://github.com/rails/rails/blob/444ce93397dba3505ecef4973edba40de4fc08c6/activesupport/lib/active_support/core_ext/range/include_range.rb#L12
4
+ # (1..5).include?(1..5) # => true
5
+ # (1..5).include?(2..3) # => true
6
+ # (1..5).include?(2..6) # => false
7
+ def range_include?(zelf, other)
8
+ # 1...10 includes 1..9 but it does not include 1..10.
9
+ operator = zelf.exclude_end? && !other.exclude_end? ? :< : :<=
10
+ zelf.include?(other.first) && other.last.send(operator, zelf.last)
11
+ end
12
+
13
+ describe Rangefinder do
14
+ expected_ranges = []
15
+ pos = 0
16
+ 100.times do
17
+ len = 1000
18
+ pos += rand(100_000).to_i
19
+ expected_ranges << ((pos)..(len+pos))
20
+ end
21
+ expected_id_count = expected_ranges.map(&:count).inject(:+)
22
+
23
+ cache = {}
24
+
25
+ (0..0.9).step(0.1).each do |sparsity|
26
+ describe "sparsity=#{'%g' % sparsity}" do
27
+ found_ranges, hits, misses = Rangefinder.new.probe_with_hits_and_misses do |i|
28
+ r = (cache[i] ||= rand)
29
+ (r > sparsity) && expected_ranges.any? { |range| range.include?(i) }
30
+ end
31
+
32
+ # $stderr.puts
33
+ # $stderr.puts
34
+ # $stderr.puts "found_ranges=#{found_ranges}"
35
+ # $stderr.puts
36
+ # $stderr.puts "expected_ranges=#{expected_ranges}"
37
+
38
+ # it "finds #{expected_ranges.length} ranges" do
39
+ # expected_ranges.each do |expected|
40
+ # expect(found_ranges.any? { |found| range_include?(found, expected) }).to be_true, "#{expected} not in #{found_ranges} found"
41
+ # end
42
+ # end
43
+
44
+ it "finds 95% of ids" do
45
+ real_found_ids = []
46
+ expected_ranges.each do |expected|
47
+ found_ranges.each do |found|
48
+ # if found.include?(expected)
49
+ if range_include?(found, expected)
50
+ real_found_ids << expected.to_a
51
+ end
52
+ end
53
+ end
54
+ real_found_ids = real_found_ids.flatten.uniq
55
+ expect((real_found_ids.count.to_f / expected_id_count).round(2)).to be >= 0.95
56
+ end
57
+
58
+ it "probes only 5% of the space" do
59
+ highest_id = expected_ranges.map(&:last).max
60
+ expect(((hits+misses).to_f / highest_id).round(2)).to be <= 0.05
61
+ end
62
+
63
+ it "exaggerates no more than 5%" do
64
+ found_ids = found_ranges.map(&:to_a).flatten.uniq
65
+ expect((found_ids.count.to_f / expected_id_count).round(2)).to be <= 1.05
66
+ end
67
+ end
68
+ end
69
+
70
+ end
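The coverage check in the "finds 95% of ids" example relies on `range_include?`, which only counts an expected range when a found range contains it completely. A minimal sketch of that calculation, with the helper copied from the spec and made-up ranges (`expected_ranges` and `found_ranges` here are hypothetical data, not taken from a real run):

```ruby
# Copy of the spec's helper so the example runs on its own.
def range_include?(zelf, other)
  # 1...10 includes 1..9 but it does not include 1..10.
  operator = zelf.exclude_end? && !other.exclude_end? ? :< : :<=
  zelf.include?(other.first) && other.last.send(operator, zelf.last)
end

# Hypothetical data: two true ranges, one found range covering only the first.
expected_ranges = [100..199, 500..599]
found_ranges    = [90..250]

# An expected range counts only if some found range fully contains it.
covered = expected_ranges.select do |expected|
  found_ranges.any? { |found| range_include?(found, expected) }
end

covered_ids  = covered.map(&:count).inject(0, :+)
expected_ids = expected_ranges.map(&:count).inject(0, :+)
ratio = covered_ids.to_f / expected_ids

puts "covered #{covered_ids} of #{expected_ids} IDs (ratio #{ratio})"
```

One consequence of this design is that a found range which overlaps most of an expected range, but not all of it, contributes nothing to the ratio, so the spec's measure is stricter than a per-ID count.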
@@ -0,0 +1,4 @@
1
+ require 'pry'
2
+
3
+ $LOAD_PATH.unshift File.expand_path('../../lib', __FILE__)
4
+ require 'rangefinder'
metadata ADDED
@@ -0,0 +1,134 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: rangefinder
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.0.1
5
+ platform: ruby
6
+ authors:
7
+ - Seamus Abshere
8
+ autorequire:
9
+ bindir: bin
10
+ cert_chain: []
11
+ date: 2014-01-11 00:00:00.000000000 Z
12
+ dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ name: ranges_merger
15
+ requirement: !ruby/object:Gem::Requirement
16
+ requirements:
17
+ - - ! '>='
18
+ - !ruby/object:Gem::Version
19
+ version: '0'
20
+ type: :runtime
21
+ prerelease: false
22
+ version_requirements: !ruby/object:Gem::Requirement
23
+ requirements:
24
+ - - ! '>='
25
+ - !ruby/object:Gem::Version
26
+ version: '0'
27
+ - !ruby/object:Gem::Dependency
28
+ name: bundler
29
+ requirement: !ruby/object:Gem::Requirement
30
+ requirements:
31
+ - - ~>
32
+ - !ruby/object:Gem::Version
33
+ version: '1.5'
34
+ type: :development
35
+ prerelease: false
36
+ version_requirements: !ruby/object:Gem::Requirement
37
+ requirements:
38
+ - - ~>
39
+ - !ruby/object:Gem::Version
40
+ version: '1.5'
41
+ - !ruby/object:Gem::Dependency
42
+ name: rake
43
+ requirement: !ruby/object:Gem::Requirement
44
+ requirements:
45
+ - - ! '>='
46
+ - !ruby/object:Gem::Version
47
+ version: '0'
48
+ type: :development
49
+ prerelease: false
50
+ version_requirements: !ruby/object:Gem::Requirement
51
+ requirements:
52
+ - - ! '>='
53
+ - !ruby/object:Gem::Version
54
+ version: '0'
55
+ - !ruby/object:Gem::Dependency
56
+ name: rspec
57
+ requirement: !ruby/object:Gem::Requirement
58
+ requirements:
59
+ - - ! '>='
60
+ - !ruby/object:Gem::Version
61
+ version: '0'
62
+ type: :development
63
+ prerelease: false
64
+ version_requirements: !ruby/object:Gem::Requirement
65
+ requirements:
66
+ - - ! '>='
67
+ - !ruby/object:Gem::Version
68
+ version: '0'
69
+ - !ruby/object:Gem::Dependency
70
+ name: pry
71
+ requirement: !ruby/object:Gem::Requirement
72
+ requirements:
73
+ - - ! '>='
74
+ - !ruby/object:Gem::Version
75
+ version: '0'
76
+ type: :development
77
+ prerelease: false
78
+ version_requirements: !ruby/object:Gem::Requirement
79
+ requirements:
80
+ - - ! '>='
81
+ - !ruby/object:Gem::Version
82
+ version: '0'
83
+ description: Helps you find ranges of IDs, like when you're scraping a website and
84
+ you need to guess IDs. You tell it what a valid ID is and it looks for ranges of
85
+ consecutive valid IDs. It assumes that each probe is expensive.
86
+ email:
87
+ - seamus@abshere.net
88
+ executables: []
89
+ extensions: []
90
+ extra_rdoc_files: []
91
+ files:
92
+ - .gitignore
93
+ - .rspec
94
+ - .travis.yml
95
+ - CHANGELOG
96
+ - Gemfile
97
+ - LICENSE.txt
98
+ - README.md
99
+ - Rakefile
100
+ - lib/rangefinder.rb
101
+ - lib/rangefinder/memo.rb
102
+ - lib/rangefinder/version.rb
103
+ - rangefinder.gemspec
104
+ - spec/rangefinder_spec.rb
105
+ - spec/spec_helper.rb
106
+ homepage: https://github.com/seamusabshere/rangefinder
107
+ licenses:
108
+ - MIT
109
+ metadata: {}
110
+ post_install_message:
111
+ rdoc_options: []
112
+ require_paths:
113
+ - lib
114
+ required_ruby_version: !ruby/object:Gem::Requirement
115
+ requirements:
116
+ - - ! '>='
117
+ - !ruby/object:Gem::Version
118
+ version: '0'
119
+ required_rubygems_version: !ruby/object:Gem::Requirement
120
+ requirements:
121
+ - - ! '>='
122
+ - !ruby/object:Gem::Version
123
+ version: '0'
124
+ requirements: []
125
+ rubyforge_project:
126
+ rubygems_version: 2.1.11
127
+ signing_key:
128
+ specification_version: 4
129
+ summary: Helps you find ranges of IDs, like when you're scraping a website and you
130
+ need to guess IDs.
131
+ test_files:
132
+ - spec/rangefinder_spec.rb
133
+ - spec/spec_helper.rb
134
+ has_rdoc: