FreedomCoder-esearchy 0.0.19 → 0.1

data/README.rdoc CHANGED
@@ -1,20 +1,52 @@
1
1
  = Esearch
2
2
 
3
3
  == DESCRIPTION
4
- Esearchy is a small library capable of searching the internet for email addresses. Currently, the supported search methods are engines such as Google, Bing, Yahoo, PGP servers, GoogleGroups, LinkedIn, etc , but I intend to add many more.
4
+ Esearchy is a small library capable of searching the internet for email addresses. The currently supported search methods include:
5
+
6
+ * Search engines:
7
+ * Google
8
+ * Bing
9
+ * Yahoo
10
+ * AltaVista
11
+ * Social Networks:
12
+ * LinkedIn
13
+ * Google Profiles (based on DigiNinja's idea: http://www.digininja.org/files/gpscan.rb)
14
+ * Naymz
15
+ * PGP servers
16
+ * Usenets
17
+ * GoogleGroups Search
18
+
19
+
20
+ But searches do not stop there; ESearchy also looks for emails inside:
21
+
22
+ * PDF
23
+ * DOC
24
+ * DOCX
25
+ * XLSX
26
+ * PPTX
27
+ * ODT
28
+ * ODP
29
+ * ODS
30
+ * ODB
31
+ * ASN
32
+ * TXT
33
+
34
+ Once the text within a file is parsed, the emails found are added to the list of found accounts.
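The extraction step described above can be sketched with plain Ruby (an illustration only; the regex below is a simplified stand-in for the library's scanner, which also handles obfuscated forms such as "user at host dot com"):

```ruby
# Minimal, stdlib-only sketch: scan already-parsed document text for
# email addresses and merge them into the list of found accounts.
EMAIL_RE = /[a-z0-9!#$&'*+=?^_`{|}~.-]+@(?:[a-z0-9-]+\.)+[a-z]{2,}/i

def extract_emails(text, found = [])
  found.concat(text.scan(EMAIL_RE))
  found.uniq
end

p extract_emails("Contact alice@example.com or bob@example.org.")
```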
5
35
 
6
- Also, the library searches inside .pdf, .docx, .xlsx, .pptx, asn and .txt files for emails addresses and adds them to the list of found accounts. Finally, we have support for .docs files but for now only in Windows Platforms.
7
36
  In order to parse Microsoft Word (.doc):
8
37
  * You either need a Windows platform with Word installed.
9
38
  * Or install Antiword. ( http://www.winfield.demon.nl/ )
39
+ * Or, if none of the above is available on the OS, we perform a raw search inside the file.
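The fallback chain above can be sketched as follows (the helper name and strategy symbols are hypothetical, not part of the gem's API):

```ruby
# Hypothetical sketch of the .doc fallback chain: prefer Word on
# Windows, then Antiword if installed, otherwise a raw byte scan.
def doc_strategy
  if RUBY_PLATFORM =~ /mingw|mswin/
    :word_com    # drive Word through OLE/COM
  elsif system("which antiword > /dev/null 2>&1")
    :antiword    # shell out to antiword
  else
    :raw_scan    # scan the raw bytes for addresses
  end
end

p doc_strategy
```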
10
40
 
11
41
  NOTE: THIS IS STILL UNDER DEVELOPMENT AND CODE IS SUBMITTED DAILY, SO BE AWARE THAT IT MIGHT NOT WORK PROPERLY ALL THE TIME. IF SOMETHING GOES WRONG, PLEASE RAISE AN ISSUE.
12
42
 
13
- In order to work Bing and Yahoo need an appid, for which you will have to create one for each and place them in fles so the library will be able to work properly.
43
+ In order to work, Bing and Yahoo need an appid. These API keys can be added once to the files mentioned below, or passed as a parameter each time a search is executed:
14
44
  * data/yahoo.key
15
45
  * data/bing.key
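For illustration, a one-line key file such as these can be consumed like this (`read_key` is a hypothetical helper, not part of the gem's API):

```ruby
require 'tmpdir'

# Hypothetical helper mirroring how a single-line API key file
# (e.g. data/bing.key) can be read, tolerating a missing file.
def read_key(path)
  File.exist?(path) ? File.read(path).strip : nil
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "bing.key")
  File.write(path, "MY-BING-APPID\n")
  p read_key(path)
end
```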
16
46
 
17
- Soon, users should be able to also pass them as parameters.
47
+ Furthermore, in order to use the LinkedIn People Search, we need to pass two parameters:
48
+ * Company name (with Object.company_name)
49
+ * User/password for LinkedIn (with Object.linkedin_credentials). For this we can use ESearchy::BUGMENOT, which looks up a LinkedIn user on bugmenot.com.
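The two call shapes the credential setter accepts can be sketched in isolation (a standalone stand-in, not the gem's code):

```ruby
# Standalone sketch of the setter's argument handling: it accepts
# either ("user", "pass") or a single ["user", "pass"] array, the two
# shapes linkedin_credentials= supports; anything else yields nil.
def normalize_credentials(*args)
  case args.size
  when 2 then args
  when 1 then [args[0][0], args[0][1]]
  end
end

p normalize_credentials("user@mail.com", "secret")
p normalize_credentials(["user@mail.com", "secret"])
```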
18
50
 
19
51
  == SUPPORT:
20
52
 
@@ -80,7 +112,7 @@ Not short of that now, we also have the possibility of choosing between a Librar
80
112
  * cgi
81
113
  * pdf/reader
82
114
  * json
83
- * rubyzip
115
+ * rubyzip (migrating to FreedomCoder-rubyzip 0.9.2 so it's 1.9 compatible)
84
116
 
85
117
  == INSTALL:
86
118
 
data/lib/esearchy.rb CHANGED
@@ -1,16 +1,25 @@
1
+ require 'resolv'
1
2
  local_path = "#{File.dirname(__FILE__) + '/esearchy/'}"
2
- %w{google googlegroups bing yahoo pgp keys
3
- linkedin logger bugmenot}.each { |lib| require local_path + lib }
3
+ %w{keys logger bugmenot}.each { |lib| require local_path + lib }
4
+ %w{google bing yahoo altavista}.each { |lib| require local_path + "SearchEngines/" + lib }
5
+ %w{linkedin googleprofiles naymz}.each { |lib| require local_path + "SocialNetworks/"+lib }
6
+ %w{googlegroups pgp usenet}.each { |lib| require local_path + "OtherEngines/"+lib }
4
7
 
5
8
  class ESearchy
6
- #Constants
7
9
 
10
+ #Constants
8
11
  LIBRARY = 1
9
12
  APP = 2
10
-
11
13
  LOG = Logger.new(1, $stdout)
12
14
  BUGMENOT = BMN::fetch_user("linkedin.com")
13
-
15
+ DEFAULT_ENGINES = [:Google, :Bing, :Yahoo, :PGP, :LinkedIn,
16
+ :GoogleGroups, :Altavista, :Usenet, :GoogleProfiles, :Naymz]
17
+ case RUBY_PLATFORM
18
+ when /mingw|mswin/
19
+ TEMP = "C:\\WINDOWS\\Temp\\"
20
+ else
21
+ TEMP = "/tmp/"
22
+ end
14
23
  #End Constants
15
24
 
16
25
  def log_type=(value)
@@ -21,12 +30,6 @@ class ESearchy
21
30
  ESearchy::LOG.file = value
22
31
  end
23
32
 
24
- def eng(arr)
25
- hsh = {}; arr.each {|e| hsh[e] = instance_eval "#{e}"}; hsh
26
- end
27
-
28
- DEFAULT_ENGINES = [:Google, :Bing, :Yahoo, :PGP, :LinkedIn]
29
-
30
33
  def initialize(options={}, &block)
31
34
  @query = options[:query]
32
35
  @depth_search = options[:depth] || true
@@ -36,14 +39,26 @@ class ESearchy
36
39
  :Bing => Bing,
37
40
  :Yahoo => Yahoo,
38
41
  :PGP => PGP,
39
- :LinkedIn => LinkedIn }
40
-
42
+ :LinkedIn => LinkedIn,
43
+ :GoogleGroups => GoogleGroups,
44
+ :Altavista => Altavista,
45
+ :Usenet => Usenet,
46
+ :GoogleProfiles => GoogleProfiles,
47
+ :Naymz => Naymz }
41
48
  @engines.each {|n,e| @engines[n] = e.new(@maxhits)}
42
49
  @threads = Array.new
43
50
  block.call(self) if block_given?
44
51
  end
52
+
53
+ #Attributes
45
54
  attr_accessor :engines, :query, :threads, :depth_search
46
55
  attr_reader :maxhits
56
+
57
+ def self.create(query=nil, &block)
58
+ self.new :query => query do |search|
59
+ block.call(search) if block_given?
60
+ end
61
+ end
47
62
 
48
63
  def search(query=nil)
49
64
  @engines.each do |n,e|
@@ -53,7 +68,7 @@ class ESearchy
53
68
  LOG.puts "+--- Finishing Search for #{n} ---+\n"
54
69
  end
55
70
  end
56
-
71
+ # retrieve emails
57
72
  def emails
58
73
  emails = []
59
74
  @engines.each do |n,e|
@@ -62,12 +77,30 @@ class ESearchy
62
77
  emails
63
78
  end
64
79
 
80
+ def people
81
+ people = []
82
+ [:LinkedIn, :GoogleProfiles].each do |e|
83
+ people.concat(@engines[e].people) if @engines[e]
84
+ end
85
+ people.uniq
86
+ end
87
+
88
+ ## Filter methods ##
65
89
  def clean(&block)
66
90
  emails.each do |e|
67
91
  e.delete_if block.call
68
92
  end
69
93
  end
70
94
 
95
+ def filter(regex)
96
+ emails.select { |email| email =~ regex }
97
+ end
98
+
99
+ def filter_by_score(score)
100
+ emails.select { |email| calculate_score(email) >= score }
101
+ end
102
+
103
+ ## Option methods ##
71
104
  def maxhits=(value)
72
105
  @engines.each do |n,e|
73
106
  e.maxhits = value
@@ -82,27 +115,39 @@ class ESearchy
82
115
  @engines[:Bing].appid = value
83
116
  end
84
117
 
85
- def linkedin_credentials(*args)
118
+ def linkedin_credentials
119
+ return @engines[:LinkedIn].username, @engines[:LinkedIn].password
120
+ end
121
+
122
+ def linkedin_credentials=(*args)
86
123
  if args.size == 2
87
124
  @engines[:LinkedIn].username = args[0]
88
125
  @engines[:LinkedIn].password = args[1]
89
126
  return true
90
- elsif args.size ==1
127
+ elsif args.size == 1
91
128
  @engines[:LinkedIn].username = args[0][0]
92
129
  @engines[:LinkedIn].password = args[0][1]
93
130
  return true
94
131
  end
95
132
  false
96
133
  end
97
- alias_method :linkedin_credentials=, :linkedin_credentials
98
134
 
99
- def company_name(company)
100
- @engines[:LinkedIn].company_name = company
135
+ def company_name
136
+ (@engines[:LinkedIn] ||
137
+ @engines[:GoogleProfiles] ||
138
+ @engines[:Naymz]).company_name || nil
139
+ end
140
+
141
+ def company_name=(company)
142
+ [:LinkedIn, :GoogleProfiles, :Naymz].each do |e|
143
+ @engines[e].company_name = company if @engines[e]
144
+ end
145
+
101
146
  end
102
- alias_method :company_name=, :company_name
103
147
 
104
148
  def search_engine(key, value)
105
- if [:Google, :Bing, :Yahoo, :PGP, :LinkedIn, :GoogleGroups].include?(key)
149
+ if [:Google, :Bing, :Yahoo, :PGP, :LinkedIn,
150
+ :GoogleGroups, :Altavista, :Usenet, :GoogleProfiles, :Naymz].include?(key)
106
151
  if value == true
107
152
  unless @engines[key]
108
153
  @engines[key] = instance_eval "#{key}.new(@maxhits)"
@@ -115,31 +160,52 @@ class ESearchy
115
160
  end
116
161
  end
117
162
 
118
- %w{Google Bing Yahoo PGP LinkedIn GoogleGroups}.each do |engine|
163
+ %w{Google Bing Yahoo PGP LinkedIn GoogleGroups
164
+ Altavista Usenet GoogleProfiles Naymz}.each do |engine|
119
165
  class_eval "
120
166
  def search_#{engine}=(value)
121
167
  search_engine :#{engine}, value
122
168
  end"
123
169
  end
124
-
125
- def save_to_file(file)
170
+ ## Saving methods
171
+ def save_to_file(file, list=nil)
126
172
  open(file,"a") do |f|
127
- emails.each { |e| f << e + "\n" }
173
+ list ? list.each { |e| f << e + "\n" } : emails.each { |e| f << e + "\n" }
128
174
  end
129
175
  end
130
176
 
131
- def filter(regex)
132
- emails.each.select { |email| email =~ regex }
177
+ def save_to_sqlite(file)
178
+ # TODO save to sqlite3
179
+ # table esearchy with fields (id, Domain, email, score)
133
180
  end
134
181
 
135
- def self.create(query=nil, &block)
136
- self.new :query => query do |search|
137
- block.call(search) if block_given?
138
- end
182
+ ## checking methods ##
183
+
184
+ def verify_email!(arg = emails)
185
+ # TODO
186
+ # Connect to mail server if possible verify else
187
+ # return 0 for false 2 for true or 1 for error.
188
+ # VRFY & EXPN & 'RCPT TO:
189
+ return false
190
+ end
191
+
192
+ def verify_domain!(e)
193
+ Resolv::DNS.open { |dns| dns.getresources(e.split('@')[-1], Resolv::DNS::Resource::IN::MX).size > 0 }
139
194
  end
140
195
 
141
196
  private
142
197
 
198
+ def eng(arr)
199
+ hsh = {}; arr.each {|e| hsh[e] = instance_eval "#{e}"}; hsh
200
+ end
201
+
202
+ def calculate_score(email)
203
+ score = 0.0
204
+ score = score + 0.2 if email =~ /#{@query}/
205
+ score = score + 0.3 if verify_domain!(email)
206
+ score = 1.0 if verify_email!(email)
+ score
207
+ end
208
+
143
209
  def depth_search?
144
210
  @depth_search
145
211
  end
@@ -1,6 +1,6 @@
1
1
  %w{rubygems cgi net/http}.each { |lib| require lib }
2
- local_path = "#{File.dirname(__FILE__)}/"
3
- %w{searchy keys useragent}.each {|lib| require local_path + lib}
2
+ local_path = "#{File.dirname(__FILE__)}/../"
3
+ %w{searchy useragent}.each {|lib| require local_path + lib}
4
4
 
5
5
  class GoogleGroups
6
6
  include Searchy
@@ -13,6 +13,7 @@ class GoogleGroups
13
13
  @r_pdfs = Queue.new
14
14
  @r_txts = Queue.new
15
15
  @r_officexs = Queue.new
16
+ @emails = []
16
17
  @lock = Mutex.new
17
18
  @threads = []
18
19
  end
@@ -26,19 +27,19 @@ class GoogleGroups
26
27
  request = Net::HTTP::Get.new( "/groups/search?&safe=off&num=100" +
27
28
  "&q=" + CGI.escape(query) +
28
29
  "&btnG=Search&start=#{@start}",
29
- {'Cookie' => UserAgent::fetch})
30
+ {'User-Agent' => UserAgent::fetch})
30
31
  response = http.request(request)
31
32
  case response
32
33
  when Net::HTTPSuccess, Net::HTTPRedirection
33
34
  parse(response.body)
34
35
  @start = @start + 100
35
36
  if @totalhits > @start
36
- ESearchy::LOG.puts "Searching in URL: #{self.class} up to point #{@start}"
37
+ ESearchy::LOG.puts "Searching #{self.class} from #{@start-100} to #{@start}"
37
38
  search_emails(response.body)
38
39
  sleep(4)
39
40
  search(query)
40
41
  else
41
- ESearchy::LOG.puts "Searching in URL: #{self.class} up to point #{@start}"
42
+ ESearchy::LOG.puts "Searching #{self.class} from #{@start-100} to #{@start}"
42
43
  search_emails(response.body)
43
44
  end
44
45
  else
@@ -48,6 +49,7 @@ class GoogleGroups
48
49
  rescue Net::HTTPFatalError
49
50
  ESearchy::LOG.puts "Error: Something went wrong with the HTTP request"
50
51
  end
52
+ @start = 0
51
53
  end
52
54
 
53
55
  def parse(html)
@@ -1,5 +1,5 @@
1
1
  %w{cgi net/http}.each { |lib| require lib }
2
- local_path = "#{File.dirname(__FILE__)}/"
2
+ local_path = "#{File.dirname(__FILE__)}/../"
3
3
  %w{searchy useragent}.each {|lib| require local_path + lib}
4
4
 
5
5
  class PGP
@@ -10,6 +10,7 @@ class PGP
10
10
  @emails = []
11
11
  @lock = Mutex.new
12
12
  end
13
+ attr_accessor :emails
13
14
 
14
15
  def search(query)
15
16
  @query = query
@@ -17,7 +18,7 @@ class PGP
17
18
  begin
18
19
  http.start do |http|
19
20
  request = Net::HTTP::Get.new( "/pks/lookup?search=#{@query}",
20
- {'Cookie' => UserAgent::fetch})
21
+ {'User-Agent' => UserAgent::fetch})
21
22
  response = http.request(request)
22
23
  case response
23
24
  when Net::HTTPSuccess, Net::HTTPRedirection
@@ -31,13 +32,4 @@ class PGP
31
32
  ESearchy::LOG.puts "Error: Something went wrong with the HTTP request"
32
33
  end
33
34
  end
34
-
35
- def emails
36
- @totalhits == 0 ? emails : emails[0..@totalhits]
37
- end
38
-
39
- def emails=(value)
40
- emails = value
41
- end
42
-
43
35
  end
@@ -0,0 +1,35 @@
1
+ %w{cgi net/http}.each { |lib| require lib }
2
+ local_path = "#{File.dirname(__FILE__)}/../"
3
+ %w{searchy useragent}.each {|lib| require local_path + lib}
4
+
5
+ class Usenet
6
+ include Searchy
7
+
8
+ def initialize(maxhits=0)
9
+ @totalhits = maxhits
10
+ @emails = []
11
+ @lock = Mutex.new
12
+ end
13
+ attr_accessor :emails
14
+
15
+ def search(query)
16
+ @query = query
17
+ http = Net::HTTP.new("usenet-addresses.mit.edu",80)
18
+ begin
19
+ http.start do |http|
20
+ request = Net::HTTP::Get.new( "/cgi-bin/udb?T=#{@query}&G=&S=&N=&O=&M=500",
21
+ {'User-Agent' => UserAgent::fetch})
22
+ response = http.request(request)
23
+ case response
24
+ when Net::HTTPSuccess, Net::HTTPRedirection
25
+ ESearchy::LOG.puts "Searching #{self.class}"
26
+ search_emails(response.body)
27
+ else
28
+ return response.error!
29
+ end
30
+ end
31
+ rescue Net::HTTPFatalError
32
+ ESearchy::LOG.puts "Error: Something went wrong with the HTTP request"
33
+ end
34
+ end
35
+ end
@@ -0,0 +1,71 @@
1
+ %w{rubygems cgi net/http}.each { |lib| require lib }
2
+ local_path = "#{File.dirname(__FILE__)}/../"
3
+ %w{searchy useragent}.each {|lib| require local_path + lib}
4
+
5
+ class Altavista
6
+ include Searchy
7
+
8
+ def initialize(maxhits = 0, start = 0)
9
+ @start = start
10
+ @totalhits = maxhits
11
+ @emails = []
12
+ @r_urls = Queue.new
13
+ @r_docs = Queue.new
14
+ @r_pdfs = Queue.new
15
+ @r_txts = Queue.new
16
+ @r_officexs = Queue.new
17
+ @lock = Mutex.new
18
+ @threads = []
19
+ end
20
+ attr_accessor :emails
21
+
22
+ def search(query)
23
+ @query = query
24
+ http = Net::HTTP.new("www.altavista.com",80)
25
+ begin
26
+ http.start do |http|
27
+ request = Net::HTTP::Get.new( "/web/results?itag=ody&kgs=0&kls=0&nbq=50" +
28
+ "&q=" + CGI.escape(query) +
29
+ "&stq=#{@start}",
30
+ {'User-Agent' => UserAgent::fetch})
31
+ response = http.request(request)
32
+ case response
33
+ when Net::HTTPSuccess, Net::HTTPRedirection
34
+ parse(response.body)
35
+ @start = @start + 100
36
+ if @totalhits > @start
37
+ ESearchy::LOG.puts "Searching #{self.class} from #{@start-50} to #{@start}"
38
+ search_emails(response.body.gsub(/<b>|<\/b>/,""))
39
+ sleep(4)
40
+ search(query)
41
+ else
42
+ ESearchy::LOG.puts "Searching #{self.class} from #{@start-50} to #{@start}"
43
+ search_emails(response.body.gsub(/<b>|<\/b>/,""))
44
+ end
45
+ else
46
+ return response.error!
47
+ end
48
+ end
49
+ rescue Net::HTTPFatalError
50
+ ESearchy::LOG.puts "Error: Something went wrong with the HTTP request"
51
+ end
52
+ end
53
+
54
+ def parse(html)
55
+ @totalhits= html.scan(/AltaVista found (.*) results<\/A>/)[0][0].gsub(',','').to_i if @totalhits == 0
56
+ html.scan(/<a class='res' href='([a-zA-Z0-9:\/\/.&?%=\-_+]*)'>/).each do |result|
57
+ case result[0]
58
+ when /\.pdf$/i
59
+ @r_pdfs << CGI.unescape(result[0])
60
+ when /\.docx$|\.xlsx$|\.pptx$|\.odt$|\.odp$|\.ods$|\.odb$/i
61
+ @r_officexs << CGI.unescape(result[0])
62
+ when /\.doc$/i
63
+ @r_docs << CGI.unescape(result[0])
64
+ when /\.txt$|\.rtf$|\.asn$/i
65
+ @r_txts << CGI.unescape(result[0])
66
+ else
67
+ @r_urls << CGI.unescape(result[0])
68
+ end
69
+ end
70
+ end
71
+ end
@@ -1,6 +1,6 @@
1
1
  %w{rubygems json cgi net/http}.each { |lib| require lib }
2
- local_path = "#{File.dirname(__FILE__)}/"
3
- %w{searchy keys useragent}.each {|lib| require local_path + lib}
2
+ local_path = "#{File.dirname(__FILE__)}/../"
3
+ %w{searchy useragent}.each {|lib| require local_path + lib}
4
4
 
5
5
  class Bing
6
6
  include Searchy
@@ -28,19 +28,19 @@ class Bing
28
28
  request = Net::HTTP::Get.new("/json.aspx" + "?Appid="+ @appid +
29
29
  "&query=" + CGI.escape(query) +
30
30
  "&sources=web&web.count=50&web.offset=#{@start}",
31
- {'Cookie' => UserAgent::fetch})
31
+ {'User-Agent' => UserAgent::fetch})
32
32
  response = http.request(request)
33
33
  case response
34
34
  when Net::HTTPSuccess, Net::HTTPRedirection
35
35
  parse(response.body)
36
36
  @start = @start + 50
37
37
  if @totalhits > @start
38
- ESearchy::LOG.puts "Searching in URL: #{self.class} up to point #{@start}"
38
+ ESearchy::LOG.puts "Searching #{self.class} from #{@start-50} to #{@start}"
39
39
  search_emails(response.body)
40
40
  sleep(4)
41
41
  search(query)
42
42
  else
43
- ESearchy::LOG.puts "Searching in URL: #{self.class} up to point #{@start}"
43
+ ESearchy::LOG.puts "Searching #{self.class} from #{@start-50} to #{@start}"
44
44
  search_emails(response.body)
45
45
  end
46
46
  else
@@ -1,6 +1,6 @@
1
1
  %w{rubygems cgi net/http}.each { |lib| require lib }
2
- local_path = "#{File.dirname(__FILE__)}/"
3
- %w{searchy keys useragent}.each {|lib| require local_path + lib}
2
+ local_path = "#{File.dirname(__FILE__)}/../"
3
+ %w{searchy useragent}.each {|lib| require local_path + lib}
4
4
 
5
5
  class Google
6
6
  include Searchy
@@ -27,20 +27,20 @@ class Google
27
27
  request = Net::HTTP::Get.new( "/cse?&safe=off&num=100&site=" +
28
28
  "&q=" + CGI.escape(query) +
29
29
  "&btnG=Search&start=#{@start}",
30
- {'Cookie' => UserAgent::fetch})
30
+ {'User-Agent' => UserAgent::fetch})
31
31
  response = http.request(request)
32
32
  case response
33
33
  when Net::HTTPSuccess, Net::HTTPRedirection
34
34
  parse(response.body)
35
35
  @start = @start + 100
36
36
  if @totalhits > @start
37
- ESearchy::LOG.puts "Searching in URL: #{self.class} up to point #{@start}"
38
- search_emails(response.body)
37
+ ESearchy::LOG.puts "Searching #{self.class} from #{@start-100} to #{@start}"
38
+ search_emails(response.body.gsub(/<em>|<\/em>/,""))
39
39
  sleep(4)
40
40
  search(query)
41
41
  else
42
- ESearchy::LOG.puts "Searching in URL: #{self.class} up to point #{@start}"
43
- search_emails(response.body)
42
+ ESearchy::LOG.puts "Searching #{self.class} from #{@start-100} to #{@start}"
43
+ search_emails(response.body.gsub(/<em>|<\/em>/,""))
44
44
  end
45
45
  else
46
46
  return response.error!
@@ -1,6 +1,6 @@
1
1
  %w{rubygems json cgi net/http}.each { |lib| require lib }
2
- local_path = "#{File.dirname(__FILE__)}/"
3
- %w{searchy keys useragent}.each {|lib| require local_path + lib}
2
+ local_path = "#{File.dirname(__FILE__)}/../"
3
+ %w{searchy useragent}.each {|lib| require local_path + lib}
4
4
 
5
5
  class Yahoo
6
6
  include Searchy
@@ -28,19 +28,19 @@ class Yahoo
28
28
  request = Net::HTTP::Get.new("/ysearch/web/v1/" + CGI.escape(query) +
29
29
  "?appid="+ @appid +
30
30
  "&format=json&count=50"+
31
- "&start=#{@start}", {'Cookie' => UserAgent::fetch} )
31
+ "&start=#{@start}", {'User-Agent' => UserAgent::fetch} )
32
32
  response = http.request(request)
33
33
  case response
34
34
  when Net::HTTPSuccess, Net::HTTPRedirection
35
35
  parse(response.body)
36
36
  @start = @start + 50
37
37
  if @totalhits > @start
38
- ESearchy::LOG.puts "Searching in URL: #{self.class} up to point #{@start}"
38
+ ESearchy::LOG.puts "Searching #{self.class} from #{@start-50} to #{@start}"
39
39
  search_emails(response.body)
40
40
  sleep(4)
41
41
  search(@query)
42
42
  else
43
- ESearchy::LOG.puts "Searching in URL: #{self.class} up to point #{@start}"
43
+ ESearchy::LOG.puts "Searching #{self.class} from #{@start-50} to #{@start}"
44
44
  search_emails(response.body)
45
45
  end
46
46
  else
@@ -50,6 +50,7 @@ class Yahoo
50
50
  rescue Net::HTTPFatalError
51
51
  ESearchy::LOG.puts "Error: Something went wrong with the HTTP request"
52
52
  end
53
+ @start = 0
53
54
  end
54
55
 
55
56
  def parse(json)
@@ -0,0 +1,96 @@
1
+ #
2
+ # Big Thanks go to DigiNinja at digininja.org for telling me about this Google Hack.
3
+ #
4
+ %w{rubygems cgi net/http}.each { |lib| require lib }
5
+ local_path = "#{File.dirname(__FILE__)}/../"
6
+ %w{searchy useragent}.each {|lib| require local_path + lib}
7
+
8
+ class GoogleProfiles
9
+ include Searchy
10
+
11
+ def initialize(maxhits = 0, start = 0)
12
+ @start = start
13
+ @totalhits = maxhits
14
+ @emails = []
15
+ @people = []
16
+ @company_name = nil
17
+ @r_urls = Queue.new
18
+ @r_docs = Queue.new
19
+ @r_pdfs = Queue.new
20
+ @r_txts = Queue.new
21
+ @r_officexs = Queue.new
22
+ @lock = Mutex.new
23
+ @threads = []
24
+ end
25
+ attr_accessor :emails, :company_name, :people
26
+
27
+ def search(query)
28
+ @query = query
29
+ http = Net::HTTP.new("www.google.com",80)
30
+ begin
31
+ http.start do |http|
32
+ request = Net::HTTP::Get.new( "/cse?q=site:www.google.com+intitle:%22Google+" +
33
+ "Profile%22+%22Companies+I%27ve+worked+for%22+%22at+" +
34
+ CGI.escape(@company_name) +
35
+ "%22&hl=en&cof=&num=100&filter=0" +
36
+ "&safe=off&start=#{@start}",
37
+ {'User-Agent' => UserAgent::fetch})
38
+ response = http.request(request)
39
+ case response
40
+ when Net::HTTPSuccess, Net::HTTPRedirection
41
+ parse(response.body)
42
+ @start = @start + 100
43
+ if @totalhits > @start
44
+ ESearchy::LOG.puts "Searching #{self.class} from #{@start-100} to #{@start}"
45
46
+ search_emails(response.body.gsub(/<em>|<\/em>/,""))
47
+ sleep(4)
48
+ search(query)
49
+ else
50
+ ESearchy::LOG.puts "Searching #{self.class} from #{@start-100} to #{@start}"
51
52
+ search_emails(response.body.gsub(/<em>|<\/em>/,""))
53
+ end
54
+ else
55
+ return response.error!
56
+ end
57
+ end
58
+ rescue Net::HTTPFatalError
59
+ ESearchy::LOG.puts "Error: Something went wrong with the HTTP request"
60
+ end
61
+ @start = 0
62
+ end
63
+
64
+ def search_person(name,last)
65
+ email = []
66
+ # Search Yahoo
67
+ y = Yahoo.new(50)
68
+ y.search("first:\"#{name}\" last:\"#{last}\"")
69
+ emails.concat(y.emails).uniq!
70
+ # Search Google
71
+ #g = Google.new(50)
72
+ #g.search("#{name} #{last}")
73
+ #emails.concat(g.emails).uniq!
74
+ return emails
75
+ end
76
+
77
+ def enquire_person(profile)
78
+ # TO DO: parse profile to obtain more information
79
+ end
80
+
81
+ def parse(html)
82
+ @totalhits= html.scan(/<\/b> of[ about | ]\
83
+ <b>(.*)<\/b> from/)[0][0].gsub(",","").to_i if @totalhits == 0
84
+ html.scan(/<h2 class=r><a href="([0-9A-Z\
85
+ a-z:\\\/?&=@+%.;"'()_-]+)" class=l>([\w\s]*) -/).each do |profile|
86
+ @domain = @query.match(/@/) ? @query : ("@" + @query)
87
+ name,last = profile[1].split(" ")
88
+ @people << [name,last]
89
+ @emails << "#{name.split(' ')[0]}.#{last.split(' ')[0]}#{@domain}"
90
+ @emails << "#{name[0,1]}#{last.split(' ')[0]}#{@domain}"
91
+ #@emails.concat(fix(search_person(name,last)))
92
+ @emails.uniq!
93
+ print_emails(@emails)
94
+ end
95
+ end
96
+ end
@@ -1,6 +1,6 @@
1
1
  %w{rubygems cgi net/http net/https}.each { |lib| require lib }
2
- local_path = "#{File.dirname(__FILE__)}/"
3
- %w{searchy yahoo google useragent}.each {|lib| require local_path + lib}
2
+ local_path = "#{File.dirname(__FILE__)}/../"
3
+ %w{searchy useragent}.each {|lib| require local_path + lib}
4
4
 
5
5
  class LinkedIn
6
6
  include Searchy
@@ -0,0 +1,96 @@
1
+ # http://www.google.com/cse?q=site:naymz.com%20%2B%20%22@%20Boeing%22
2
+ %w{rubygems cgi net/http}.each { |lib| require lib }
3
+ local_path = "#{File.dirname(__FILE__)}/../"
4
+ %w{searchy useragent}.each {|lib| require local_path + lib}
5
+
6
+ class Naymz
7
+ include Searchy
8
+
9
+ def initialize(maxhits = 0, start = 0)
10
+ @start = start
11
+ @totalhits = maxhits
12
+ @emails = []
13
+ @people = []
14
+ @company_name = nil
15
+ @r_urls = Queue.new
16
+ @r_docs = Queue.new
17
+ @r_pdfs = Queue.new
18
+ @r_txts = Queue.new
19
+ @r_officexs = Queue.new
20
+ @lock = Mutex.new
21
+ @threads = []
22
+ end
23
+ attr_accessor :emails, :company_name, :people
24
+
25
+ def search(query)
26
+ @query = query
27
+ http = Net::HTTP.new("www.google.com",80)
28
+ begin
29
+ http.start do |http|
30
+ request = Net::HTTP::Get.new( "/cse?q=site:naymz.com%20%2B%20%22@%20" +
31
+ CGI.escape(@company_name) +
32
+ "%22&hl=en&cof=&num=100&filter=0" +
33
+ "&safe=off&start=#{@start}",
34
+ {'User-Agent' => UserAgent::fetch})
35
+ response = http.request(request)
36
+ case response
37
+ when Net::HTTPSuccess, Net::HTTPRedirection
38
+ parse(response.body)
39
+ @start = @start + 100
40
+ if @totalhits > @start
41
+ ESearchy::LOG.puts "Searching #{self.class} from #{@start-100} to #{@start}"
42
43
+ search_emails(response.body.gsub(/<em>|<\/em>/,""))
44
+ sleep(4)
45
+ search(query)
46
+ else
47
+ ESearchy::LOG.puts "Searching #{self.class} from #{@start-100} to #{@start}"
48
49
+ search_emails(response.body.gsub(/<em>|<\/em>/,""))
50
+ end
51
+ else
52
+ return response.error!
53
+ end
54
+ end
55
+ rescue Net::HTTPFatalError
56
+ ESearchy::LOG.puts "Error: Something went wrong with the HTTP request"
57
+ end
58
+ @start = 0
59
+ end
60
+
61
+ def search_person(name,last)
62
+ email = []
63
+ # Search Yahoo
64
+ y = Yahoo.new(50)
65
+ y.search("first:\"#{name}\" last:\"#{last}\"")
66
+ emails.concat(y.emails).uniq!
67
+ # Search Google
68
+ #g = Google.new(50)
69
+ #g.search("#{name} #{last}")
70
+ #emails.concat(g.emails).uniq!
71
+ return emails
72
+ end
73
+
74
+ def enquire_person(profile)
75
+ # TO DO: parse profile to obtain more information
76
+ end
77
+
78
+ def parse(html)
79
+ @totalhits= html.scan(/<\/b> of[ about | ]\
80
+ <b>(.*)<\/b> from/)[0][0].gsub(",","").to_i if @totalhits == 0
81
+ html.scan(/<h2 class=r><a href="([0-9A-Z\
82
+ a-z:\\\/?&=@+%.;"'()_-]+)" class=l>([\w\s]*) -/).each do |profile|
83
+ @domain = @query.match(/@/) ? @query : ("@" + @query)
84
+ person = profile[1].split(" ").delete_if do
85
+ |x| x =~ /\A(mr|ms|dr|phd)\.?\z/i
86
+ end
87
+ name,last = person.size > 2 ? [person[0],person[-1]] : person
88
+ @people << person
89
+ @emails << "#{name.split(' ')[0]}.#{last.split(' ')[0]}#{@domain}"
90
+ @emails << "#{name[0,1]}#{last.split(' ')[0]}#{@domain}"
91
+ #@emails.concat(fix(search_person(name,last)))
92
+ @emails.uniq!
93
+ print_emails(@emails)
94
+ end
95
+ end
96
+ end
@@ -10,32 +10,15 @@ if RUBY_PLATFORM =~ /mingw|mswin/
10
10
  end
11
11
 
12
12
  module Searchy
13
- case RUBY_PLATFORM
14
- when /mingw|mswin/
15
- TEMP = "C:\\WINDOWS\\Temp\\"
16
- else
17
- TEMP = "/tmp/"
18
- end
19
13
 
20
14
  def search_emails(string)
21
- string = string.gsub("<em>","") if self.class == Google
22
- # OLD regex list = string.scan(/[a-z0-9!#$&'*+=?^_`{|}~-]+(?:\.[a-z0-9!#$&'*+=?\^_`{|}~-]+)*@\
23
- # (?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?/)
24
- # list = string.scan(/[a-z0-9!#$&'*+=?^_`{|}~-]+(?:\.[a-z0-9!#$&'*+=?\^_`{|}~-]+)*_at_\
25
- #(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\
26
- #[a-z0-9!#$&'*+=?^_`{|}~-]+(?:\.[a-z0-9!#$&'*+=?\^_`{|}~-]+)*\sat\s(?:[a-z0-9](?:[a-z0-9-]\
27
- #*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|[a-z0-9!#$&'*+=?^_`{|}~-]+\
28
- #(?:\.[a-z0-9!#$&'*+=?\^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\
29
- #[a-z0-9!#$&'*+=?^_`{|}~-]+(?:\.[a-z0-9!#$&'*+=?\^_`{|}~-]+)*\s@\s(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+\
30
- #[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|[a-z0-9!#$&'*+=?^_`{|}~-]+(?:\sdot\s[a-z0-9!#$&'*+=?\^_`\
31
- #{|}~-]+)*\sat\s(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\sdot\s)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?/)
32
- list = string.scan(/[a-z0-9!#$&'*+=?^_`{|}~-]+(?:\.[a-z0-9!#$&'*+=?\^_`{|}~-]+)*_at_\
15
+ list = string.scan(/[a-z0-9!#$&'*+=?^_`{|}~-]+(?:\.[a-z0-9!#$&'*+=?^_`{|}~-]+)*_at_\
33
16
  (?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z](?:[a-z-]*[a-z])?|\
34
- [a-z0-9!#$&'*+=?^_`{|}~-]+(?:\.[a-z0-9!#$&'*+=?\^_`{|}~-]+)*\sat\s(?:[a-z0-9](?:[a-z0-9-]\
17
+ [a-z0-9!#$&'*+=?^_`{|}~-]+(?:\.[a-z0-9!#$&'*+=?^_`{|}~-]+)*\sat\s(?:[a-z0-9](?:[a-z0-9-]\
35
18
  *[a-z0-9])?\.)+[a-z](?:[a-z-]*[a-z])?|[a-z0-9!#$&'*+=?^_`{|}~-]+\
36
- (?:\.[a-z0-9!#$&'*+=?\^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z](?:[a-z-]*[a-z])?|\
37
- [a-z0-9!#$&'*+=?^_`{|}~-]+(?:\.[a-z0-9!#$&'*+=?\^_`{|}~-]+)*\s@\s(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+\
38
- [a-z](?:[a-z-]*[a-z])?|[a-z0-9!#$&'*+=?^_`{|}~-]+(?:\sdot\s[a-z0-9!#$&'*+=?\^_`\
19
+ (?:\.[a-z0-9!#$&'*+=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z](?:[a-z-]*[a-z])?|\
20
+ [a-z0-9!#$&'*+=?^_`{|}~-]+(?:\.[a-z0-9!#$&'*+=?^_`{|}~-]+)*\s@\s(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+\
21
+ [a-z](?:[a-z-]*[a-z])?|[a-z0-9!#$&'*+=?^_`{|}~-]+(?:\sdot\s[a-z0-9!#$&'*+=?^_`\
39
22
  {|}~-]+)*\sat\s(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\sdot\s)+[a-z](?:[a-z-]*[a-z])??/)
40
23
  @lock.synchronize do
41
24
  print_emails(list)
@@ -55,7 +38,7 @@ module Searchy
55
38
  response = http.request(request)
56
39
  case response
57
40
  when Net::HTTPSuccess, Net::HTTPRedirection
58
- name = Searchy::TEMP + "#{hash_url(web.to_s)}.pdf"
41
+ name = ESearchy::TEMP + "#{hash_url(web.to_s)}.pdf"
59
42
  open(name, "wb") do |file|
60
43
  file.write(response.body)
61
44
  end
@@ -3,11 +3,6 @@ class UserAgent
3
3
  "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; Acoo Browser; .NET CLR 1.1.4322; .NET CLR 2.0.50727)",
4
4
  "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; Acoo Browser; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.0.04506)",
5
5
  "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Acoo Browser; InfoPath.2; .NET CLR 2.0.50727; Alexa Toolbar)",
6
- "amaya/9.52 libwww/5.4.0",
7
- "amaya/11.1 libwww/5.4.0",
8
- "Amiga-AWeb/3.5.07 beta",
9
- "AmigaVoyager/3.4.4 (MorphOS/PPC native)",
10
- "AmigaVoyager/2.95 (compatible; MC680x0; AmigaOS)",
11
6
  "Mozilla/4.0 (compatible; MSIE 7.0; AOL 7.0; Windows NT 5.1; FunWebProducts)",
12
7
  "Mozilla/4.0 (compatible; MSIE 6.0; AOL 8.0; Windows NT 5.1; SV1)",
13
8
  "Mozilla/4.0 (compatible; MSIE 7.0; AOL 9.0; Windows NT 5.1; .NET CLR 1.1.4322; Zango 10.1.181.0)",
@@ -36,11 +31,6 @@ class UserAgent
36
31
  "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; Crazy Browser 1.0.5; .NET CLR 1.1.4322; InfoPath.1)",
37
32
  "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; Deepnet Explorer 1.5.0; .NET CLR 1.0.3705)",
38
33
  "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_6; en-us) AppleWebKit/525.27.1 (KHTML, like Gecko) Demeter/1.0.9 Safari/125",
39
- "Dillo/0.8.5",
40
- "Dillo/2.0",
41
- "Doris/1.15 [en] (Symbian)",
42
- "ELinks/0.13.GIT (textmode; Linux 2.6.22-2-686 i686; 148x68-3)",
43
- "ELinks/0.9.3 (textmode; Linux 2.6.11 i686; 79x24)",
44
34
  "Mozilla/5.0 (X11; U; Linux i686; en; rv:1.8.1.12) Gecko/20080208 (Debian-1.8.1.12-2) Epiphany/2.20",
45
35
  "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.3) Gecko/20041007 Epiphany/1.4.7",
46
36
  "Mozilla/5.0 (Windows; U; Win95; en-US; rv:1.5) Gecko/20031007 Firebird/0.7",
@@ -67,12 +57,10 @@ class UserAgent
67
57
  "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; GreenBrowser)",
68
58
  "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 2.0.50727; GreenBrowser)",
69
59
  "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322; InfoPath.1; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30; GreenBrowser)",
70
- "HotJava/1.1.2 FCS",
71
60
  "Mozilla/3.0 (x86 [cs] Windows NT 5.1; Sun)",
72
61
  "Mozilla/5.1 (X11; U; Linux i686; en-US; rv:1.8.0.3) Gecko/20060425 SUSE/1.5.0.3-7 Hv3/alpha",
73
62
  "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Trident/4.0; SIMBAR={CFBFDAEA-F21E-4D6E-A9B0-E100A69B860F}; Hydra Browser; .NET CLR 2.0.50727; .NET CLR 1.1.4322; .NET CLR 3.0.04506.30)",
74
63
  "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; Hydra Browser; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)",
75
- "IBrowse/2.3 (AmigaOS 3.9)",
76
64
  "Mozilla/5.0 (compatible; IBrowse 3.0; AmigaOS4.0)",
77
65
  "Mozilla/4.5 (compatible; iCab 2.9.1; Macintosh; U; PPC)",
78
66
  "iCab/3.0.2 (Macintosh; U; PPC Mac OS X)",
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: FreedomCoder-esearchy
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.0.19
4
+ version: "0.1"
5
5
  platform: ruby
6
6
  authors:
7
7
  - Matias P. Brutti
@@ -33,14 +33,14 @@ dependencies:
33
33
  version: 1.1.6
34
34
  version:
35
35
  - !ruby/object:Gem::Dependency
36
- name: rubyzip
36
+ name: FreedomCoder-rubyzip
37
37
  type: :runtime
38
38
  version_requirement:
39
39
  version_requirements: !ruby/object:Gem::Requirement
40
40
  requirements:
41
41
  - - ">="
42
42
  - !ruby/object:Gem::Version
43
- version: 0.9.1
43
+ version: 0.9.3
44
44
  version:
45
45
  description:
46
46
  email: matiasbrutti@gmail.com
@@ -51,28 +51,35 @@ extensions: []
51
51
  extra_rdoc_files:
52
52
  - README.rdoc
53
53
  files:
54
- - esearchy.rb
55
54
  - bin
56
55
  - bin/esearchy
57
56
  - data
58
57
  - data/bing.key
59
58
  - data/yahoo.key
60
59
  - lib
61
- - lib/esearchy.rb
60
+ - esearchy.rb
62
61
  - lib/esearchy
63
- - lib/esearchy/bing.rb
64
- - lib/esearchy/google.rb
65
- - lib/esearchy/googlegroups.rb
62
+ - lib/esearchy.rb
63
+ - lib/esearchy/bugmenot.rb
66
64
  - lib/esearchy/keys.rb
67
- - lib/esearchy/linkedin.rb
65
+ - lib/esearchy/logger.rb
66
+ - lib/esearchy/OtherEngines
67
+ - lib/esearchy/OtherEngines/pgp.rb
68
+ - lib/esearchy/OtherEngines/googlegroups.rb
69
+ - lib/esearchy/OtherEngines/usenet.rb
68
70
  - lib/esearchy/pdf2txt.rb
69
- - lib/esearchy/pgp.rb
71
+ - lib/esearchy/SearchEngines/
72
+ - lib/esearchy/SearchEngines/bing.rb
73
+ - lib/esearchy/SearchEngines/google.rb
74
+ - lib/esearchy/SearchEngines/yahoo.rb
75
+ - lib/esearchy/SearchEngines/altavista.rb
70
76
  - lib/esearchy/searchy.rb
71
- - lib/esearchy/yahoo.rb
77
+ - lib/esearchy/SocialNetworks/
78
+ - lib/esearchy/SocialNetworks/googleprofiles.rb
79
+ - lib/esearchy/SocialNetworks/linkedin.rb
80
+ - lib/esearchy/SocialNetworks/naymz.rb
72
81
  - lib/esearchy/wcol.rb
73
82
  - lib/esearchy/useragent.rb
74
- - lib/esearchy/logger.rb
75
- - lib/esearchy/bugmenot.rb
76
83
  - README.rdoc
77
84
  has_rdoc: true
78
85
  homepage: http://freedomcoder.com.ar/esearchy