linkedin-scraper 0.0.1 → 0.0.2

data/README.rdoc ADDED
@@ -0,0 +1,104 @@
1
+ = Linkedin-Scraper {<img src="http://travis-ci.org/jaimeiniesta/metainspector.png" />}[http://travis-ci.org/jaimeiniesta/metainspector]
2
+
3
+ Linkedin-scraper is a gem for scraping public LinkedIn profiles. You give it a URL, and it lets you easily get the profile's title, name, country, area, and current companies.
4
+
5
+ = Installation
6
+
7
+ Install the gem from RubyGems:
8
+
9
+ gem install linkedin-scraper
10
+
11
+ This gem is tested on Ruby versions 1.8.7, 1.9.2 and 1.9.3.
12
+
13
+ = Usage
14
+
15
+ Initialize a scraper instance for a URL, like this:
16
+
17
+ profile = Linkedin::Scraper.get_profile('http://in.linkedin.com/pub/yatish-mehta/22/460/a86')
18
+
19
+ Then you can see the scraped data like this:
20
+
21
+
22
+ profile.first_name #the First name of the contact
23
+
24
+ profile.last_name #the last name of the contact
25
+
26
+ profile.title #the linkedin job title
27
+
28
+ profile.location #the location of the contact
29
+
30
+ profile.country #the country of the contact
31
+
32
+ profile.industry #the industry the contact belongs to
33
+
34
+ profile.past_companies
35
+ #Array of hashes, one per past job, each holding the company and the job title
36
+ #Example
37
+ # [
38
+ # [0] {
39
+ # :past_title => "Intern",
40
+ # :past_company => "Sungard"
41
+ # },
42
+ # [1] {
43
+ # :past_title => "Software Developer",
44
+ # :past_company => "Microsoft"
45
+ # }
46
+ # ]
47
+ profile.current_companies
48
+ #Array of hashes, one per current job, each holding the company and the job title
49
+ #Example
50
+ # [
51
+ # [0] {
52
+ # :current_title => "Intern",
53
+ # :current_company => "Sungard"
54
+ # },
55
+ # [1] {
56
+ # :current_title => "Software Developer",
57
+ # :current_company => "Microsoft"
58
+ # }
59
+ # ]
60
+
61
+
62
+ profile.linkedin_url #url of the profile
63
+
64
+ profile.recommended_visitors
65
+
66
+ = Examples
67
+
68
+ When a link is given, it scrapes the profile and gets the data
69
+
70
+ $ irb
71
+ >> require 'metainspector'
72
+ => true
73
+
74
+ >> page = MetaInspector.new('http://pagerankalert.com')
75
+ => #<MetaInspector:0x11330c0 @url="http://pagerankalert.com">
76
+
77
+ >> page.title
78
+ => "PageRankAlert.com :: Track your PageRank changes"
79
+
80
+ >> page.meta_description
81
+ => "Track your PageRank(TM) changes and receive alerts by email"
82
+
83
+ >> page.meta_keywords
84
+ => "pagerank, seo, optimization, google"
85
+
86
+ >> page.links.size
87
+ => 8
88
+
89
+ >> page.links[5]
90
+ => "http://pagerankalert.posterous.com"
91
+
92
+ >> page.document.class
93
+ => String
94
+
95
+ >> page.parsed_document.class
96
+ => Nokogiri::HTML::Document
97
+
98
+ = ZOMG Fork! Thank you!
99
+
100
+ You're welcome to fork this project and send pull requests. I especially want to thank:
101
+
102
+ = To Do
103
+ *
104
+ Copyright (c) 2009-2012 Yatish Mehta, released under the MIT license
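The README describes past_companies and current_companies as arrays of hashes. A minimal sketch of consuming that shape follows. It uses the sample values from the README rather than a live profile, and `describe_positions` is a hypothetical helper that is not part of the gem.

```ruby
# Sample data copied from the README's example output; no network access needed.
past_companies = [
  { :past_title => "Intern",             :past_company => "Sungard" },
  { :past_title => "Software Developer", :past_company => "Microsoft" }
]

# Hypothetical helper (not part of the gem): build one readable line per job.
def describe_positions(positions)
  positions.map { |job| "#{job[:past_title]} at #{job[:past_company]}" }
end

puts describe_positions(past_companies)
```

The same approach would apply to current_companies by reading the :current_title and :current_company keys instead.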
@@ -4,7 +4,7 @@ require "mechanize"
4
4
  require "awesome_print"
5
5
 
6
6
  %w(client contact profile).each do |file|
7
- require File.join(File.dirname(__FILE__), 'linkedin', file)
7
+ require File.join(File.dirname(__FILE__), 'linkedin-scraper', file)
8
8
  end
9
9
 
10
10
 
@@ -0,0 +1,125 @@
1
+ # To change this template, choose Tools | Templates
2
+ # and open the template in the editor.
3
+
4
+
5
+ module Linkedin
6
+ class Client
7
+ USER_AGENTS = ["Windows IE 6", "Windows IE 7", "Windows Mozilla", "Mac Safari", "Mac FireFox", "Mac Mozilla", "Linux Mozilla", "Linux Firefox", "Linux Konqueror"]
8
+ attr_accessor :contacts ,:matched_tag,:probability
9
+
10
+ def initialize(first_name,last_name ,company,options={})
11
+ @first_name=first_name.downcase
12
+ @last_name=last_name.downcase
13
+ @company=company
14
+ @country=options[:country] || "us"
15
+ @search_linkedin_url="http://#{@country}.linkedin.com/pub/dir/#{@first_name}/#{@last_name}"
16
+ @contacts=[]
17
+ @links=[]
18
+ get_agent
19
+ end
20
+
21
+ def get_agent
22
+ @agent=Mechanize.new
23
+ @agent.user_agent_alias = USER_AGENTS.sample
24
+ @agent.max_history = 0
25
+ @agent
26
+ end
27
+
28
+ def get_contacts
29
+ begin
30
+ sleep(2+rand(4))
31
+ puts "===>Father:Scrapping linkedin url "+ @search_linkedin_url
32
+ @page=@agent.get @search_linkedin_url
33
+ @page.search(".vcard").each do |node|
34
+ @contacts<<Linkedin::Contact.new(node)
35
+ end
36
+ rescue Mechanize::ResponseCodeError=>e
37
+ puts "RESCUE"
38
+ end
39
+ return @contacts
40
+ end
41
+
42
+
43
+ #TODO: refactor this function; each case needs a separate function
44
+ def get_verified_contact
45
+ get_contacts
46
+ @contacts.each do |contact|
47
+ #check current company
48
+ contact.current_companies.each do |company|
49
+ if company[:current_company]
50
+ if company[:current_company].match(/#{@company}/i)
51
+ @matched_tag="CURRENT"
52
+ return contact
53
+ end
54
+ end
55
+ end if contact.current_companies
56
+
57
+ #title of profile
58
+ if contact.title.match(/#{@company}/i)
59
+ @matched_tag="CURRENT"
60
+ return contact
61
+ end
62
+
63
+ #check past companies
64
+ contact.past_companies.each do |company|
65
+ if company[:past_company]
66
+ if company[:past_company].match(/#{@company}/i)
67
+ @matched_tag="PAST"
68
+ return contact
69
+ end
70
+ end
71
+ end if contact.past_companies
72
+ #
73
+ #Going in to profile homepage and then checking
74
+ #
75
+ sleep(2+rand(4))
76
+ puts "===>Child:Scrapping linkedin url: "+ contact.linkedin_url
77
+ profile=contact.get_profile(get_agent.get(contact.linkedin_url),contact.linkedin_url)
78
+ #check current company
79
+ profile.current_companies.each do |company|
80
+ if company[:current_company]
81
+ if company[:current_company].match(/#{@company}/i)
82
+ @matched_tag="CURRENT"
83
+ return profile
84
+ end
85
+ end
86
+ end if profile.current_companies
87
+
88
+ #title of profile
89
+ if profile.title
90
+ if profile.title.match(/#{@company}/i)
91
+ @matched_tag="CURRENT"
92
+ return profile
93
+ end
94
+ end
95
+ #check past companies
96
+ profile.past_companies.each do |company|
97
+ if company[:past_company]
98
+ if company[:past_company].match(/#{@company}/i)
99
+ @matched_tag="PAST"
100
+ return profile
101
+ end
102
+ end
103
+ end if profile.past_companies
104
+ #check recommended visitors
105
+ if profile.recommended_visitors
106
+ cnt=0
107
+ profile.recommended_visitors.each do |visitor|
108
+ if visitor[:company]
109
+ if visitor[:company].match(/#{@company}/i)
110
+ cnt+=1
111
+ end
112
+ end
113
+ end
114
+ @probability=cnt/profile.recommended_visitors.length.to_f
115
+ @matched_tag="RECOMMENDED"
116
+ return profile if @probability>=0.5
117
+ end
118
+
119
+ end unless @contacts.empty?
120
+ return nil
121
+ end
122
+
123
+
124
+ end
125
+ end
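get_verified_contact repeats the same company comparison for the current companies, the title, the past companies, and the recommended visitors, and its TODO asks for separate functions. One possible shape for two of those functions is sketched below. Both helper names are hypothetical, and the use of Regexp.escape is an addition that the original code does not make.

```ruby
# Hypothetical helpers sketching the checks in get_verified_contact.

# Case-insensitive substring match. Regexp.escape keeps characters such as
# "+" or "(" in a company name from being read as regex syntax.
def company_match?(text, company)
  return false if text.nil? || company.nil?
  !!(text =~ /#{Regexp.escape(company)}/i)
end

# Fraction of recommended visitors whose :company matches.
# Returns 0.0 for a nil or empty list instead of dividing by zero.
def recommended_probability(visitors, company)
  return 0.0 if visitors.nil? || visitors.empty?
  hits = visitors.count { |v| company_match?(v[:company], company) }
  hits / visitors.length.to_f
end

visitors = [
  { :name => "A", :company => "Sungard" },
  { :name => "B", :company => "Microsoft" },
  { :name => "C", :company => "sungard" },
  { :name => "D", :company => nil }
]
puts recommended_probability(visitors, "Sungard") # 2 of the 4 visitors match
```

With these in place, the 0.5 threshold from the diff would become a single comparison against the returned fraction.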
@@ -0,0 +1,134 @@
1
+ # To change this template, choose Tools | Templates
2
+ # and open the template in the editor.
3
+ module Linkedin
4
+
5
+ class Contact
6
+ #the First name of the contact
7
+ attr_accessor :first_name
8
+ #the last name of the contact
9
+ attr_accessor :last_name
10
+ #the linkedin job title
11
+ attr_accessor :title
12
+ #the location of the contact
13
+ attr_accessor :location
14
+ #the country of the contact
15
+ attr_accessor :country
16
+ #the industry the contact belongs to
17
+ attr_accessor :industry
18
+ #the entire profile of the contact
19
+ attr_accessor :profile
20
+
21
+ #Array of hashes, one per past job, each holding the company and the job title
22
+ #Example
23
+ # [
24
+ # [0] {
25
+ # :past_title => "Intern",
26
+ # :past_company => "Sungard"
27
+ # },
28
+ # [1] {
29
+ # :past_title => "Software Developer",
30
+ # :past_company => "Microsoft"
31
+ # }
32
+ # ]
33
+
34
+ attr_accessor :past_companies
35
+ #Array of hashes, one per current job, each holding the company and the job title
36
+ #Example
37
+ # [
38
+ # [0] {
39
+ # :current_title => "Intern",
40
+ # :current_company => "Sungard"
41
+ # },
42
+ # [1] {
43
+ # :current_title => "Software Developer",
44
+ # :current_company => "Microsoft"
45
+ # }
46
+ # ]
47
+ attr_accessor :current_companies
48
+
49
+ attr_accessor :linkedin_url
50
+
51
+ attr_accessor :profile
52
+
53
+ def initialize(node=[])
54
+ unless node.class==Array
55
+ @first_name=get_first_name(node)
56
+ @last_name=get_last_name(node)
57
+ @title=get_title(node)
58
+ @location=get_location(node)
59
+ @country=get_country(node)
60
+ @industry=get_industry(node)
61
+ @current_companies=get_current_companies node
62
+ @past_companies=get_past_companies node
63
+ @linkedin_url=get_linkedin_url node
64
+ end
65
+ end
66
+ #page is the fetched profile page (a Mechanize::Page, which responds to Nokogiri-style #at and #search)
67
+ #returns object of Linkedin::Profile
68
+ def get_profile page,url
69
+ @profile=Linkedin::Profile.new(page,url)
70
+ end
71
+
72
+ private
73
+
74
+ def get_first_name node
75
+ return node.at(".given-name").text.strip if node.search(".given-name").first
76
+ end
77
+
78
+ def get_last_name node
79
+ return node.at(".family-name").text.strip if node.search(".family-name").first
80
+ end
81
+
82
+ def get_title node
83
+ return node.at(".title").text.gsub(/\s+/, " ").strip if node.search(".title").first
84
+ end
85
+
86
+ def get_location node
87
+ return node.at(".location").text.split(",").first.strip if node.search(".location").first
88
+
89
+ end
90
+
91
+ def get_country node
92
+ return node.at(".location").text.split(",").last.strip if node.search(".location").first
93
+
94
+ end
95
+
96
+ def get_industry node
97
+ return node.at(".industry").text.strip if node.search(".industry").first
98
+ end
99
+
100
+ def get_linkedin_url node
101
+ node.at("h2/strong/a").attributes["href"]
102
+ end
103
+
104
+ def get_current_companies node
105
+ current_cs=[]
106
+ if node.search(".current-content").first
107
+ node.at(".current-content").text.split(",").each do |content|
108
+ title,company=content.split(" at ")
109
+ company=company.gsub(/\s+/, " ").strip if company
110
+ title=title.gsub(/\s+/, " ").strip if title
111
+ current_company={:current_company=>company,:current_title=> title}
112
+ current_cs<<current_company
113
+ end
114
+ return current_cs
115
+ end
116
+ end
117
+
118
+ def get_past_companies node
119
+ past_cs=[]
120
+ if node.search(".past-content").first
121
+ node.at(".past-content").text.split(",").each do |content|
122
+ title,company=content.split(" at ")
123
+ company=company.gsub(/\s+/, " ").strip if company
124
+ title=title.gsub(/\s+/, " ").strip if title
125
+ past_company={:past_company=>company,:past_title=> title }
126
+ past_cs<<past_company
127
+ end
128
+ return past_cs
129
+ end
130
+ end
131
+
132
+ end
133
+
134
+ end
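get_current_companies and get_past_companies both split the node text on commas and then split each piece on " at ". The sketch below reproduces that parsing on plain strings so its behavior can be checked without Nokogiri. `parse_positions` is a hypothetical name and the input strings are made up for illustration.

```ruby
# Hypothetical stand-alone version of the parsing in get_current_companies:
# split the text on commas, then split each piece on " at ".
def parse_positions(text)
  text.split(",").map do |content|
    title, company = content.split(" at ")
    company = company.gsub(/\s+/, " ").strip if company
    title   = title.gsub(/\s+/, " ").strip if title
    { :current_company => company, :current_title => title }
  end
end

p parse_positions("Intern at Sungard, Software Developer at Microsoft")

# A comma inside a company name is treated as a separator between jobs,
# so this input yields two entries, the second with a nil company.
p parse_positions("Engineer at Acme, Inc.")
```

The second call shows a limitation of comma splitting: a company name that itself contains a comma ends up spread across two hashes.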
@@ -0,0 +1,148 @@
1
+ # To change this template, choose Tools | Templates
2
+ # and open the template in the editor.
3
+ module Linkedin
4
+ class Profile
5
+ USER_AGENTS = ["Windows IE 6", "Windows IE 7", "Windows Mozilla", "Mac Safari", "Mac FireFox", "Mac Mozilla", "Linux Mozilla", "Linux Firefox", "Linux Konqueror"]
6
+ #the First name of the contact
7
+ attr_accessor :first_name
8
+ #the last name of the contact
9
+ attr_accessor :last_name
10
+ #the linkedin job title
11
+ attr_accessor :title
12
+ #the location of the contact
13
+ attr_accessor :location
14
+ #the country of the contact
15
+ attr_accessor :country
16
+ #the industry the contact belongs to
17
+ attr_accessor :industry
18
+ #the entire profile of the contact
19
+ attr_accessor :profile
20
+
21
+ #Array of hashes, one per past job, each holding the company and the job title
22
+ #Example
23
+ # [
24
+ # [0] {
25
+ # :past_title => "Intern",
26
+ # :past_company => "Sungard"
27
+ # },
28
+ # [1] {
29
+ # :past_title => "Software Developer",
30
+ # :past_company => "Microsoft"
31
+ # }
32
+ # ]
33
+
34
+ attr_accessor :past_companies
35
+ #Array of hashes, one per current job, each holding the company and the job title
36
+ #Example
37
+ # [
38
+ # [0] {
39
+ # :current_title => "Intern",
40
+ # :current_company => "Sungard"
41
+ # },
42
+ # [1] {
43
+ # :current_title => "Software Developer",
44
+ # :current_company => "Microsoft"
45
+ # }
46
+ # ]
47
+ attr_accessor :current_companies
48
+ #url of the profile
49
+ attr_accessor :linkedin_url
50
+ #Array of hash containing its recommended visitors which come on the
51
+ attr_accessor :recommended_visitors
52
+
53
+ def initialize(page,url)
54
+ @first_name=get_first_name(page)
55
+ @last_name=get_last_name(page)
56
+ @title=get_title(page)
57
+ @location=get_location(page)
58
+ @country=get_country(page)
59
+ @industry=get_industry(page)
60
+ @current_companies=get_current_companies page
61
+ @past_companies=get_past_companies page
62
+ @recommended_visitors=get_recommended_visitors page
63
+ @linkedin_url=url
64
+ end
65
+ #returns nil if the request fails (for example, on a 404 response)
66
+ def get_profile url
67
+ begin
68
+ @agent=Mechanize.new
69
+ @agent.user_agent_alias = USER_AGENTS.sample
70
+ @agent.max_history = 0
71
+ page=@agent.get url
72
+ return Linkedin::Profile.new(page, url)
73
+ rescue=>e
74
+ puts e
75
+ end
76
+ end
77
+
78
+ private
79
+
80
+ def get_first_name page
81
+ return page.at(".given-name").text.strip if page.search(".given-name").first
82
+ end
83
+
84
+ def get_last_name page
85
+ return page.at(".family-name").text.strip if page.search(".family-name").first
86
+ end
87
+
88
+ def get_title page
89
+ return page.at(".headline-title").text.gsub(/\s+/, " ").strip if page.search(".headline-title").first
90
+ end
91
+
92
+ def get_location page
93
+ return page.at(".locality").text.split(",").first.strip if page.search(".locality").first
94
+ end
95
+
96
+ def get_country page
97
+ return page.at(".locality").text.split(",").last.strip if page.search(".locality").first
98
+ end
99
+
100
+ def get_industry page
101
+ return page.at(".industry").text.gsub(/\s+/, " ").strip if page.search(".industry").first
102
+ end
103
+
104
+ def get_past_companies page
105
+ past_cs=[]
106
+ if page.search(".past").first
107
+ page.search(".past").search("li").each do |past_company|
108
+ title,company=past_company.text.strip.split(" at ")
109
+ company=company.gsub(/\s+/, " ").strip if company
110
+ title=title.gsub(/\s+/, " ").strip if title
111
+ past_company={:past_company=>company,:past_title=> title}
112
+ past_cs<<past_company
113
+ end
114
+ return past_cs
115
+ end
116
+ end
117
+
118
+ def get_current_companies page
119
+ current_cs=[]
120
+ if page.search(".current").first
121
+ page.search(".current").search("li").each do |past_company|
122
+ title,company=past_company.text.strip.split(" at ")
123
+ company=company.gsub(/\s+/, " ").strip if company
124
+ title=title.gsub(/\s+/, " ").strip if title
125
+ current_company={:current_company=>company,:current_title=> title}
126
+ current_cs<<current_company
127
+ end
128
+ return current_cs
129
+ end
130
+ end
131
+
132
+ def get_recommended_visitors page
133
+ recommended_vs=[]
134
+ if page.search(".browsemap").first
135
+ page.at(".browsemap").at("ul").search("li").each do |visitor|
136
+ v={}
137
+ v[:link]=visitor.at('a').attributes["href"]
138
+ v[:name]=visitor.at('a').text
139
+ v[:title]=visitor.at('.headline').text.split(" at ").first
140
+ v[:company]=visitor.at('.headline').text.split(" at ").last
141
+ recommended_vs<<v
142
+ end
143
+ return recommended_vs
144
+ end
145
+
146
+ end
147
+ end
148
+ end
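In profile.rb, get_location and get_country read the same ".locality" text and take the first and last comma-separated parts. A small sketch of that rule on plain strings follows. `split_locality` is a hypothetical helper, the sample values are invented, and the guard for empty text is an addition.

```ruby
# Hypothetical helper mirroring get_location and get_country:
# both split the same ".locality" text on commas.
def split_locality(text)
  parts = text.split(",")
  # Guard added for this sketch: empty text has no parts to strip.
  return { :location => nil, :country => nil } if parts.empty?
  { :location => parts.first.strip, :country => parts.last.strip }
end

p split_locality("Pune Area, India")

# Without a comma, the first and last parts are the same string.
p split_locality("India")
```

When the locality text has no comma, location and country come out identical, which callers may want to account for.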
@@ -1,5 +1,5 @@
1
1
  module Linkedin
2
2
  module Scraper
3
- VERSION = "0.0.1"
3
+ VERSION = "0.0.2"
4
4
  end
5
5
  end
@@ -5,8 +5,8 @@ Gem::Specification.new do |gem|
5
5
  gem.authors = ["Yatish Mehta"]
6
6
  gem.email = ["yatishmehta27@gmail.com"]
7
7
  gem.description = %q{Scrapes the linkedin profile when a url is given }
8
- gem.summary = %q{Write a gem summary}
9
- gem.homepage = ""
8
+ gem.summary = %q{when a url of public linkedin profile page is given it scrapes the entire page and converts into a accessible object}
9
+ gem.homepage = "https://github.com/yatishmehta27/linkedin-scraper"
10
10
  gem.add_dependency(%q<httparty>, [">= 0"])
11
11
  gem.add_dependency(%q<mechanize>, [">= 0"])
12
12
  gem.add_dependency(%q<awesome_print>, [">= 0"])
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: linkedin-scraper
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.0.1
4
+ version: 0.0.2
5
5
  prerelease:
6
6
  platform: ruby
7
7
  authors:
@@ -14,7 +14,7 @@ default_executable:
14
14
  dependencies:
15
15
  - !ruby/object:Gem::Dependency
16
16
  name: httparty
17
- requirement: &10580260 !ruby/object:Gem::Requirement
17
+ requirement: &21059780 !ruby/object:Gem::Requirement
18
18
  none: false
19
19
  requirements:
20
20
  - - ! '>='
@@ -22,10 +22,10 @@ dependencies:
22
22
  version: '0'
23
23
  type: :runtime
24
24
  prerelease: false
25
- version_requirements: *10580260
25
+ version_requirements: *21059780
26
26
  - !ruby/object:Gem::Dependency
27
27
  name: mechanize
28
- requirement: &10577140 !ruby/object:Gem::Requirement
28
+ requirement: &21059160 !ruby/object:Gem::Requirement
29
29
  none: false
30
30
  requirements:
31
31
  - - ! '>='
@@ -33,10 +33,10 @@ dependencies:
33
33
  version: '0'
34
34
  type: :runtime
35
35
  prerelease: false
36
- version_requirements: *10577140
36
+ version_requirements: *21059160
37
37
  - !ruby/object:Gem::Dependency
38
38
  name: awesome_print
39
- requirement: &10576640 !ruby/object:Gem::Requirement
39
+ requirement: &21058580 !ruby/object:Gem::Requirement
40
40
  none: false
41
41
  requirements:
42
42
  - - ! '>='
@@ -44,7 +44,7 @@ dependencies:
44
44
  version: '0'
45
45
  type: :runtime
46
46
  prerelease: false
47
- version_requirements: *10576640
47
+ version_requirements: *21058580
48
48
  description: ! 'Scrapes the linkedin profile when a url is given '
49
49
  email:
50
50
  - yatishmehta27@gmail.com
@@ -55,13 +55,16 @@ files:
55
55
  - .gitignore
56
56
  - Gemfile
57
57
  - LICENSE
58
- - README.md
58
+ - README.rdoc
59
59
  - Rakefile
60
60
  - lib/linkedin-scraper.rb
61
+ - lib/linkedin-scraper/client.rb
62
+ - lib/linkedin-scraper/contact.rb
63
+ - lib/linkedin-scraper/profile.rb
61
64
  - lib/linkedin-scraper/version.rb
62
65
  - linkedin-scraper.gemspec
63
66
  has_rdoc: true
64
- homepage: ''
67
+ homepage: https://github.com/yatishmehta27/linkedin-scraper
65
68
  licenses: []
66
69
  post_install_message:
67
70
  rdoc_options: []
@@ -84,5 +87,6 @@ rubyforge_project:
84
87
  rubygems_version: 1.6.2
85
88
  signing_key:
86
89
  specification_version: 3
87
- summary: Write a gem summary
90
+ summary: when a url of public linkedin profile page is given it scrapes the entire
91
+ page and converts into a accessible object
88
92
  test_files: []
data/README.md DELETED
@@ -1,29 +0,0 @@
1
- # Linkedin::Scraper
2
-
3
- TODO: Write a gem description
4
-
5
- ## Installation
6
-
7
- Add this line to your application's Gemfile:
8
-
9
- gem 'linkedin-scraper'
10
-
11
- And then execute:
12
-
13
- $ bundle
14
-
15
- Or install it yourself as:
16
-
17
- $ gem install linkedin-scraper
18
-
19
- ## Usage
20
-
21
- TODO: Write usage instructions here
22
-
23
- ## Contributing
24
-
25
- 1. Fork it
26
- 2. Create your feature branch (`git checkout -b my-new-feature`)
27
- 3. Commit your changes (`git commit -am 'Added some feature'`)
28
- 4. Push to the branch (`git push origin my-new-feature`)
29
- 5. Create new Pull Request