RubyGems - linkedin-scraper - Versions diffs - 0.0.10 → 0.0.11 - Mend

linkedin-scraper 0.0.10 → 0.0.11

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (11)

checksums.yaml +4 -4
data/.gitignore +2 -0
data/.travis.yml +7 -0
data/README.md +37 -50
data/Rakefile +3 -2
data/bin/linkedin-scraper +5 -0
data/lib/linkedin-scraper/profile.rb +177 -248
data/lib/linkedin-scraper/version.rb +1 -1
data/linkedin-scraper.gemspec +6 -1
data/spec/linkedin-scraper/profile_spec.rb +90 -21
metadata +35 -4

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz: dbde57b3c40b5f330ed4ab346f42cad639de8d3e
-  data.tar.gz: 464882b2139ff63b164568c104ea47c76ff8b10f
+  metadata.gz: c7cfeee1f051d529594d6d827d6ad373b6aca496
+  data.tar.gz: 681cfad543c0d7daa2863e6c6c2525560cf640df
 SHA512:
-  metadata.gz: fd72bef448e5f91167de5902d91f99874e3153e3e3d0750a0708c6bf4b2fd26995ed3f69c406b5161b3391542d2b0fe71515b70c27bad5dd6edec9933213b92c
-  data.tar.gz: 09292e4bf18775fb423fd50666931c7bab4ca6033cca8147b308c0be7ac97c62351c0d52f60ef337efcbe088d255359e42100db01f47f670240bfff86eea971f
+  metadata.gz: 9c52c63f97a7855b088bb467bab0b72ac4a1616424348318b1072a0768dac512d4279fc04d1c1e8c213d3e57dc3eb368c43d757a8b937f979217d529f2a29510
+  data.tar.gz: 7d6e965dd00cb7ffc23f244eaa70d342e63d58b4a8ce760726f7907661dfbdae76c6a0bb740b3cdbdc20bd32b2f18230a40c4be25b1da1e289373e5da098397c
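`checksums.yaml` records SHA1 and SHA512 digests for the two archives packed inside the `.gem` file, `metadata.gz` and `data.tar.gz`. A minimal sketch of checking an extracted archive against one of those values, assuming the `.gem` (a plain tar file) has been unpacked with `tar -xf` and the expected hex string is copied from the `SHA1:` or `SHA512:` section above. The helper name `checksum_matches?` is hypothetical:

```ruby
require 'digest/sha1'
require 'digest/sha2'

# Digest classes for the two sections that appear in checksums.yaml.
CHECKSUM_DIGESTS = {
  'SHA1'   => Digest::SHA1,
  'SHA512' => Digest::SHA512
}.freeze

# Hypothetical helper: true when the file at `path` hashes to `expected_hex`
# under the named algorithm ('SHA1' or 'SHA512').
def checksum_matches?(path, expected_hex, algorithm = 'SHA512')
  digest_class = CHECKSUM_DIGESTS.fetch(algorithm) do
    raise ArgumentError, "unsupported algorithm: #{algorithm}"
  end
  # Digest::Class.file reads the file in chunks and returns a digest object.
  digest_class.file(path).hexdigest == expected_hex.strip.downcase
end
```

For the 0.0.11 release, `checksum_matches?('data.tar.gz', expected)` would be called with the full `data.tar.gz` value from the `SHA512:` section.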

data/.gitignore CHANGED

@@ -16,3 +16,5 @@ spec/reports
 test/tmp
 test/version_tmp
 tmp
+.ruby-version
+.ruby-gemset

data/.travis.yml ADDED

@@ -0,0 +1,7 @@
+language: ruby
+rvm:
+  - 2.0.0
+  - 1.9.3
+  - 1.9.2
+  - jruby-19mode
+  - rbx-19mode

data/README.md CHANGED

@@ -1,52 +1,64 @@
+[![Build Status](https://secure.travis-ci.org/yatishmehta27/linkedin-scraper.png)](http://travis-ci.org/yatishmehta27/linkedin-scraper)
 Linkedin Scraper
 ================
 Linkedin-scraper is a gem for scraping linkedin public profiles.
-You give it an URL, and it lets you easily get its title, name, country, area, current_companies and much more.
+Given the URL of the profile, it gets the name, country, title, area, current companies, past comapnies,organizations, skills, groups, etc
+##Installation
-Installation
-------------
 Install the gem from RubyGems:
     gem install linkedin-scraper
-This gem is tested on Ruby versions 1.8.7, 1.9.2 1.9.3 and 2.0.0
+This gem is tested on 1.9.2, 1.9.3, 2.0.0, JRuby1.9, rbx1.9,
-Usage
------
+##Usage
-Initialize a scraper instance for an URL, like this:
+Initialize a scraper instance
     profile = Linkedin::Profile.get_profile("http://www.linkedin.com/in/jeffweiner08")
-Then you can see the scraped data like this:
+The returning object responds to the following methods
+    profile.first_name          # The first name of the contact
-    profile.first_name          #the First name of the contact
+    profile.last_name           # The last name of the contact
-    profile.last_name           #the last name of the contact
+    profile.name                # The full name of the profile
-    profile.name                #The Full name of the profile
+    profile.title               # The job title
-    profile.title               #the linkedin job title
+	profile.summary             # The summary of the profile
-    profile.location            #the location of the contact
+    profile.location            # The location of the contact
-    profile.country             #the country of the contact
+    profile.country             # The country of the contact
-    profile.industry            #the domain for which the contact belongs
+    profile.industry            # The domain for which the contact belongs
-    profile.picture             #the profile pic url of contact
+    profile.picture             # The profile picture link of profile
-    profile.skills              #the skills of the profile
+    profile.skills              # Array of skills of the profile
-    profile.organizations       #the organizations of the profile
+    profile.organizations       # Array organizations of the profile
-    profile.education           #Array of hashes for eduction
+    profile.education           # Array of hashes for education
-    profile.picture             #url of the profile picture
+    profile.websites            # Array of websites
+	profile.groups              # Array of groups
+	profile.languages           # Array of languages
+	profile.certifications      # Array of certifications
+For current and past comapnies it also provides the details of the companies like comapny size, industry, address, etc
     profile.current_companies
@@ -116,8 +128,6 @@ Then you can see the scraped data like this:
     profile.past_companies
-    #Array of hash containing its past job companies and job profile
-    #Example
     [
     [0] {
                 :past_company => "Accel Partners",
@@ -181,30 +191,8 @@ Then you can see the scraped data like this:
     ]
-    profile.linkedin_url        #url of the profile
-    profile.websites
-    #Array of websites
-    [
-      [0]   "http://www.linkedin.com/"
-    ]
-    profile.groups
-    #Array of hashes containing group name and link
-    profile.education
-    #Array of hashes for eduction
-    profile.skills
-    #Array of skills
-    profile.picture
-    #url of the profile picture
     profile.recommended_visitors
-    #Its the list of visitors "Viewers of this profile also viewed..."
+    #It is the list of visitors "Viewers of this profile also viewed..."
     [
     [0] {
            :link => "http://www.linkedin.com/in/barackobama?trk=pub-pbmap",
@@ -262,10 +250,9 @@ Then you can see the scraped data like this:
     }
     ]
-## Credits
-- [Justin Grevich](https://github.com/jgrevich)
-- [Vpoola](https://github.com/vpoola88)
-- [Mark Walker](https://github.com/jfdimark)
+The gem also comes with a binary and can be used from teh command line to get a json response of the scraped data. It takes the url as the first argument.
+    linkedin-scraper http://www.linkedin.com/in/jeffweiner08
-You're welcome to fork this project and send pull requests. I want to thank specially:
+You're welcome to fork this project and send pull requests
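The README says the command-line binary prints the scraped data as JSON. A minimal sketch of consuming that output from Ruby, assuming the key names produced by `Profile#to_json` in this version: the `ATTRIBUTES` symbols become string keys once encoded as JSON, and each entry built by `get_companies` stores the company name under `:company` (the README's example output above still shows the older `:past_company` key). The helper name `company_names` is hypothetical:

```ruby
require 'json'

# Hypothetical helper: pulls company names out of the JSON printed by
# `linkedin-scraper URL`. `kind` is "current_companies" or "past_companies".
def company_names(json_text, kind = 'current_companies')
  profile = JSON.parse(json_text)
  # Array() turns a missing or null entry into an empty list;
  # compact drops entries that had no "company" key.
  Array(profile[kind]).map { |company| company['company'] }.compact
end
```

It would be fed the captured standard output of, for example, `linkedin-scraper http://www.linkedin.com/in/jeffweiner08`.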

data/Rakefile CHANGED

@@ -1,2 +1,3 @@
-#!/usr/bin/env rake
-require "bundler/gem_tasks"
+require 'rspec/core/rake_task'
+task :default => :spec
+RSpec::Core::RakeTask.new

data/bin/linkedin-scraper ADDED

@@ -0,0 +1,5 @@
+#!/usr/bin/env ruby
+require './lib/linkedin-scraper'
+profile = Linkedin::Profile.new(ARGV[0])
+puts profile.to_json
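The new executable loads the library with `require './lib/linkedin-scraper'`, a path resolved against the current working directory, so it only finds the library when run from the gem's source checkout; it also passes `ARGV[0]` through unchecked. A minimal sketch of an argument-checking entry point, where `run_cli` is a hypothetical name and the profile class is passed in so the logic can be exercised without network access:

```ruby
# Hypothetical entry point for bin/linkedin-scraper.
# `profile_class` would be Linkedin::Profile in the real executable; it only
# needs to respond to `new(url)` and return an object with #to_json.
def run_cli(argv, profile_class, out = $stdout, err = $stderr)
  url = argv.first.to_s.strip
  if url.empty?
    err.puts 'usage: linkedin-scraper PROFILE_URL'
    return 1 # non-zero exit status signals a usage error
  end
  out.puts profile_class.new(url).to_json
  0
end
```

In the executable this would follow `require 'linkedin-scraper'`, which resolves through the installed gem's `lib` directory (the gemspec's `require_paths`), as `exit run_cli(ARGV, Linkedin::Profile)`.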

data/lib/linkedin-scraper/profile.rb CHANGED

@@ -2,311 +2,240 @@
 module Linkedin
   class Profile
-    USER_AGENTS = ["Windows IE 6", "Windows IE 7", "Windows Mozilla", "Mac Safari", "Mac FireFox", "Mac Mozilla", "Linux Mozilla", "Linux Firefox", "Linux Konqueror"]
-    attr_accessor :country, :current_companies, :education, :first_name, :groups, :industry, :last_name, :linkedin_url, :location, :page, :past_companies, :picture, :recommended_visitors, :skills, :title, :websites, :organizations, :summary, :certifications, :languages
-    def initialize(page,url)
-      @first_name           = get_first_name(page)
-      @last_name            = get_last_name(page)
-      @title                = get_title(page)
-      @location             = get_location(page)
-      @country              = get_country(page)
-      @industry             = get_industry(page)
-      @picture              = get_picture(page)
-      @summary              = get_summary(page)
-      @current_companies    = get_current_companies(page)
-      @past_companies       = get_past_companies(page)
-      @recommended_visitors = get_recommended_visitors(page)
-      @education            = get_education(page)
-      @linkedin_url         = url
-      @websites             = get_websites(page)
-      @groups               = get_groups(page)
-      @organizations        = get_organizations(page)
-      @certifications       = get_certifications(page)
-      @organizations        = get_organizations(page)
-      @skills               = get_skills(page)
-      @languages            = get_languages(page)
-      @page                 = page
-    end
-    #returns:nil if it gives a 404 request
-    def name
-      name = ''
-      name += "#{self.first_name} " if self.first_name
-      name += self.last_name if self.last_name
-      name
-    end
+    USER_AGENTS = ['Windows IE 6', 'Windows IE 7', 'Windows Mozilla', 'Mac Safari', 'Mac FireFox', 'Mac Mozilla', 'Linux Mozilla', 'Linux Firefox', 'Linux Konqueror']
+    ATTRIBUTES = %w(name first_name last_name title location country industry summary picture linkedin_url education groups websites languages skills certifications organizations past_companies current_companies recommended_visitors)
+    attr_reader :page, :linkedin_url
     def self.get_profile(url)
       begin
-        @agent = Mechanize.new
-        @agent.user_agent_alias = USER_AGENTS.sample
-        @agent.max_history = 0
-        page = @agent.get(url)
-        return Linkedin::Profile.new(page, url)
+        Linkedin::Profile.new(url)
       rescue => e
         puts e
       end
     end
-    def get_skills(page)
-      page.search('.competency.show-bean').map{|skill|skill.text.strip if skill.text} rescue nil
+    def initialize(url)
+      @linkedin_url = url
+      @page         = http_client.get(url)
     end
-    def get_company_url(node)
-      result={}
-      if node.at("h4/strong/a")
-        link = node.at("h4/strong/a")["href"]
-        @agent = Mechanize.new
-        @agent.user_agent_alias = USER_AGENTS.sample
-        @agent.max_history = 0
-        page = @agent.get("http://www.linkedin.com"+link)
-        result[:linkedin_company_url] = "http://www.linkedin.com"+link
-        result[:url] = page.at(".basic-info/div/dl/dd/a").text if page.at(".basic-info/div/dl/dd/a")
-        node_2 = page.at(".basic-info").at(".content.inner-mod")
-        node_2.search("dd").zip(node_2.search("dt")).each do |value,title|
-          result[title.text.gsub(" ","_").downcase.to_sym] = value.text.strip
-        end
-        result[:address] = page.at(".vcard.hq").at(".adr").text.gsub("\n"," ").strip if page.at(".vcard.hq")
-      end
-      result
+    def name
+      "#{first_name} #{last_name}"
     end
-    private
-    def get_first_name page
-      return page.at(".given-name").text.strip if page.search(".given-name").first
+    def first_name
+      @first_name ||= (@page.at('.given-name').text.strip if @page.at('.given-name'))
     end
-    def get_last_name page
-      return page.at(".family-name").text.strip if page.search(".family-name").first
+    def last_name
+      @last_name ||= (@page.at('.family-name').text.strip if @page.at('.family-name'))
     end
-    def get_title page
-      return page.at(".headline-title").text.gsub(/\s+/, " ").strip if page.search(".headline-title").first
+    def title
+      @title ||= (@page.at('.headline-title').text.gsub(/\s+/, ' ').strip if @page.at('.headline-title'))
     end
-    def get_location page
-      return page.at(".locality").text.split(",").first.strip if page.search(".locality").first
+    def location
+      @location ||= (@page.at('.locality').text.split(',').first.strip if @page.at('.locality'))
     end
-    def get_country page
-      return page.at(".locality").text.split(",").last.strip if page.search(".locality").first
+    def country
+      @country ||= (@page.at('.locality').text.split(',').last.strip if @page.at('.locality'))
     end
-    def get_industry page
-      return page.at(".industry").text.gsub(/\s+/, " ").strip if page.search(".industry").first
+    def industry
+      @industry ||= (@page.at('.industry').text.gsub(/\s+/, ' ').strip if @page.at('.industry'))
     end
-    def get_summary(page)
-      page.at(".description.summary").text.gsub(/\s+/, " ").strip if page.search(".description.summary").first
+    def summary
+      @summary ||= (@page.at('.description.summary').text.gsub(/\s+/, ' ').strip if @page.at('.description.summary'))
     end
+    def picture
+      @picture ||= (@page.at('#profile-picture/img.photo').attributes['src'].value.strip if @page.at('#profile-picture/img.photo'))
+    end
-    def get_picture page
-      return page.at("#profile-picture/img.photo").attributes['src'].value.strip if page.search("#profile-picture/img.photo").first
+    def skills
+      @skills ||= (@page.search('.competency.show-bean').map{|skill| skill.text.strip if skill.text} rescue nil)
     end
-    def get_past_companies page
-      past_cs=[]
-      if page.search(".position.experience.vevent.vcard.summary-past").first
-        page.search(".position.experience.vevent.vcard.summary-past").each do |past_company|
-          result = get_company_url past_company
-          url = result[:url]
-          title = past_company.at("h3").text.gsub(/\s+|\n/, " ").strip if past_company.at("h3")
-          company = past_company.at("h4").text.gsub(/\s+|\n/, " ").strip if past_company.at("h4")
-          description = past_company.at(".description.past-position").text.gsub(/\s+|\n/, " ").strip if past_company.at(".description.past-position")
-          p_company = {:past_company=>company,:past_title=> title,:past_company_website=>url,:description=>description}
-          p_company = p_company.merge(result)
-          past_cs << p_company
-        end
-        return past_cs
-      end
+    def past_companies
+      @past_companies ||= get_companies('past')
     end
-    def get_current_companies page
-      current_cs = []
-      if page.search(".position.experience.vevent.vcard.summary-current").first
-        page.search(".position.experience.vevent.vcard.summary-current").each do |current_company|
-          result = get_company_url current_company
-          url = result[:url]
-          title = current_company.at("h3").text.gsub(/\s+|\n/, " ").strip if current_company.at("h3")
-          company = current_company.at("h4").text.gsub(/\s+|\n/, " ").strip if current_company.at("h4")
-          description = current_company.at(".description.current-position").text.gsub(/\s+|\n/, " ").strip if current_company.at(".description.current-position")
-          current_company = {:current_company=>company,:current_title=> title,:current_company_url=>url,:description=>description}
-          current_cs << current_company.merge(result)
-        end
-        return current_cs
-      end
+    def current_companies
+      @current_companies ||= get_companies('current')
     end
-    def get_education(page)
-      education=[]
-      if page.search(".position.education.vevent.vcard").first
-        page.search(".position.education.vevent.vcard").each do |item|
-          name   = item.at("h3").text.gsub(/\s+|\n/, " ").strip if item.at("h3")
-          desc   = item.at("h4").text.gsub(/\s+|\n/, " ").strip if item.at("h4")
-          period = item.at(".period").text.gsub(/\s+|\n/, " ").strip if item.at(".period")
-          edu = {:name => name,:description => desc,:period => period}
-          education << edu
+    def education
+      unless @education
+        @education = []
+        if @page.search('.position.education.vevent.vcard').first
+          @education = @page.search('.position.education.vevent.vcard').map do |item|
+            name   = item.at('h3').text.gsub(/\s+|\n/, ' ').strip      if item.at('h3')
+            desc   = item.at('h4').text.gsub(/\s+|\n/, ' ').strip      if item.at('h4')
+            period = item.at('.period').text.gsub(/\s+|\n/, ' ').strip if item.at('.period')
+            {:name => name, :description => desc, :period => period}
+          end
         end
-        return education
       end
-    end
-    def get_websites(page)
-      websites=[]
-      if page.search(".website").first
-        page.search(".website").each do |site|
-          url = site.at("a")["href"]
-          url = "http://www.linkedin.com"+url
-          url = CGI.parse(URI.parse(url).query)["url"]
-          websites << url
+       @education
+    end
+    def websites
+      unless @websites
+        @websites = []
+        if @page.search('.website').first
+          @websites = @page.search('.website').map do |site|
+            url = site.at('a')['href']
+            url = "http://www.linkedin.com#{url}"
+            CGI.parse(URI.parse(url).query)['url']
+          end.flatten!
         end
-        return websites.flatten!
       end
+      @websites
     end
-    def get_groups(page)
-      groups = []
-      if page.search(".group-data").first
-        page.search(".group-data").each do |item|
-          name = item.text.gsub(/\s+|\n/, " ").strip
-          link = "http://www.linkedin.com"+item.at("a")["href"]
-          groups << {:name=>name,:link=>link}
+    def groups
+      unless @groups
+        @groups = []
+        if page.search('.group-data').first
+          @groups = page.search('.group-data').each do |item|
+            name = item.text.gsub(/\s+|\n/, ' ').strip
+            link = "http://www.linkedin.com#{item.at('a')['href']}"
+            {:name => name, :link => link}
+          end
         end
-        return groups
       end
-    end
-    def get_organizations(page)
-      organizations = []
-      # if the profile contains org data
-      if page.search('ul.organizations li.organization').first
-        # loop over each element with org data
-        page.search('ul.organizations li.organization').each do |item|
-          begin
-            # find the h3 element within the above section and get the text with excess white space stripped
-            name = item.search('h3').text.gsub(/\s+|\n/, " ").strip
-            position = nil # add this later
-            occupation = nil # add this latetr too, this relates to the experience/work
-            start_date = Date.parse(item.search('ul.specifics li').text.gsub(/\s+|\n/, " ").strip.split(' to ').first)
-            if item.search('ul.specifics li').text.gsub(/\s+|\n/, " ").strip.split(' to ').last == 'Present'
-              end_date = nil
-            else
-              Date.parse(item.search('ul.specifics li').text.gsub(/\s+|\n/, " ").strip.split(' to ').last)
-            end
-            organizations << { name: name, start_date: start_date, end_date: end_date }
-          rescue => e
+      @groups
+    end
+    def organizations
+      unless @organizations
+        @organizations = []
+        if @page.search('ul.organizations/li.organization').first
+          @organizations = @page.search('ul.organizations/li.organization').map do |item|
+            name       = item.search('h3').text.gsub(/\s+|\n/, ' ').strip rescue nil
+            start_date, end_date = item.search('ul.specifics li').text.gsub(/\s+|\n/, ' ').strip.split(' to ')
+            start_date = Date.parse(start_date) rescue nil
+            end_date   = Date.parse(end_date)   rescue nil
+            {:name => name, :start_date => start_date, :end_date => end_date}
           end
         end
-        return organizations
       end
+      @organizations
     end
-    def get_languages(page)
-      languages = []
-      # if the profile contains org data
-      if page.search('ul.languages li.language').first
-        # loop over each element with org data
-        page.search('ul.languages li.language').each do |item|
-          begin
-            # find the h3 element within the above section and get the text with excess white space stripped
-            language = item.at('h3').text
-            proficiency = item.at('span.proficiency').text.gsub(/\s+|\n/, " ").strip
-            languages << { language:language, proficiency:proficiency }
-          rescue => e
+    def languages
+      unless @languages
+        @languages = []
+        if @page.at('ul.languages/li.language')
+          @languages = @page.search('ul.languages/li.language').map do |item|
+            language    = item.at('h3').text rescue nil
+            proficiency = item.at('span.proficiency').text.gsub(/\s+|\n/, ' ').strip rescue nil
+            {:language=> language, :proficiency => proficiency }
           end
         end
-        return languages
-      end # page.search('ul.organizations li.organization').first
-    end
-    def get_certifications(page)
-      certifications = []
-      # search string to use with Nokogiri
-      query = 'ul.certifications li.certification'
-      months = 'January|February|March|April|May|June|July|August|September|November|December'
-      regex = /(#{months}) (\d{4})/
-      # if the profile contains cert data
-      if page.search(query).first
-        # loop over each element with cert data
-        page.search(query).each do |item|
-          begin
-            item_text = item.text.gsub(/\s+|\n/, " ").strip
-            name = item_text.split(" #{item_text.scan(/#{months} \d{4}/)[0]}")[0]
-            authority = nil # we need a profile with an example of this and probably will need to use the API to accuratetly get this data
-            license = nil # we need a profile with an example of this and probably will need to use the API to accuratetly get this data
-            start_date = Date.parse(item_text.scan(regex)[0].join(' '))
-            includes_end_date = item_text.scan(regex).count > 1
-            end_date = includes_end_date ? Date.parse(item_text.scan(regex)[0].join(' ')) : nil # we need a profile with an example of this and probably will need to use the API to accuratetly get this data
-            certifications << { name:name, authority:authority, license:license, start_date:start_date, end_date:end_date }
-          rescue => e
+      end
+      @languages
+    end
+    def certifications
+      unless @certtifications
+        @certifications = []
+        if @page.at('ul.certifications/li.certification')
+          @certifications = @page.search('ul.certifications/li.certification').map do |item|
+            name       = item.at('h3').text.gsub(/\s+|\n/, ' ').strip                         rescue nil
+            authority  = item.at('.specifics/.org').text.gsub(/\s+|\n/, ' ').strip            rescue nil
+            license    = item.at('.specifics/.licence-number').text.gsub(/\s+|\n/, ' ').strip rescue nil
+            start_date = item.at('.specifics/.dtstart').text.gsub(/\s+|\n/, ' ').strip        rescue nil
+            {:name => name, :authority => authority, :license => license, :start_date => start_date}
           end
         end
-        return certifications
       end
-    end
-    def get_organizations(page)
-      organizations = []
-      # if the profile contains org data
-      if page.search('ul.organizations li.organization').first
-        # loop over each element with org data
-        page.search('ul.organizations li.organization').each do |item|
-          begin
-            # find the h3 element within the above section and get the text with excess white space stripped
-            name = item.search('h3').text.gsub(/\s+|\n/, " ").strip
-            position = nil # add this later
-            occupation = nil # add this latetr too, this relates to the experience/work
-            start_date = Date.parse(item.search('ul.specifics li').text.gsub(/\s+|\n/, " ").strip.split(' to ').first)
-            if item.search('ul.specifics li').text.gsub(/\s+|\n/, " ").strip.split(' to ').last == 'Present'
-              end_date = nil
-            else
-              Date.parse(item.search('ul.specifics li').text.gsub(/\s+|\n/, " ").strip.split(' to ').last)
-            end
-            organizations << { name: name, start_date: start_date, end_date: end_date }
-          rescue => e
+      @certifications
+    end
+    def recommended_visitors
+      unless @recommended_visitors
+        @recommended_visitors = []
+        if @page.at('.browsemap/.content/ul/li')
+          @recommended_visitors = @page.search('.browsemap/.content/ul/li').map do |visitor|
+            v = {}
+            v[:link]    = visitor.at('a')['href']
+            v[:name]    = visitor.at('strong/a').text
+            v[:title]   = visitor.at('.headline').text.gsub('...',' ').split(' at ').first
+            v[:company] = visitor.at('.headline').text.gsub('...',' ').split(' at ')[1]
+            v
           end
         end
       end
-      return organizations
+      @recommended_visitors
     end
+    def to_json
+      require 'json'
+      hash = {}
+      ATTRIBUTES.each do |attribute|
+        hash[attribute.to_sym] = self.send(attribute.to_sym)
+      end
+      hash.to_json
+    end
-    def get_recommended_visitors(page)
-      recommended_vs=[]
-      if page.search(".browsemap").first
-        page.at(".browsemap").at("ul").search("li").each do |visitor|
-          v = {}
-          v[:link]    = visitor.at('a')["href"]
-          v[:name]    = visitor.at('strong/a').text
-          v[:title]   = visitor.at('.headline').text.gsub("..."," ").split(" at ").first
-          v[:company] = visitor.at('.headline').text.gsub("..."," ").split(" at ")[1]
-          recommended_vs << v
+    private
+    def get_companies(type)
+      companies = []
+      if @page.search(".position.experience.vevent.vcard.summary-#{type}").first
+        @page.search(".position.experience.vevent.vcard.summary-#{type}").each do |node|
+          company               = {}
+          company[:title]       = node.at('h3').text.gsub(/\s+|\n/, ' ').strip if node.at('h3')
+          company[:company]     = node.at('h4').text.gsub(/\s+|\n/, ' ').strip if node.at('h4')
+          company[:description] = node.at(".description.#{type}-position").text.gsub(/\s+|\n/, ' ').strip if node.at(".description.#{type}-position")
+          start_date  = node.at('.dtstart').text.gsub(/\s+|\n/, ' ').strip rescue nil
+          company[:start_date] = Date.parse(start_date) rescue nil
+          end_date  = node.at('.dtend').text.gsub(/\s+|\n/, ' ').strip rescue nil
+          company[:end_date] = Date.parse(end_date) rescue nil
+          company_link = node.at('h4/strong/a')['href'] if node.at('h4/strong/a')
+          result = get_company_details(company_link)
+          companies << company.merge!(result)
+        end
+      end
+      companies
+    end
+    def get_company_details(link)
+      result = {:linkedin_company_url => "http://www.linkedin.com#{link}"}
+      page = http_client.get(result[:linkedin_company_url])
+      result[:url] = page.at('.basic-info/div/dl/dd/a').text if page.at('.basic-info/div/dl/dd/a')
+      node_2 = page.at('.basic-info/.content.inner-mod')
+      if node_2
+        node_2.search('dd').zip(node_2.search('dt')).each do |value,title|
+          result[title.text.gsub(' ','_').downcase.to_sym] = value.text.strip
         end
-        return recommended_vs
+      end
+      result[:address] = page.at('.vcard.hq').at('.adr').text.gsub("\n",' ').strip if page.at('.vcard.hq')
+      result
+    end
+    def http_client
+      Mechanize.new do |agent|
+        agent.user_agent_alias = USER_AGENTS.sample
+        agent.max_history = 0
       end
     end
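In the new `groups` reader, `@groups` is assigned the return value of `.each`, which is the node set being iterated, so the `{:name, :link}` hashes built in the block are discarded; most of the other collection readers use `.map`. Similarly, `certifications` tests `@certtifications` but assigns `@certifications`, so its `unless` guard never sees the cached value. A minimal sketch of a memoized collection reader that avoids both problems, using a plain `Struct` as a stand-in for the Nokogiri nodes (`GroupNode` and `GroupsReader` are illustrative names, not part of the gem):

```ruby
# Stand-in for a scraped `.group-data` node: its text and its link's href.
GroupNode = Struct.new(:text, :href)

class GroupsReader
  BASE_URL = 'http://www.linkedin.com'

  def initialize(nodes)
    @nodes = nodes
  end

  # `||=` caches the result in the same instance variable it checks; an empty
  # Array is truthy, so even "no groups" is computed only once.
  # `map` (not `each`) returns the hashes built by the block.
  def groups
    @groups ||= @nodes.map do |node|
      {
        :name => node.text.gsub(/\s+/, ' ').strip,
        :link => "#{BASE_URL}#{node.href}"
      }
    end
  end
end
```

With a hypothetical node, `GroupsReader.new([GroupNode.new(" Ruby \n Users ", '/groups/123')]).groups` returns `[{:name => "Ruby Users", :link => "http://www.linkedin.com/groups/123"}]`.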

data/lib/linkedin-scraper/version.rb CHANGED

@@ -1,5 +1,5 @@
 module Linkedin
   module Scraper
-    VERSION = "0.0.10"
+    VERSION = "0.0.11"
   end
 end

data/linkedin-scraper.gemspec CHANGED

@@ -6,11 +6,16 @@ Gem::Specification.new do |gem|
   gem.description   = %q{Scrapes the linkedin profile when a url is given }
   gem.summary       = %q{when a url of  public linkedin profile page is given it scrapes the entire page and converts into a accessible object}
   gem.homepage      = "https://github.com/yatishmehta27/linkedin-scraper"
-  gem.add_dependency(%q<mechanize>, [">= 0"])
   gem.files         = `git ls-files`.split($\)
   gem.executables   = gem.files.grep(%r{^bin/}).map{ |f| File.basename(f) }
   gem.test_files    = gem.files.grep(%r{^(test|spec|features)/})
   gem.name          = "linkedin-scraper"
   gem.require_paths = ["lib"]
   gem.version       = Linkedin::Scraper::VERSION
+  gem.add_dependency(%q<mechanize>, [">= 0"])
+  gem.add_development_dependency 'rspec','>=0'
+  gem.add_development_dependency 'rake'
 end

data/spec/linkedin-scraper/profile_spec.rb CHANGED

@@ -5,63 +5,120 @@ describe Linkedin::Profile do
   before(:all) do
-    page = Nokogiri::HTML(File.open("spec/fixtures/jgrevich.html", 'r') { |f| f.read })
-    @profile = Linkedin::Profile.new(page, "http://www.linkedin.com/in/jgrevich")
+    @page = Nokogiri::HTML(File.open("spec/fixtures/jgrevich.html", 'r') { |f| f.read })
+    @profile = Linkedin::Profile.new("http://www.linkedin.com/in/jgrevich")
   end
-  describe "::get_profile" do
-    it "Create an instance of profile class" do
+  describe ".get_profile" do
+    it "Create an instance of Linkedin::Profile class" do
       expect(@profile).to be_instance_of Linkedin::Profile
     end
   end
-  describe ".first_name" do
-    it 'returns the first and last name of the profile' do
+  describe "#first_name" do
+    it 'returns the first name of the profile' do
       expect(@profile.first_name).to eq "Justin"
     end
   end
-  describe ".last_name" do
+  describe "#last_name" do
     it 'returns the last name of the profile' do
       expect(@profile.last_name).to eq "Grevich"
     end
   end
-  describe ".name" do
-    it 'returns the first and last name of the profile' do
-      expect(@profile.name).to eq "Justin Grevich"
+  describe '#title' do
+    it 'returns the title of the profile' do
+      expect(@profile.title).to eq 'Presidential Innovation Fellow'
+    end
+  end
+  describe '#location' do
+    it 'returns the location of the profile' do
+      expect(@profile.location).to eq 'Washington'
+    end
+  end
+  describe '#country' do
+    it 'returns the country of the profile' do
+      expect(@profile.country).to eq 'District Of Columbia'
+    end
+  end
+  describe '#industry' do
+    it 'returns the industry of the profile' do
+      expect(@profile.industry).to eq 'Information Technology and Services'
+    end
+  end
+  describe '#summary' do
+    it 'returns the summary of the profile' do
+      expect(@profile.summary).to match(/Justin Grevich is a Presidential Innovation Fellow working/)
     end
   end
-  describe ".certifications" do
-    it 'returns an array of certification hashes' do
-      expect(@profile.certifications.class).to eq Array
-      expect(@profile.certifications.count).to eq 2
+  describe '#picture' do
+    it 'returns the picture url of the profile' do
+      expect(@profile.picture).to eq 'http://m.c.lnkd.licdn.com/mpr/pub/image-1OSOQPrarAEIMksx5uUyhfRUO9zb6R4JjbULhhrDOMFS6dtV1OSLWbcaOK9b92S3rlE9/justin-grevich.jpg'
     end
+  end
-    it 'returns the certification name' do
-      expect(@profile.certifications.first[:name]).to eq "CISSP"
+  describe '#skills' do
+    it 'returns the array of skills of the profile' do
+      skills = ["Ruby", "Ruby on Rails", "Web Development", "Web Applications", "CSS3", "HTML 5", "Shell Scripting", "Python", "Chef", "Git", "Subversion", "JavaScript", "Rspec", "jQuery", "Capistrano", "Sinatra", "CoffeeScript", "Haml", "Standards Compliance", "MySQL", "PostgreSQL", "Solr", "Sphinx", "Heroku", "Amazon Web Services (AWS)", "Information Security", "Vulnerability Assessment", "SAN", "ZFS", "Backup Solutions", "SaaS", "System Administration", "Project Management", "Linux", "Troubleshooting", "Network Security", "OS X", "Bash", "Cloud Computing", "Web Design", "MongoDB", "Z-Wave", "Home Automation"]
+      expect(@profile.skills).to include(*skills)
     end
+  end
+  describe '#past_companies' do
+    it 'returns an array of hashes of past companies with its details' do
+      @profile.past_companies
+    end
+  end
-    it 'returns the certification start_date' do
-      expect(@profile.certifications.first[:start_date]).to eq Date.parse('December 2010')
+  describe '#current_companies' do
+    it 'returns an array of hashes of current companies with its details' do
+      @profile.current_companies
     end
   end
-  describe ".organizations" do
+  describe '#education' do
+    it 'returns the array of hashes of education with details' do
+      @profile.education
+    end
+  end
+  describe '#websites' do
+    it 'returns the array of websites' do
+      @profile.websites
+    end
+  end
+  describe '#groups' do
+    it 'returns the array of hashes of groups with details' do
+      @profile.groups
+    end
+  end
+  describe "#name" do
+    it 'returns the first and last name of the profile' do
+      expect(@profile.name).to eq "Justin Grevich"
+    end
+  end
+  describe "#organizations" do
     it 'returns an array of organization hashes for the profile' do
       expect(@profile.organizations.class).to eq Array
       expect(@profile.organizations.first[:name]).to eq 'San Diego Ruby'
     end
   end
-  describe ".languages" do
+  describe "#languages" do
     it 'returns an array of languages hashes' do
       expect(@profile.languages.class).to eq Array
     end
     context 'with language data' do
       it 'returns an array with one language hash' do
         expect(@profile.languages.class).to eq Array
       end
@@ -76,8 +133,20 @@ describe Linkedin::Profile do
         end
       end
     end # context 'with language data' do
   end # describe ".languages" do
+  describe '#recommended_visitors' do
+    it 'returns the array of hashes of recommended visitors' do
+      @profile.recommended_visitors
+    end
+  end
+  describe '#certifications' do
+    it 'returns the array of hashes of certifications' do
+      @profile.certifications
+    end
+  end
 end
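The updated `before(:all)` block still parses `spec/fixtures/jgrevich.html` into `@page`, but the new constructor only takes a URL, so `Linkedin::Profile.new` fetches the live page through `http_client` and the fixture is never used; several of the new examples also make no expectations. One way to test against the fixture would be to let the caller supply the HTTP client. A minimal sketch under that assumption, with the hypothetical names `FixtureFriendlyProfile` and `StubClient`; Mechanize is only loaded when no client is passed:

```ruby
# Hypothetical variant of Linkedin::Profile#initialize with an injectable client.
class FixtureFriendlyProfile
  attr_reader :linkedin_url, :page

  def initialize(url, client = nil)
    @linkedin_url = url
    @page = (client || default_client).get(url)
  end

  private

  # Similar to the gem's http_client (without the random user agent).
  def default_client
    require 'mechanize'
    Mechanize.new { |agent| agent.max_history = 0 }
  end
end

# Returns the same pre-parsed page for every URL, e.g. the Nokogiri document
# built from spec/fixtures/jgrevich.html.
StubClient = Struct.new(:page) do
  def get(_url)
    page
  end
end
```

`before(:all)` could then build `FixtureFriendlyProfile.new('http://www.linkedin.com/in/jgrevich', StubClient.new(@page))`. Because `get_companies` also fetches company pages through the client, a real stub would need to answer those URLs as well.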

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: linkedin-scraper
 version: !ruby/object:Gem::Version
-  version: 0.0.10
+  version: 0.0.11
 platform: ruby
 authors:
 - Yatish Mehta
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2013-06-18 00:00:00.000000000 Z
+date: 2013-09-23 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: mechanize
@@ -24,17 +24,48 @@ dependencies:
     - - '>='
       - !ruby/object:Gem::Version
         version: '0'
+- !ruby/object:Gem::Dependency
+  name: rspec
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - '>='
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - '>='
+      - !ruby/object:Gem::Version
+        version: '0'
+- !ruby/object:Gem::Dependency
+  name: rake
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - '>='
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - '>='
+      - !ruby/object:Gem::Version
+        version: '0'
 description: 'Scrapes the linkedin profile when a url is given '
 email:
-executables: []
+executables:
+- linkedin-scraper
 extensions: []
 extra_rdoc_files: []
 files:
 - .gitignore
+- .travis.yml
 - Gemfile
 - LICENSE
 - README.md
 - Rakefile
+- bin/linkedin-scraper
 - lib/linkedin-scraper.rb
 - lib/linkedin-scraper/profile.rb
 - lib/linkedin-scraper/version.rb
@@ -61,7 +92,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
       version: '0'
 requirements: []
 rubyforge_project:
-rubygems_version: 2.0.3
+rubygems_version: 2.1.2
 signing_key:
 specification_version: 4
 summary: when a url of  public linkedin profile page is given it scrapes the entire