linkedin-scraper 1.1.0 → 2.1.0
Sign up to get free protection for your applications and to get access to all the features.
- checksums.yaml +4 -4
- data/.travis.yml +1 -1
- data/CHANGE.md +0 -0
- data/README.md +48 -123
- data/bin/linkedin-scraper +4 -2
- data/lib/{linkedin_scraper.rb → linkedin-scraper.rb} +1 -1
- data/lib/linkedin-scraper/profile.rb +284 -0
- data/lib/{linkedin_scraper → linkedin-scraper}/version.rb +1 -1
- data/linkedin-scraper.gemspec +1 -1
- data/spec/linkedin_scraper/profile_spec.rb +1 -7
- data/spec/spec_helper.rb +1 -10
- metadata +7 -6
- data/lib/linkedin_scraper/profile.rb +0 -265
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA1:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: 5ad336d03d93fa02d91f8c8452d594de789caba6
|
4
|
+
data.tar.gz: 157ac80ca5c1887181d83c69a40fdba7523298e0
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: afda2323f3441e925e9d4dad00f7df991ecdfd918268b79cf258ba908f8eb1f0c76bc453488d15fc74302330163f4b7df87ff1f39d228086f80f2d3714bafa28
|
7
|
+
data.tar.gz: 3dcd605d5076a186779b9a37177797f23b47ecc13d0ca646abc176e48933710a09192480261c2220231017d4efb5c517ed79ecb5ad4d6d8bb9954cf15c866f44
|
data/.travis.yml
CHANGED
data/CHANGE.md
ADDED
File without changes
|
data/README.md
CHANGED
@@ -4,6 +4,8 @@
|
|
4
4
|
Linkedin Scraper
|
5
5
|
================
|
6
6
|
|
7
|
+
**2.0.0 is the new version. It does not support the `get_profile` method. It does not support Ruby 1.8**
|
8
|
+
|
7
9
|
Linkedin-scraper is a gem for scraping linkedin public profiles.
|
8
10
|
Given the URL of the profile, it gets the name, country, title, area, current companies, past companies,
|
9
11
|
organizations, skills, groups, etc
|
@@ -15,22 +17,29 @@ Install the gem from RubyGems:
|
|
15
17
|
|
16
18
|
gem install linkedin-scraper
|
17
19
|
|
18
|
-
This gem is tested on 1.9.2, 1.9.3, 2.0.0, 2.2, 2.3
|
20
|
+
This gem is tested on 1.9.2, 1.9.3, 2.0.0, 2.2, 2.3
|
19
21
|
|
20
22
|
## Usage
|
21
23
|
Include the gem
|
22
24
|
|
23
|
-
require 'linkedin_scraper'
|
25
|
+
require 'linkedin-scraper'
|
24
26
|
|
25
27
|
Initialize a scraper instance
|
26
28
|
|
27
|
-
profile = Linkedin::Profile.get_profile("http://www.linkedin.com/in/jeffweiner08")
|
28
|
-
|
29
|
+
profile = Linkedin::Profile.new("http://www.linkedin.com/in/jeffweiner08")
|
30
|
+
|
29
31
|
|
30
32
|
With a http web-proxy:
|
31
33
|
|
32
|
-
profile = Linkedin::Profile.get_profile("http://www.linkedin.com/in/jeffweiner08", { proxy_ip: '127.0.0.1', proxy_port: '3128', username: 'user', password: 'pass' })
|
34
|
+
profile = Linkedin::Profile.new("http://www.linkedin.com/in/jeffweiner08", { proxy_ip: '127.0.0.1', proxy_port: '3128', username: 'user', password: 'pass' })
|
35
|
+
|
36
|
+
The scraper can also get the details of each past and current companies. This will lead to multiple hits.
|
37
|
+
To enable this functionality, pass `company_details=true` in options. You can pass them along with proxy options
|
38
|
+
as well
|
33
39
|
|
40
|
+
profile = Linkedin::Profile.new("http://www.linkedin.com/in/jeffweiner08", { company_details: true })
|
41
|
+
|
42
|
+
profile = Linkedin::Profile.new("http://www.linkedin.com/in/jeffweiner08", { company_details: true, proxy_ip: '127.0.0.1', proxy_port: '3128', username: 'user', password: 'pass' })
|
34
43
|
|
35
44
|
The returning object responds to the following methods
|
36
45
|
|
@@ -71,24 +80,35 @@ The returning object responds to the following methods
|
|
71
80
|
|
72
81
|
|
73
82
|
For current and past companies it also provides the details of the companies like company size, industry, address, etc
|
83
|
+
The company details will only be scraped if you pass company_details=true. It is false by default.
|
84
|
+
|
74
85
|
|
75
86
|
profile.current_companies
|
76
87
|
|
77
88
|
[
|
78
|
-
|
79
|
-
|
80
|
-
|
81
|
-
|
82
|
-
|
83
|
-
|
84
|
-
|
85
|
-
|
86
|
-
|
87
|
-
:
|
88
|
-
:
|
89
|
-
|
90
|
-
|
91
|
-
|
89
|
+
[0] {
|
90
|
+
:title => "CEO",
|
91
|
+
:company => "LinkedIn",
|
92
|
+
:company_logo => "https://media.licdn.com/media/AAEAAQAAAAAAAAL0AAAAJGMwYWZhNTYxLWJkMTktNDAzMi05NzEzLTlhNzUxMGU0NDg0Mw.png",
|
93
|
+
:duration => "7 years 6 months",
|
94
|
+
:start_date => #<Date: 2008-12-01 ((2454802j,0s,0n),+0s,2299161j)>,
|
95
|
+
:end_date => "Present",
|
96
|
+
:linkedin_company_url => "https://www.linkedin.com/company/linkedin",
|
97
|
+
:website => "http://www.linkedin.com",
|
98
|
+
:description => "The future is all about what you do next and we’re excited to help you get there. Ready for your moonshot? You're closer than you think. \r\n\r\nFounded in 2003, LinkedIn connects the world's professionals to make them more productive and successful. With more than 430 million members worldwide, including executives from every Fortune 500 company, LinkedIn is the world's largest professional network on the Internet. The company has a diversified business model with revenue coming from Talent Solutions, Marketing Solutions and Premium Subscriptions products. Headquartered in Silicon Valley, LinkedIn has offices across the globe.",
|
99
|
+
:company_size => "5001-10,000 employees",
|
100
|
+
:type => "Public Company",
|
101
|
+
:industry => "Internet",
|
102
|
+
:founded => 2003,
|
103
|
+
:address => "2029 Stierlin Court Mountain View, CA 94043 United States",
|
104
|
+
:street1 => "2029 Stierlin Court",
|
105
|
+
:street2 => "",
|
106
|
+
:city => "Mountain View",
|
107
|
+
:zip => "94043",
|
108
|
+
:state => "CA",
|
109
|
+
:country => "United States"
|
110
|
+
}
|
111
|
+
]
|
92
112
|
[1] {
|
93
113
|
:current_company => "Intuit",
|
94
114
|
:current_title => "Member, Board of Directors",
|
@@ -97,114 +117,16 @@ For current and past companies it also provides the details of the companies lik
|
|
97
117
|
:linkedin_company_url => "http://www.linkedin.com/company/intuit?trk=ppro_cprof",
|
98
118
|
:url => "http://network.intuit.com/",
|
99
119
|
:type => "Public Company",
|
100
|
-
|
101
|
-
|
102
|
-
|
103
|
-
:founded => "1983",
|
104
|
-
:address => "2632 Marine Way Mountain View, CA 94043 United States"
|
105
|
-
},
|
106
|
-
[2] {
|
107
|
-
:current_company => "DonorsChoose",
|
108
|
-
:current_title => "Member, Board of Directors",
|
109
|
-
:current_company_url => "http://www.donorschoose.org",
|
110
|
-
:description => nil,
|
111
|
-
:linkedin_company_url => "http://www.linkedin.com/company/donorschoose.org?trk=ppro_cprof",
|
112
|
-
:url => "http://www.donorschoose.org",
|
113
|
-
:type => "Nonprofit",
|
114
|
-
:company_size => "51-200 employees",
|
115
|
-
:website => "http://www.donorschoose.org",
|
116
|
-
:industry => "Nonprofit Organization Management",
|
117
|
-
:founded => "2000",
|
118
|
-
:address => "213 West 35th Street 2nd Floor East New York, NY 10001 United States"
|
119
|
-
},
|
120
|
-
[3] {
|
121
|
-
:current_company => "Malaria No More",
|
122
|
-
:current_title => "Member, Board of Directors",
|
123
|
-
:current_company_url => nil,
|
124
|
-
:description => nil
|
125
|
-
},
|
126
|
-
[4] {
|
127
|
-
:current_company => "Venture For America",
|
128
|
-
:current_title => "Member, Advisory Board",
|
129
|
-
:current_company_url => "http://ventureforamerica.org/",
|
130
|
-
:description => nil,
|
131
|
-
:linkedin_company_url => "http://www.linkedin.com/company/venture-for-america?trk=ppro_cprof",
|
132
|
-
:url => "http://ventureforamerica.org/",
|
133
|
-
:type => "Nonprofit",
|
134
|
-
:company_size => "1-10 employees",
|
135
|
-
:website => "http://ventureforamerica.org/",
|
136
|
-
:industry => "Nonprofit Organization Management",
|
137
|
-
:founded => "2011"
|
138
|
-
}
|
139
|
-
]
|
140
|
-
|
120
|
+
.
|
121
|
+
.
|
122
|
+
.
|
141
123
|
|
142
124
|
profile.past_companies
|
143
|
-
|
144
|
-
[0] {
|
145
|
-
:past_company => "Accel Partners",
|
146
|
-
:past_title => "Executive in Residence",
|
147
|
-
:past_company_website => "http://www.facebook.com/accel",
|
148
|
-
:description => nil,
|
149
|
-
:linkedin_company_url => "http://www.linkedin.com/company/accel-partners?trk=ppro_cprof",
|
150
|
-
:url => "http://www.facebook.com/accel",
|
151
|
-
:type => "Partnership",
|
152
|
-
:company_size => "51-200 employees",
|
153
|
-
:website => "http://www.facebook.com/accel",
|
154
|
-
:industry => "Venture Capital & Private Equity",
|
155
|
-
:address => "428 University Palo Alto, CA 94301 United States"
|
156
|
-
},
|
157
|
-
[1] {
|
158
|
-
:past_company => "Greylock",
|
159
|
-
:past_title => "Executive in Residence",
|
160
|
-
:past_company_website => "http://www.greylock.com",
|
161
|
-
:description => nil,
|
162
|
-
:linkedin_company_url => "http://www.linkedin.com/company/greylock-partners?trk=ppro_cprof",
|
163
|
-
:url => "http://www.greylock.com",
|
164
|
-
:type => "Partnership",
|
165
|
-
:company_size => "51-200 employees",
|
166
|
-
:website => "http://www.greylock.com",
|
167
|
-
:industry => "Venture Capital & Private Equity",
|
168
|
-
:address => "2550 Sand Hill Road Menlo Park, CA 94025 United States"
|
169
|
-
},
|
170
|
-
[2] {
|
171
|
-
:past_company => "Yahoo!",
|
172
|
-
:past_title => "Executive Vice President Network Division",
|
173
|
-
:past_company_website => "http://www.yahoo.com",
|
174
|
-
:description => nil,
|
175
|
-
:linkedin_company_url => "http://www.linkedin.com/company/yahoo?trk=ppro_cprof",
|
176
|
-
:url => "http://www.yahoo.com",
|
177
|
-
:type => "Public Company",
|
178
|
-
:company_size => "10,001+ employees",
|
179
|
-
:website => "http://www.yahoo.com",
|
180
|
-
:industry => "Internet",
|
181
|
-
:founded => "1994",
|
182
|
-
:address => "701 First Avenue Sunnyvale, CA 94089 United States"
|
183
|
-
},
|
184
|
-
[3] {
|
185
|
-
:past_company => "Windsor Media",
|
186
|
-
:past_title => "Founding Partner",
|
187
|
-
:past_company_website => nil,
|
188
|
-
:description => nil
|
189
|
-
},
|
190
|
-
[4] {
|
191
|
-
:past_company => "Warner Bros.",
|
192
|
-
:past_title => "Vice President Online",
|
193
|
-
:past_company_website => "http://www.warnerbros.com/",
|
194
|
-
:description => nil,
|
195
|
-
:linkedin_company_url => "http://www.linkedin.com/company/warner-bros.-entertainment-group-of-companies?trk=ppro_cprof",
|
196
|
-
:url => "http://www.warnerbros.com/",
|
197
|
-
:type => "Public Company",
|
198
|
-
:company_size => "10,001+ employees",
|
199
|
-
:website => "http://www.warnerbros.com/",
|
200
|
-
:industry => "Entertainment",
|
201
|
-
:address => "4000 Warner Boulevard Burbank, CA 91522 United States"
|
202
|
-
}
|
203
|
-
]
|
125
|
+
# Same as current companies
|
204
126
|
|
205
127
|
|
206
128
|
profile.recommended_visitors
|
207
|
-
#It is the list of visitors "Viewers of this profile also viewed..."
|
129
|
+
# It is the list of visitors "Viewers of this profile also viewed..."
|
208
130
|
[
|
209
131
|
[0] {
|
210
132
|
:link => "http://www.linkedin.com/in/barackobama?trk=pub-pbmap",
|
@@ -264,10 +186,13 @@ For current and past companies it also provides the details of the companies lik
|
|
264
186
|
|
265
187
|
|
266
188
|
The gem also comes with a binary and can be used from the command line to get a json response of the scraped data.
|
267
|
-
It takes the url as the first argument.
|
189
|
+
It takes the url as the first argument. If the last argument is true it will fetch the company details for each company
|
268
190
|
|
269
191
|
linkedin-scraper http://www.linkedin.com/in/jeffweiner08 127.0.0.1 3128 username password
|
270
192
|
|
193
|
+
linkedin-scraper http://www.linkedin.com/in/jeffweiner08 127.0.0.1 3128 username password true
|
194
|
+
|
195
|
+
|
271
196
|
## Contributing
|
272
197
|
|
273
198
|
Bug reports and pull requests are welcome on GitHub at https://github.com/yatish27/linkedin-scraper.
|
data/bin/linkedin-scraper
CHANGED
@@ -1,10 +1,12 @@
|
|
1
1
|
#!/usr/bin/env ruby
|
2
2
|
|
3
|
-
require_relative '../lib/linkedin_scraper'
|
3
|
+
require_relative '../lib/linkedin-scraper'
|
4
4
|
options = {}
|
5
|
-
options[:proxy_ip] = ARGV[1]
|
5
|
+
options[:proxy_ip] = ARGV[1]
|
6
6
|
options[:proxy_port] = ARGV[2]
|
7
7
|
options[:username] = ARGV[3]
|
8
8
|
options[:password] = ARGV[4]
|
9
|
+
options[:company_details] = ARGV[5]
|
10
|
+
|
9
11
|
profile = Linkedin::Profile.new(ARGV[0], options)
|
10
12
|
puts JSON.pretty_generate JSON.parse(profile.to_json)
|
@@ -3,4 +3,4 @@ require "mechanize"
|
|
3
3
|
require "cgi"
|
4
4
|
require "net/http"
|
5
5
|
require "random_user_agent"
|
6
|
-
Dir["#{File.expand_path(File.dirname(__FILE__))}/linkedin_scraper/*.rb"].each { |file| require file }
|
6
|
+
Dir["#{File.expand_path(File.dirname(__FILE__))}/linkedin-scraper/*.rb"].each { |file| require file }
|
@@ -0,0 +1,284 @@
|
|
1
|
+
# -*- encoding: utf-8 -*-
|
2
|
+
module Linkedin
|
3
|
+
class Profile
|
4
|
+
ATTRIBUTES = %w(
|
5
|
+
name
|
6
|
+
first_name
|
7
|
+
last_name
|
8
|
+
title
|
9
|
+
location
|
10
|
+
number_of_connections
|
11
|
+
country
|
12
|
+
industry
|
13
|
+
summary
|
14
|
+
picture
|
15
|
+
projects
|
16
|
+
linkedin_url
|
17
|
+
education
|
18
|
+
groups
|
19
|
+
websites
|
20
|
+
languages
|
21
|
+
skills
|
22
|
+
certifications
|
23
|
+
organizations
|
24
|
+
past_companies
|
25
|
+
current_companies
|
26
|
+
recommended_visitors )
|
27
|
+
|
28
|
+
attr_reader :page, :linkedin_url
|
29
|
+
|
30
|
+
def initialize(url, options = {})
|
31
|
+
@linkedin_url = url
|
32
|
+
@options = options
|
33
|
+
@page = http_client.get(url)
|
34
|
+
end
|
35
|
+
|
36
|
+
def name
|
37
|
+
"#{first_name} #{last_name}"
|
38
|
+
end
|
39
|
+
|
40
|
+
def first_name
|
41
|
+
@first_name ||= (@page.at('.fn').text.split(' ', 2)[0].strip if @page.at('.fn'))
|
42
|
+
end
|
43
|
+
|
44
|
+
def last_name
|
45
|
+
@last_name ||= (@page.at('.fn').text.split(' ', 2)[1].strip if @page.at('.fn'))
|
46
|
+
end
|
47
|
+
|
48
|
+
def title
|
49
|
+
@title ||= (@page.at('.title').text.gsub(/\s+/, ' ').strip if @page.at('.title'))
|
50
|
+
end
|
51
|
+
|
52
|
+
def location
|
53
|
+
@location ||= (@page.at('.locality').text.split(',').first.strip if @page.at('.locality'))
|
54
|
+
end
|
55
|
+
|
56
|
+
def country
|
57
|
+
@country ||= (@page.at('.locality').text.split(',').last.strip if @page.at('.locality'))
|
58
|
+
end
|
59
|
+
|
60
|
+
def number_of_connections
|
61
|
+
if @page.at('.member-connections')
|
62
|
+
@connections ||= (@page.at('.member-connections').text.match(/[0-9]+[\+]{0,1}/)[0])
|
63
|
+
end
|
64
|
+
end
|
65
|
+
|
66
|
+
def industry
|
67
|
+
if @page.at('#demographics .descriptor')
|
68
|
+
@industry ||= (@page.search('#demographics .descriptor')[-1].text.gsub(/\s+/, ' ').strip)
|
69
|
+
end
|
70
|
+
end
|
71
|
+
|
72
|
+
def summary
|
73
|
+
@summary ||= (@page.at('#summary .description').text.gsub(/\s+/, ' ').strip if @page.at('#summary .description'))
|
74
|
+
end
|
75
|
+
|
76
|
+
def picture
|
77
|
+
if @page.at('.profile-picture img')
|
78
|
+
@picture ||= @page.at('.profile-picture img').attributes.values_at('src', 'data-delayed-url').
|
79
|
+
compact.first.value.strip
|
80
|
+
end
|
81
|
+
end
|
82
|
+
|
83
|
+
def skills
|
84
|
+
@skills ||= (@page.search('.pills .skill:not(.see-less)').map { |skill| skill.text.strip if skill.text } rescue nil)
|
85
|
+
end
|
86
|
+
|
87
|
+
def past_companies
|
88
|
+
@past_companies ||= get_companies.reject { |c| c[:end_date] == 'Present' }
|
89
|
+
end
|
90
|
+
|
91
|
+
def current_companies
|
92
|
+
@current_companies ||= get_companies.find_all { |c| c[:end_date] == 'Present' }
|
93
|
+
end
|
94
|
+
|
95
|
+
def education
|
96
|
+
@education ||= @page.search('.schools .school').map do |item|
|
97
|
+
name = item.at('h4').text.gsub(/\s+|\n/, ' ').strip if item.at('h4')
|
98
|
+
desc = item.search('h5').last.text.gsub(/\s+|\n/, ' ').strip if item.search('h5').last
|
99
|
+
if item.search('h5').last.at('.degree')
|
100
|
+
degree = item.search('h5').last.at('.degree').text.gsub(/\s+|\n/, ' ').strip.gsub(/,$/, '')
|
101
|
+
end
|
102
|
+
major = item.search('h5').last.at('.major').text.gsub(/\s+|\n/, ' ').strip if item.search('h5').last.at('.major')
|
103
|
+
period = item.at('.date-range').text.gsub(/\s+|\n/, ' ').strip if item.at('.date-range')
|
104
|
+
start_date, end_date = item.at('.date-range').text.gsub(/\s+|\n/, ' ').strip.split(' – ') rescue nil
|
105
|
+
|
106
|
+
{
|
107
|
+
name: name,
|
108
|
+
description: desc,
|
109
|
+
degree: degree,
|
110
|
+
major: major,
|
111
|
+
period: period,
|
112
|
+
start_date: start_date,
|
113
|
+
end_date: end_date
|
114
|
+
}
|
115
|
+
end
|
116
|
+
end
|
117
|
+
|
118
|
+
def websites
|
119
|
+
@websites ||= @page.search('.websites li').flat_map do |site|
|
120
|
+
url = site.at('a')['href']
|
121
|
+
CGI.parse(URI.parse(url).query)['url']
|
122
|
+
end
|
123
|
+
end
|
124
|
+
|
125
|
+
def groups
|
126
|
+
@groups ||= @page.search('#groups .group .item-title').map do |item|
|
127
|
+
name = item.text.gsub(/\s+|\n/, ' ').strip
|
128
|
+
link = item.at('a')['href']
|
129
|
+
|
130
|
+
{ name: name, link: link }
|
131
|
+
end
|
132
|
+
end
|
133
|
+
|
134
|
+
def organizations
|
135
|
+
@organizations ||= @page.search('#background-organizations .section-item').map do |item|
|
136
|
+
name = item.at('.summary').text.gsub(/\s+|\n/, ' ').strip rescue nil
|
137
|
+
start_date, end_date = item.at('.organizations-date').text.gsub(/\s+|\n/, ' ').strip.split(' – ') rescue nil
|
138
|
+
start_date = Date.parse(start_date) rescue nil
|
139
|
+
end_date = Date.parse(end_date) rescue nil
|
140
|
+
{name: name, start_date: start_date, end_date: end_date}
|
141
|
+
end
|
142
|
+
end
|
143
|
+
|
144
|
+
def languages
|
145
|
+
@languages ||= @page.search('.background-languages #languages ol li').map do |item|
|
146
|
+
language = item.at('h4').text rescue nil
|
147
|
+
proficiency = item.at('div.languages-proficiency').text.gsub(/\s+|\n/, ' ').strip rescue nil
|
148
|
+
{ language: language, proficiency: proficiency }
|
149
|
+
end
|
150
|
+
end
|
151
|
+
|
152
|
+
def certifications
|
153
|
+
@certifications ||= @page.search('background-certifications').map do |item|
|
154
|
+
name = item.at('h4').text.gsub(/\s+|\n/, ' ').strip rescue nil
|
155
|
+
authority = item.at('h5').text.gsub(/\s+|\n/, ' ').strip rescue nil
|
156
|
+
license = item.at('.specifics/.licence-number').text.gsub(/\s+|\n/, ' ').strip rescue nil
|
157
|
+
start_date = item.at('.certification-date').text.gsub(/\s+|\n/, ' ').strip rescue nil
|
158
|
+
|
159
|
+
{ name: name, authority: authority, license: license, start_date: start_date }
|
160
|
+
end
|
161
|
+
end
|
162
|
+
|
163
|
+
|
164
|
+
def recommended_visitors
|
165
|
+
@recommended_visitors ||= @page.search('.insights .browse-map/ul/li.profile-card').map do |node|
|
166
|
+
visitor = {}
|
167
|
+
visitor[:link] = node.at('a')['href']
|
168
|
+
visitor[:name] = node.at('h4/a').text
|
169
|
+
if node.at('.headline')
|
170
|
+
visitor[:title], visitor[:company], _ = node.at('.headline').text.gsub('...', ' ').split(' at ')
|
171
|
+
end
|
172
|
+
visitor
|
173
|
+
end
|
174
|
+
end
|
175
|
+
|
176
|
+
def projects
|
177
|
+
@projects ||= @page.search('#projects .project').map do |node|
|
178
|
+
project = {}
|
179
|
+
start_date, end_date = node.at('.date-range').text.gsub(/\s+|\n/, ' ').strip.split(' – ') rescue nil
|
180
|
+
|
181
|
+
project[:title] = node.at('.item-title').text
|
182
|
+
project[:link] = CGI.parse(URI.parse(node.at('.item-title a')['href']).query)['url'][0] rescue nil
|
183
|
+
project[:start_date] = parse_date(start_date) rescue nil
|
184
|
+
project[:end_date] = parse_date(end_date) rescue nil
|
185
|
+
project[:description] = node.at('.description').children().to_s rescue nil
|
186
|
+
project[:associates] = node.search('.contributors .contributor').map { |c| c.at('a').text } rescue nil
|
187
|
+
project
|
188
|
+
end
|
189
|
+
end
|
190
|
+
|
191
|
+
def to_json
|
192
|
+
require 'json'
|
193
|
+
ATTRIBUTES.reduce({}) { |hash, attr| hash[attr.to_sym] = self.send(attr.to_sym); hash }.to_json
|
194
|
+
end
|
195
|
+
|
196
|
+
private
|
197
|
+
def get_companies
|
198
|
+
if @companies
|
199
|
+
return @companies
|
200
|
+
else
|
201
|
+
@companies = []
|
202
|
+
end
|
203
|
+
|
204
|
+
@page.search('.positions .position').each do |node|
|
205
|
+
company = {}
|
206
|
+
company[:title] = node.at('.item-title').text.gsub(/\s+|\n/, ' ').strip if node.at('.item-title')
|
207
|
+
company[:company] = node.at('.item-subtitle').text.gsub(/\s+|\n/, ' ').strip if node.at('.item-subtitle')
|
208
|
+
company[:location] = node.at('.location').text if node.at('.location')
|
209
|
+
company[:description] = node.at('.description').text.gsub(/\s+|\n/, ' ').strip if node.at('.description')
|
210
|
+
company[:company_logo] = node.at('.logo a img').first[1] if node.at('.logo')
|
211
|
+
|
212
|
+
start_date, end_date = node.at('.date-range').text.strip.split(' – ') rescue nil
|
213
|
+
company[:duration] = node.at('.date-range').text[/.*\((.*)\)/, 1]
|
214
|
+
company[:start_date] = parse_date(start_date) rescue nil
|
215
|
+
|
216
|
+
if end_date && end_date.match(/Present/)
|
217
|
+
company[:end_date] = 'Present'
|
218
|
+
else
|
219
|
+
company[:end_date] = parse_date(end_date) rescue nil
|
220
|
+
end
|
221
|
+
|
222
|
+
company_link = node.at('.item-subtitle').at('a')['href'] rescue nil
|
223
|
+
if @options[:company_details] && company_link
|
224
|
+
company.merge!(get_company_details(company_link))
|
225
|
+
end
|
226
|
+
|
227
|
+
@companies << company
|
228
|
+
end
|
229
|
+
|
230
|
+
@companies
|
231
|
+
end
|
232
|
+
|
233
|
+
def parse_date(date)
|
234
|
+
date = '#{date}-01-01' if date =~ /^(19|20)\d{2}$/
|
235
|
+
Date.parse(date)
|
236
|
+
end
|
237
|
+
|
238
|
+
def get_company_details(link)
|
239
|
+
sleep(1.5)
|
240
|
+
parsed = URI::parse(get_linkedin_company_url(link))
|
241
|
+
parsed.fragment = parsed.query = nil
|
242
|
+
result = { linkedin_company_url: parsed.to_s }
|
243
|
+
|
244
|
+
page = http_client.get(parsed.to_s)
|
245
|
+
company_details = JSON.parse(page.at('#stream-footer-embed-id-content').children.first.text) rescue nil
|
246
|
+
if company_details
|
247
|
+
result[:website] = company_details['website']
|
248
|
+
result[:description] = company_details['description']
|
249
|
+
result[:company_size] = company_details['size']
|
250
|
+
result[:type] = company_details['companyType']
|
251
|
+
result[:industry] = company_details['industry']
|
252
|
+
result[:founded] = company_details['yearFounded']
|
253
|
+
headquarters = company_details['headquarters']
|
254
|
+
if headquarters
|
255
|
+
result[:address] = %{#{headquarters['street1']} #{headquarters['street2']} #{headquarters['city']}, #{headquarters['state']} #{headquarters['zip']} #{headquarters['country']}}
|
256
|
+
end
|
257
|
+
[:street1, :street2, :city, :zip, :state, :country].each do |section|
|
258
|
+
result[section] = headquarters[section.to_s]
|
259
|
+
end
|
260
|
+
end
|
261
|
+
result
|
262
|
+
end
|
263
|
+
|
264
|
+
def http_client
|
265
|
+
@http_client ||= Mechanize.new do |agent|
|
266
|
+
agent.user_agent = RandomUserAgent.randomize
|
267
|
+
if !@options.empty?
|
268
|
+
agent.set_proxy(@options[:proxy_ip], @options[:proxy_port], @options[:username], @options[:password])
|
269
|
+
end
|
270
|
+
agent.max_history = 0
|
271
|
+
end
|
272
|
+
end
|
273
|
+
|
274
|
+
def get_linkedin_company_url(link)
|
275
|
+
http = %r{http://www.linkedin.com/}
|
276
|
+
https = %r{https://www.linkedin.com/}
|
277
|
+
if http.match(link) || https.match(link)
|
278
|
+
link
|
279
|
+
else
|
280
|
+
"http://www.linkedin.com/#{link}"
|
281
|
+
end
|
282
|
+
end
|
283
|
+
end
|
284
|
+
end
|
data/linkedin-scraper.gemspec
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
# encoding: UTF-8
|
2
2
|
|
3
3
|
require 'spec_helper'
|
4
|
-
require 'linkedin_scraper'
|
4
|
+
require 'linkedin-scraper'
|
5
5
|
|
6
6
|
describe Linkedin::Profile do
|
7
7
|
# This is the HTML of https://www.linkedin.com/in/jeffweiner08
|
@@ -59,12 +59,6 @@ describe Linkedin::Profile do
|
|
59
59
|
end
|
60
60
|
end
|
61
61
|
|
62
|
-
describe '#groups' do
|
63
|
-
it "returns list of profile's groups" do
|
64
|
-
p profile.groups
|
65
|
-
end
|
66
|
-
end
|
67
|
-
|
68
62
|
describe '#name' do
|
69
63
|
it 'returns the first and last name of the profile' do
|
70
64
|
expect(profile.name).to eq "Jeff Weiner"
|
data/spec/spec_helper.rb
CHANGED
@@ -1,17 +1,8 @@
|
|
1
1
|
$LOAD_PATH << File.join(File.dirname(__FILE__), '../lib')
|
2
|
-
|
3
|
-
# specs live under a `spec` directory, which RSpec adds to the `$LOAD_PATH`.
|
4
|
-
# Require this file using `require "spec_helper"` to ensure that it is only
|
5
|
-
# loaded once.
|
6
|
-
#
|
7
|
-
# See http://rubydoc.info/gems/rspec-core/RSpec/Core/Configuration
|
2
|
+
|
8
3
|
RSpec.configure do |config|
|
9
4
|
config.run_all_when_everything_filtered = true
|
10
5
|
config.filter_run :focus
|
11
6
|
|
12
|
-
# Run specs in random order to surface order dependencies. If you find an
|
13
|
-
# order dependency and want to debug it, you can fix the order by providing
|
14
|
-
# the seed, which is printed after each run.
|
15
|
-
# --seed 1234
|
16
7
|
config.order = 'random'
|
17
8
|
end
|
metadata
CHANGED
@@ -1,14 +1,14 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: linkedin-scraper
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 1.1.0
|
4
|
+
version: 2.1.0
|
5
5
|
platform: ruby
|
6
6
|
authors:
|
7
7
|
- Yatish Mehta
|
8
8
|
autorequire:
|
9
9
|
bindir: bin
|
10
10
|
cert_chain: []
|
11
|
-
date: 2016-05-
|
11
|
+
date: 2016-05-13 00:00:00.000000000 Z
|
12
12
|
dependencies:
|
13
13
|
- !ruby/object:Gem::Dependency
|
14
14
|
name: mechanize
|
@@ -76,14 +76,15 @@ files:
|
|
76
76
|
- ".gitignore"
|
77
77
|
- ".rubocop.yml"
|
78
78
|
- ".travis.yml"
|
79
|
+
- CHANGE.md
|
79
80
|
- Gemfile
|
80
81
|
- LICENSE
|
81
82
|
- README.md
|
82
83
|
- Rakefile
|
83
84
|
- bin/linkedin-scraper
|
84
|
-
- lib/linkedin_scraper.rb
|
85
|
-
- lib/linkedin_scraper/profile.rb
|
86
|
-
- lib/linkedin_scraper/version.rb
|
85
|
+
- lib/linkedin-scraper.rb
|
86
|
+
- lib/linkedin-scraper/profile.rb
|
87
|
+
- lib/linkedin-scraper/version.rb
|
87
88
|
- linkedin-scraper.gemspec
|
88
89
|
- spec/fixtures/jeffweiner08.html
|
89
90
|
- spec/linkedin_scraper/.DS_Store
|
@@ -109,7 +110,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
|
|
109
110
|
version: '0'
|
110
111
|
requirements: []
|
111
112
|
rubyforge_project:
|
112
|
-
rubygems_version: 2.4
|
113
|
+
rubygems_version: 2.6.4
|
113
114
|
signing_key:
|
114
115
|
specification_version: 4
|
115
116
|
summary: when a url of public linkedin profile page is given it scrapes the entire
|
@@ -1,265 +0,0 @@
|
|
1
|
-
# -*- encoding: utf-8 -*-
|
2
|
-
module Linkedin
|
3
|
-
class Profile
|
4
|
-
|
5
|
-
ATTRIBUTES = %w(
|
6
|
-
name
|
7
|
-
first_name
|
8
|
-
last_name
|
9
|
-
title
|
10
|
-
location
|
11
|
-
number_of_connections
|
12
|
-
country
|
13
|
-
industry
|
14
|
-
summary
|
15
|
-
picture
|
16
|
-
projects
|
17
|
-
linkedin_url
|
18
|
-
education
|
19
|
-
groups
|
20
|
-
websites
|
21
|
-
languages
|
22
|
-
skills
|
23
|
-
certifications
|
24
|
-
organizations
|
25
|
-
past_companies
|
26
|
-
current_companies
|
27
|
-
recommended_visitors)
|
28
|
-
|
29
|
-
attr_reader :page, :linkedin_url
|
30
|
-
|
31
|
-
# support old version
|
32
|
-
def self.get_profile(url, options = {})
|
33
|
-
Linkedin::Profile.new(url, options)
|
34
|
-
rescue => e
|
35
|
-
puts e
|
36
|
-
end
|
37
|
-
|
38
|
-
def initialize(url, options = {})
|
39
|
-
@linkedin_url = url
|
40
|
-
@options = options
|
41
|
-
@page = http_client.get(url)
|
42
|
-
end
|
43
|
-
|
44
|
-
def name
|
45
|
-
"#{first_name} #{last_name}"
|
46
|
-
end
|
47
|
-
|
48
|
-
def first_name
|
49
|
-
@first_name ||= (@page.at(".fn").text.split(" ", 2)[0].strip if @page.at(".fn"))
|
50
|
-
end
|
51
|
-
|
52
|
-
def last_name
|
53
|
-
@last_name ||= (@page.at(".fn").text.split(" ", 2)[1].strip if @page.at(".fn"))
|
54
|
-
end
|
55
|
-
|
56
|
-
def title
|
57
|
-
@title ||= (@page.at(".title").text.gsub(/\s+/, " ").strip if @page.at(".title"))
|
58
|
-
end
|
59
|
-
|
60
|
-
def location
|
61
|
-
@location ||= (@page.at(".locality").text.split(",").first.strip if @page.at(".locality"))
|
62
|
-
end
|
63
|
-
|
64
|
-
def number_of_connections
|
65
|
-
@connections ||= (@page.at(".member-connections").text.match(/[0-9]+[\+]{0,1}/)[0]) if @page.at(".member-connections")
|
66
|
-
end
|
67
|
-
|
68
|
-
def country
|
69
|
-
@country ||= (@page.at(".locality").text.split(",").last.strip if @page.at(".locality"))
|
70
|
-
end
|
71
|
-
|
72
|
-
def industry
|
73
|
-
@industry ||= (@page.search("#demographics .descriptor")[-1].text.gsub(/\s+/, " ").strip if @page.at("#demographics .descriptor"))
|
74
|
-
end
|
75
|
-
|
76
|
-
def summary
|
77
|
-
@summary ||= (@page.at("#summary .description").text.gsub(/\s+/, " ").strip if @page.at("#summary .description"))
|
78
|
-
end
|
79
|
-
|
80
|
-
def picture
|
81
|
-
@picture ||= (@page.at('.profile-picture img').attributes.values_at('src','data-delayed-url').compact.first.value.strip if @page.at('.profile-picture img'))
|
82
|
-
end
|
83
|
-
|
84
|
-
def skills
|
85
|
-
@skills ||= (@page.search(".pills .skill:not(.see-less)").map { |skill| skill.text.strip if skill.text } rescue nil)
|
86
|
-
end
|
87
|
-
|
88
|
-
def past_companies
|
89
|
-
@past_companies ||= get_companies().reject { |c| c[:end_date] == "Present"}
|
90
|
-
end
|
91
|
-
|
92
|
-
def current_companies
|
93
|
-
@current_companies ||= get_companies().find_all{ |c| c[:end_date] == "Present"}
|
94
|
-
end
|
95
|
-
|
96
|
-
def education
|
97
|
-
@education ||= @page.search(".schools .school").map do |item|
|
98
|
-
name = item.at("h4").text.gsub(/\s+|\n/, " ").strip if item.at("h4")
|
99
|
-
desc = item.search("h5").last.text.gsub(/\s+|\n/, " ").strip if item.search("h5").last
|
100
|
-
degree = item.search("h5").last.at(".degree").text.gsub(/\s+|\n/, " ").strip.gsub(/,$/, "") if item.search("h5").last.at(".degree")
|
101
|
-
major = item.search("h5").last.at(".major").text.gsub(/\s+|\n/, " ").strip if item.search("h5").last.at(".major")
|
102
|
-
period = item.at(".date-range").text.gsub(/\s+|\n/, " ").strip if item.at(".date-range")
|
103
|
-
start_date, end_date = item.at(".date-range").text.gsub(/\s+|\n/, " ").strip.split(" – ") rescue nil
|
104
|
-
{:name => name, :description => desc, :degree => degree, :major => major, :period => period, :start_date => start_date, :end_date => end_date }
|
105
|
-
end
|
106
|
-
end
|
107
|
-
|
108
|
-
def websites
|
109
|
-
@websites ||= @page.search(".websites li").flat_map do |site|
|
110
|
-
url = site.at("a")["href"]
|
111
|
-
CGI.parse(URI.parse(url).query)["url"]
|
112
|
-
end
|
113
|
-
end
|
114
|
-
|
115
|
-
def groups
|
116
|
-
@groups ||= @page.search("#groups .group .item-title").map do |item|
|
117
|
-
name = item.text.gsub(/\s+|\n/, " ").strip
|
118
|
-
link = item.at("a")['href']
|
119
|
-
{ :name => name, :link => link }
|
120
|
-
end
|
121
|
-
end
|
122
|
-
|
123
|
-
def organizations
|
124
|
-
@organizations ||= @page.search("#background-organizations .section-item").map do |item|
|
125
|
-
name = item.at(".summary").text.gsub(/\s+|\n/, " ").strip rescue nil
|
126
|
-
start_date, end_date = item.at(".organizations-date").text.gsub(/\s+|\n/, " ").strip.split(" – ") rescue nil
|
127
|
-
start_date = Date.parse(start_date) rescue nil
|
128
|
-
end_date = Date.parse(end_date) rescue nil
|
129
|
-
{ :name => name, :start_date => start_date, :end_date => end_date }
|
130
|
-
end
|
131
|
-
end
|
132
|
-
|
133
|
-
def languages
|
134
|
-
@languages ||= @page.search(".background-languages #languages ol li").map do |item|
|
135
|
-
language = item.at("h4").text rescue nil
|
136
|
-
proficiency = item.at("div.languages-proficiency").text.gsub(/\s+|\n/, " ").strip rescue nil
|
137
|
-
{ :language => language, :proficiency => proficiency }
|
138
|
-
end
|
139
|
-
end
|
140
|
-
|
141
|
-
def certifications
|
142
|
-
@certifications ||= @page.search("background-certifications").map do |item|
|
143
|
-
name = item.at("h4").text.gsub(/\s+|\n/, " ").strip rescue nil
|
144
|
-
authority = item.at("h5").text.gsub(/\s+|\n/, " ").strip rescue nil
|
145
|
-
license = item.at(".specifics/.licence-number").text.gsub(/\s+|\n/, " ").strip rescue nil
|
146
|
-
start_date = item.at(".certification-date").text.gsub(/\s+|\n/, " ").strip rescue nil
|
147
|
-
|
148
|
-
{ :name => name, :authority => authority, :license => license, :start_date => start_date }
|
149
|
-
end
|
150
|
-
end
|
151
|
-
|
152
|
-
|
153
|
-
# Scrapes the "people also viewed" sidebar of the profile page.
# Returns a memoized array of hashes with :link and :name, plus :title and
# :company when a headline of the form "<title> at <company>" is present.
def recommended_visitors
  @recommended_visitors ||= @page.search(".insights .browse-map/ul/li.profile-card").map do |visitor|
    card = {}
    card[:link] = visitor.at("a")["href"]
    card[:name] = visitor.at("h4/a").text
    headline = visitor.at(".headline")
    if headline
      # LinkedIn truncates long headlines with "..."; normalize before splitting.
      title_part, company_part = headline.text.gsub("...", " ").split(" at ")
      card[:title] = title_part
      card[:company] = company_part
    end
    card
  end
end
|
165
|
-
|
166
|
-
# Scrapes the projects section of the profile page.
# Returns a memoized array of hashes with :title, :link (extracted from the
# LinkedIn redirect URL's "url" query parameter), :start_date, :end_date,
# :description (raw inner HTML) and :associates. Fields that are missing or
# unparseable are nil (best-effort scraping).
def projects
  @projects ||= @page.search("#projects .project").map do |node|
    from, to = node.at(".date-range").text.gsub(/\s+|\n/, " ").strip.split(" – ") rescue nil

    entry = {}
    entry[:title] = node.at(".item-title").text
    entry[:link] = CGI.parse(URI.parse(node.at(".item-title a")["href"]).query)["url"][0] rescue nil
    entry[:start_date] = parse_date(from) rescue nil
    entry[:end_date] = parse_date(to) rescue nil
    entry[:description] = node.at(".description").children.to_s rescue nil
    entry[:associates] = node.search(".contributors .contributor").map { |c| c.at("a").text } rescue nil
    entry
  end
end
|
180
|
-
|
181
|
-
# Serializes the profile as a JSON object keyed by each ATTRIBUTES entry,
# with values produced by calling the matching reader method.
# JSON is required lazily so the gem has no hard load-time dependency on it.
def to_json
  require "json"
  ATTRIBUTES.each_with_object({}) do |attr, hash|
    hash[attr.to_sym] = send(attr.to_sym)
  end.to_json
end
|
185
|
-
|
186
|
-
private

# Scrapes every ".positions .position" entry on the profile page into a hash
# (title, company, location, description, logo, dates, duration) and, when a
# company link is present, merges in the company-page details from
# #get_company_details. Returns the memoized array in @companies.
def get_companies
  # Idiomatic memoization guard (replaces the old "Bad code Hot fix").
  return @companies if @companies

  @companies = []
  @page.search(".positions .position").each do |node|
    company = {}
    company[:title] = node.at(".item-title").text.gsub(/\s+|\n/, " ").strip if node.at(".item-title")
    company[:company] = node.at(".item-subtitle").text.gsub(/\s+|\n/, " ").strip if node.at(".item-subtitle")
    company[:location] = node.at(".location").text if node.at(".location")
    company[:description] = node.at(".description").text.gsub(/\s+|\n/, " ").strip if node.at(".description")
    # FIX: guard on the <img> node itself. The original checked only
    # node.at(".logo") and crashed with NoMethodError when the logo block
    # existed without an <a><img> inside it.
    company[:company_logo] = node.at(".logo a img").first[1] if node.at(".logo a img")

    start_date, end_date = node.at(".date-range").text.strip.split(" – ") rescue nil
    # FIX: added `rescue nil` for consistency with the surrounding best-effort
    # lines — the original crashed when .date-range was absent.
    company[:duration] = node.at(".date-range").text[/.*\((.*)\)/, 1] rescue nil
    company[:start_date] = parse_date(start_date) rescue nil

    # Current positions render as "<start> – Present"; keep the literal string.
    if end_date && end_date.match(/Present/)
      company[:end_date] = "Present"
    else
      company[:end_date] = parse_date(end_date) rescue nil
    end

    company_link = node.at(".item-subtitle").at("a")["href"] rescue nil
    if company_link
      @companies << company.merge!(get_company_details(company_link))
    else
      @companies << company
    end
  end

  @companies
end
|
224
|
-
|
225
|
-
# Parses a scraped date string into a Date. A bare four-digit year
# ("1900".."2099") is normalized to January 1st of that year before parsing;
# everything else is handed to Date.parse as-is (which raises on bad input —
# callers wrap this in `rescue nil`).
def parse_date(date)
  normalized = date =~ /^(19|20)\d{2}$/ ? "#{date}-01-01" : date
  Date.parse(normalized)
end
|
229
|
-
|
230
|
-
# Fetches a company's LinkedIn page and scrapes its details.
# Returns a hash containing :linkedin_company_url, :url (the company website,
# when present), :address (when an HQ vcard is present), and one entry per
# h4/p pair in the "about" list, keyed by the snake_cased heading text.
def get_company_details(link)
  details = { :linkedin_company_url => get_linkedin_company_url(link) }
  page = http_client.get(details[:linkedin_company_url])

  website = page.at(".basic-info-about/ul/li/p/a")
  details[:url] = website.text if website

  about_list = page.at(".basic-info-about/ul")
  if about_list
    # Headings (h4) and values (p) appear as parallel lists; pair them up.
    about_list.search("p").zip(about_list.search("h4")).each do |value, title|
      details[title.text.gsub(" ", "_").downcase.to_sym] = value.text.strip
    end
  end

  hq = page.at(".vcard.hq")
  details[:address] = hq.at(".adr").text.gsub("\n", " ").strip if hq

  details
end
|
244
|
-
|
245
|
-
# Builds a fresh Mechanize agent for each request: randomized user agent,
# no page history (keeps memory flat while crawling), and an optional proxy
# configured from @options when any options were supplied.
def http_client
  Mechanize.new do |client|
    client.user_agent = RandomUserAgent.randomize
    client.max_history = 0
    unless @options.empty?
      client.set_proxy(@options[:proxy_ip], @options[:proxy_port], @options[:username], @options[:password])
    end
  end
end
|
254
|
-
|
255
|
-
# Normalizes a scraped company link to an absolute LinkedIn URL.
# Links that already contain the linkedin.com host (http or https) are
# returned unchanged; anything else is treated as a path and prefixed.
def get_linkedin_company_url(link)
  absolute = %r{http://www.linkedin.com/}.match(link) || %r{https://www.linkedin.com/}.match(link)
  absolute ? link : "http://www.linkedin.com/#{link}"
end
|
264
|
-
end
|
265
|
-
end
|