jobs-former_students 0.0.1

@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA1:
3
+ metadata.gz: c9cbfaa0cba18e517da45135e3dc0fe923b84be4
4
+ data.tar.gz: 1c6025da1723cad63f686ed02e224ad051881268
5
+ SHA512:
6
+ metadata.gz: 016a0fc67bbfbcdcd4b4d18581875df377b8dc26f6c31ade2a207b9b123226cd37a003ad40e2d4c0aa78e54681597fe9b5f7025312c55ec12c6b0e319e8be0fc
7
+ data.tar.gz: 14e0abbe226512dee4bb8d7548db933cb0d7f40536e1fe844cced0f30aec036c2f35e9496d5789a477a39efd77bc4fe8f2fc8ea8259eb10b5fb83eddd362d96f
@@ -0,0 +1,14 @@
1
+ /.bundle/
2
+ /.yardoc
3
+ /Gemfile.lock
4
+ /_yardoc/
5
+ /coverage/
6
+ /doc/
7
+ /pkg/
8
+ /spec/reports/
9
+ /tmp/
10
+ *.bundle
11
+ *.so
12
+ *.o
13
+ *.a
14
+ mkmf.log
data/Gemfile ADDED
@@ -0,0 +1,4 @@
1
+ source 'https://rubygems.org'
2
+
3
+ # Specify your gem's dependencies in jobs-former_students.gemspec
4
+ gemspec
@@ -0,0 +1,22 @@
1
+ Copyright (c) 2014 Jim Sutton
2
+
3
+ MIT License
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining
6
+ a copy of this software and associated documentation files (the
7
+ "Software"), to deal in the Software without restriction, including
8
+ without limitation the rights to use, copy, modify, merge, publish,
9
+ distribute, sublicense, and/or sell copies of the Software, and to
10
+ permit persons to whom the Software is furnished to do so, subject to
11
+ the following conditions:
12
+
13
+ The above copyright notice and this permission notice shall be
14
+ included in all copies or substantial portions of the Software.
15
+
16
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
17
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
18
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
19
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
20
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
21
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
22
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
@@ -0,0 +1,78 @@
1
+ # Jobs::FormerStudents
2
+
3
+ ##### This is a small gem intended for use with the [Turing School](https://turing.io) jobs-basket gem. It returns a list of all companies that have employed, or currently employ, former gSchool students, as indicated by the gSchool website.
4
+
5
+ ## Installation
6
+
7
+ Add this line to your application's Gemfile:
8
+
9
+ ```ruby
10
+ gem 'jobs-former_students'
11
+ ```
12
+
13
+ And then execute:
14
+
15
+ $ bundle
16
+
17
+ Or install it yourself as:
18
+
19
+ $ gem install jobs-former_students
20
+
21
+ ## Basic Usage
22
+ This gem provides instance methods that look up a single company name or return an array of all company names. Start by creating a new instance of the `Jobs::FormerStudents::Scraper` class:
23
+ ```ruby
24
+ scraper = Jobs::FormerStudents::Scraper.new
25
+ ```
26
+
27
+ To return a full, current list of company names, pass in a range limit to the `get_company_names` method:
28
+ ```ruby
29
+ scraper.get_company_names(80)
30
+ ```
31
+
32
+ The above method will search students 1 through 80 and return an array of all companies found. As of the publishing date of Version 0.0.1 of this gem, student 65 is the last student with a valid company name. Pages after this represent either pages that do not exist or students who have incomplete profiles (i.e. no company listed).
33
+
34
+ It is possible (though less useful) to look up a single student's company. This requires you to know which id is associated with the student, which probably means you could find the information by simply browsing the site yourself. But the world is your oyster, so `scraper.get_company_name(12)` looks up the student matching the number you pass in, appends the company name to the scraper's `companies` array, and returns that array (for example `["Acumen Digital"]` on a fresh scraper).
35
+
36
+
37
+ ## Sorting Methods
38
+ The gem provides three methods to de-duplicate, count, and sort the output of the `get_company_names` method.
39
+
40
+ ##### 1. To return an array of unique company names, use the following method:
41
+
42
+ ```ruby
43
+ scraper.get_unique_company_names(80)
44
+ ```
45
+
46
+ ##### 2. To return a hash showing the total number of students employed at each company, use the `total_students_employed_per_company` method:
47
+
48
+ ```ruby
49
+ scraper.total_students_employed_per_company(80)
50
+ ```
51
+
52
+ **Example output:**
53
+ ```ruby
54
+ {"Welltok"=>1, "Zayo"=>4, "Active Junky"=>1, "Acumen Digital"=>4, "Kapost"=>1, "Verbalize.it"=>1, "Galvanize / gSchool"=>3, "TeamSnap"=>1, "Sports Shares"=>1, "Pivotal Labs"=>1, "Keen.io"=>1, "Blogmutt"=>1, "QuickLeft"=>2, "IBM BlueMix Garage"=>2, "HobbyDB"=>1, "Oildecks"=>1, "Lee Reedy"=>1, "Dfuzr"=>1, "Colorado Access"=>2, "CloudElements"=>1, "P2Binvestor"=>1, "gSchool"=>3, "Haught Codeworks"=>1, "Amex"=>2, "LegitScript"=>1, "RxRevu"=>1, "Apto"=>1, "Mondo Robot"=>1, "JayBird"=>1}
55
+ ```
56
+
57
+ You can query the results of this method by passing in a company name as a hash key:
58
+ ```ruby
59
+ results = scraper.total_students_employed_per_company(80)
60
+ results["Zayo"] # => 4
61
+ ```
62
+
63
+ ##### 3. To return a sorted list of these results, use the `sort_by_frequency` method:
64
+
65
+ `scraper.sort_by_frequency(80)`
66
+
67
+ **Example output:**
68
+ ```ruby
69
+ [["Zayo", 4], ["Acumen Digital", 4], ["Galvanize / gSchool", 3], ["gSchool", 3], ["IBM BlueMix Garage", 2], ["Amex", 2], ["Colorado Access", 2], ["QuickLeft", 2], ["HobbyDB", 1], ["CloudElements", 1], ["Haught Codeworks", 1], ["Dfuzr", 1], ["Lee Reedy", 1], ["Oildecks", 1], ["P2Binvestor", 1], ["LegitScript", 1], ["RxRevu", 1], ["Blogmutt", 1], ["Keen.io", 1], ["Pivotal Labs", 1], ["Sports Shares", 1], ["TeamSnap", 1], ["Apto", 1], ["Verbalize.it", 1], ["Kapost", 1], ["Mondo Robot", 1], ["Active Junky", 1], ["JayBird", 1], ["Welltok", 1]]
70
+ ```
71
+
72
+ ## Contributing
73
+
74
+ 1. Fork it ( https://github.com/[my-github-username]/jobs-former_students/fork )
75
+ 2. Create your feature branch (`git checkout -b my-new-feature`)
76
+ 3. Commit your changes (`git commit -am 'Add some feature'`)
77
+ 4. Push to the branch (`git push origin my-new-feature`)
78
+ 5. Create a new Pull Request
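The counting and sorting that the README's Sorting Methods section describes can be tried without touching the network. The following is a minimal sketch, not the gem's own code: the helper names `tally_companies` and `rank_by_frequency` and the sample list are hypothetical, chosen only for illustration. Ties are broken alphabetically here so the order is deterministic, which the gem itself does not promise.

```ruby
# Hypothetical sample data; the real gem scrapes these names from student pages.
SAMPLE_COMPANIES = ["Zayo", "Kapost", "Zayo", "QuickLeft", "Zayo", "QuickLeft"].freeze

# Count how many times each company name appears.
def tally_companies(names)
  names.each_with_object(Hash.new(0)) { |name, counts| counts[name] += 1 }
end

# Return [name, count] pairs, highest count first; ties are sorted by name.
def rank_by_frequency(counts)
  counts.sort_by { |name, count| [-count, name] }
end

counts = tally_companies(SAMPLE_COMPANIES)
ranked = rank_by_frequency(counts)

p counts
p ranked
```

Sorting on `[-count, name]` avoids the separate `reverse` call and keeps equal counts in a predictable order.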
@@ -0,0 +1,2 @@
1
+ require "bundler/gem_tasks"
2
+
@@ -0,0 +1,28 @@
1
+ # coding: utf-8
2
+ lib = File.expand_path('../lib', __FILE__)
3
+ $LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
4
+ require 'jobs/former_students/version'
5
+
6
+ Gem::Specification.new do |spec|
7
+ spec.name = "jobs-former_students"
8
+ spec.version = Jobs::FormerStudents::VERSION
9
+ spec.authors = ["Jim Sutton"]
10
+ spec.email = ["jimsuttonjimsutton@gmail.com"]
11
+ spec.summary = "Small gem to scrape company names from former students."
12
+ spec.description = "Small gem to scrape company names from former students."
13
+ spec.homepage = "http://github.com/turingschool/jobs-former_students"
14
+ spec.license = "MIT"
15
+
16
+ spec.files = `git ls-files -z`.split("\x0")
17
+ spec.executables = spec.files.grep(%r{^bin/}) { |f| File.basename(f) }
18
+ spec.test_files = spec.files.grep(%r{^(test|spec|features)/})
19
+ spec.require_paths = ["lib"]
20
+
21
+ spec.add_development_dependency "bundler", "~> 1.7"
22
+ spec.add_development_dependency "rake", "~> 10.0"
23
+ spec.add_development_dependency "minitest"
24
+ spec.add_development_dependency "pry"
25
+
26
+ spec.add_dependency "mechanize", "~> 2.7"
27
+
28
+ end
@@ -0,0 +1,8 @@
1
+ require "jobs/former_students/version"
2
+ require "jobs/former_students/scraper"
3
+
4
+ module Jobs
5
+ module FormerStudents
6
+
7
+ end
8
+ end
@@ -0,0 +1,58 @@
1
+ require 'mechanize'
2
+
3
+ module Jobs
4
+ module FormerStudents
5
+ class Scraper
6
+ attr_reader :companies, :agent
7
+
8
+ def initialize
9
+ @companies = []
10
+ @agent = Mechanize.new
11
+ end
12
+
13
+ def url_to_scrape(uri)
14
+ base_url = 'https://students.gschool.it/students/'
15
+ agent.get("#{base_url}#{uri}")
16
+ end
17
+
18
+ def get_company_name(student_number)
19
+ page = url_to_scrape(student_number)
20
+ company_name = page.at('.hero-subtitle > h3').text.strip
21
+ companies.push(company_name) unless company_name.empty?
22
+ puts "Company at /students/#{student_number} is: #{company_name}"
23
+ companies
24
+ rescue Mechanize::ResponseCodeError
25
+ puts "404 error at /students/#{student_number}"
26
+ return companies
27
+ end
28
+
29
+ def get_company_names(total_students)
30
+ i = 1
31
+ until i > total_students
32
+ get_company_name(i)
33
+ i += 1
34
+ end
35
+ companies.find_all { |co| !co.empty? }
36
+ end
37
+
38
+ def get_unique_company_names(total_students)
39
+ get_company_names(total_students).uniq
40
+ end
41
+
42
+ def total_students_employed_per_company(total_students)
43
+ companies = get_company_names(total_students)
44
+ companies.inject(Hash.new(0)) do |h,v|
45
+ h[v] += 1
46
+ h
47
+ end
48
+ end
49
+
50
+ def sort_by_frequency(total_students)
51
+ frequency = total_students_employed_per_company(total_students)
52
+ frequency.sort_by { |k,v| v }.reverse
53
+ end
54
+
55
+ end
56
+ end
57
+ end
58
+
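The README describes `get_company_names(80)` as searching students 1 through 80, so the upper bound is meant to be inclusive. A minimal sketch of that iteration using an inclusive range is shown below. It is an illustration under stated assumptions, not the gem's implementation: `FAKE_PAGES`, `company_for`, and `company_names` are hypothetical names, and the lookup table stands in for the remote student pages.

```ruby
# Hypothetical lookup table standing in for the remote student pages.
FAKE_PAGES = { 1 => "Zayo", 2 => "", 3 => "Kapost" }.freeze

# Stand-in for a single page fetch; returns "" when no company is listed.
def company_for(id)
  FAKE_PAGES.fetch(id, "")
end

# Visit ids 1..total_students inclusive and keep only non-empty names.
def company_names(total_students)
  (1..total_students).map { |id| company_for(id) }.reject(&:empty?)
end

names = company_names(3)
p names
```

Because the result is built from scratch on every call, repeated calls do not accumulate names the way a shared instance array would.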
@@ -0,0 +1,5 @@
1
+ module Jobs
2
+ module FormerStudents
3
+ VERSION = "0.0.1"
4
+ end
5
+ end
@@ -0,0 +1,56 @@
1
+ require 'minitest/autorun'
2
+ require 'minitest/pride'
3
+ require './lib/jobs/former_students/scraper'
4
+ require 'mechanize'
5
+
6
+ class ScraperTest < Minitest::Test
7
+
8
+ def test_it_exists
9
+ assert Jobs::FormerStudents::Scraper
10
+ end
11
+
12
+ def test_it_returns_company_name
13
+ scraper = Jobs::FormerStudents::Scraper.new
14
+ results = scraper.get_company_name(12)
15
+ assert_equal ["Acumen Digital"], results
16
+ end
17
+
18
+ def test_it_returns_empty_array_if_matching_selector_is_not_on_page
19
+ scraper = Jobs::FormerStudents::Scraper.new
20
+ result = scraper.get_company_name(80) # this page exists but with no information
21
+ assert_equal [], result
22
+ end
23
+
24
+ def test_it_returns_empty_array_if_page_does_not_exist
25
+ scraper = Jobs::FormerStudents::Scraper.new
26
+ result = scraper.get_company_name(2) # this page does not exist / is a 404 page
27
+ assert_equal [], result
28
+ end
29
+
30
+ def test_it_creates_array_of_company_names
31
+ scraper = Jobs::FormerStudents::Scraper.new
32
+ results = scraper.get_unique_company_names(90) # Searches '/students/1' through '/students/90'
33
+ refute results.nil?
34
+ assert results.kind_of?(Array)
35
+ refute results.any? { |result| result.empty?}
36
+ end
37
+
38
+ def test_it_returns_only_unique_names
39
+ scraper = Jobs::FormerStudents::Scraper.new
40
+ results = scraper.get_unique_company_names(30) # There are 4 students at Zayo in this search group
41
+ zayo = results.find_all {|co| co == "Zayo" }
42
+ refute zayo.count > 1
43
+ end
44
+
45
+ def test_it_totals_students_per_company
46
+ scraper = Jobs::FormerStudents::Scraper.new
47
+ results = scraper.total_students_employed_per_company(80) # There are 4 students at Zayo in this search group
48
+ assert_equal 4, results["Zayo"]
49
+ end
50
+
51
+ def test_it_sorts_by_frequency
52
+ scraper = Jobs::FormerStudents::Scraper.new
53
+ results = scraper.sort_by_frequency(80)
54
+ assert_equal ["Zayo", 4], results.first
55
+ end
56
+ end
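The tests above call the live site, so they depend on the network and on the pages staying unchanged. One possible way to exercise the same logic offline is to pass in a fake agent. The sketch below is an assumption-laden illustration, not part of the gem: `FakeNotFound`, `FakeNode`, `FakePage`, `FakeAgent`, and `InjectableScraper` are all hypothetical, and `FakeNotFound` merely stands in for `Mechanize::ResponseCodeError`.

```ruby
# Hypothetical error standing in for Mechanize::ResponseCodeError.
class FakeNotFound < StandardError; end

# Minimal node double exposing only #text, which is all the scraper reads.
FakeNode = Struct.new(:text)

# Minimal page double exposing only #at.
class FakePage
  def initialize(company)
    @company = company
  end

  # Ignores the selector and returns a node wrapping the canned text.
  def at(_selector)
    FakeNode.new(@company)
  end
end

# Fake agent: maps URLs to canned company names, raises for unknown URLs.
class FakeAgent
  def initialize(pages)
    @pages = pages
  end

  def get(url)
    raise FakeNotFound, url unless @pages.key?(url)

    FakePage.new(@pages[url])
  end
end

# Hypothetical scraper variant that accepts any agent responding to #get.
class InjectableScraper
  BASE_URL = 'https://students.gschool.it/students/'

  def initialize(agent)
    @agent = agent
  end

  # Returns the stripped company name, or nil when the page is missing.
  def company_name(student_number)
    @agent.get("#{BASE_URL}#{student_number}").at('.hero-subtitle > h3').text.strip
  rescue FakeNotFound
    nil
  end
end

agent = FakeAgent.new({ 'https://students.gschool.it/students/12' => " Acumen Digital \n" })
scraper = InjectableScraper.new(agent)

p scraper.company_name(12)
p scraper.company_name(2)
```

With an injected agent, a test can assert on known canned data instead of whatever the site happens to serve that day.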
metadata ADDED
@@ -0,0 +1,125 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: jobs-former_students
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.0.1
5
+ platform: ruby
6
+ authors:
7
+ - Jim Sutton
8
+ autorequire:
9
+ bindir: bin
10
+ cert_chain: []
11
+ date: 2014-12-25 00:00:00.000000000 Z
12
+ dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ name: bundler
15
+ requirement: !ruby/object:Gem::Requirement
16
+ requirements:
17
+ - - "~>"
18
+ - !ruby/object:Gem::Version
19
+ version: '1.7'
20
+ type: :development
21
+ prerelease: false
22
+ version_requirements: !ruby/object:Gem::Requirement
23
+ requirements:
24
+ - - "~>"
25
+ - !ruby/object:Gem::Version
26
+ version: '1.7'
27
+ - !ruby/object:Gem::Dependency
28
+ name: rake
29
+ requirement: !ruby/object:Gem::Requirement
30
+ requirements:
31
+ - - "~>"
32
+ - !ruby/object:Gem::Version
33
+ version: '10.0'
34
+ type: :development
35
+ prerelease: false
36
+ version_requirements: !ruby/object:Gem::Requirement
37
+ requirements:
38
+ - - "~>"
39
+ - !ruby/object:Gem::Version
40
+ version: '10.0'
41
+ - !ruby/object:Gem::Dependency
42
+ name: minitest
43
+ requirement: !ruby/object:Gem::Requirement
44
+ requirements:
45
+ - - ">="
46
+ - !ruby/object:Gem::Version
47
+ version: '0'
48
+ type: :development
49
+ prerelease: false
50
+ version_requirements: !ruby/object:Gem::Requirement
51
+ requirements:
52
+ - - ">="
53
+ - !ruby/object:Gem::Version
54
+ version: '0'
55
+ - !ruby/object:Gem::Dependency
56
+ name: pry
57
+ requirement: !ruby/object:Gem::Requirement
58
+ requirements:
59
+ - - ">="
60
+ - !ruby/object:Gem::Version
61
+ version: '0'
62
+ type: :development
63
+ prerelease: false
64
+ version_requirements: !ruby/object:Gem::Requirement
65
+ requirements:
66
+ - - ">="
67
+ - !ruby/object:Gem::Version
68
+ version: '0'
69
+ - !ruby/object:Gem::Dependency
70
+ name: mechanize
71
+ requirement: !ruby/object:Gem::Requirement
72
+ requirements:
73
+ - - "~>"
74
+ - !ruby/object:Gem::Version
75
+ version: '2.7'
76
+ type: :runtime
77
+ prerelease: false
78
+ version_requirements: !ruby/object:Gem::Requirement
79
+ requirements:
80
+ - - "~>"
81
+ - !ruby/object:Gem::Version
82
+ version: '2.7'
83
+ description: Small gem to scrape company names from former students.
84
+ email:
85
+ - jimsuttonjimsutton@gmail.com
86
+ executables: []
87
+ extensions: []
88
+ extra_rdoc_files: []
89
+ files:
90
+ - ".gitignore"
91
+ - Gemfile
92
+ - LICENSE.txt
93
+ - README.md
94
+ - Rakefile
95
+ - jobs-former_students.gemspec
96
+ - lib/jobs/former_students.rb
97
+ - lib/jobs/former_students/scraper.rb
98
+ - lib/jobs/former_students/version.rb
99
+ - test/jobs/former_students/scraper_test.rb
100
+ homepage: http://github.com/turingschool/jobs-former_students
101
+ licenses:
102
+ - MIT
103
+ metadata: {}
104
+ post_install_message:
105
+ rdoc_options: []
106
+ require_paths:
107
+ - lib
108
+ required_ruby_version: !ruby/object:Gem::Requirement
109
+ requirements:
110
+ - - ">="
111
+ - !ruby/object:Gem::Version
112
+ version: '0'
113
+ required_rubygems_version: !ruby/object:Gem::Requirement
114
+ requirements:
115
+ - - ">="
116
+ - !ruby/object:Gem::Version
117
+ version: '0'
118
+ requirements: []
119
+ rubyforge_project:
120
+ rubygems_version: 2.4.2
121
+ signing_key:
122
+ specification_version: 4
123
+ summary: Small gem to scrape company names from former students.
124
+ test_files:
125
+ - test/jobs/former_students/scraper_test.rb