gtfs_stops_clustering 0.1.2 → 0.1.4

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: 32c396f8f9377660da84a4b29d78657e23efcfefe01e6d05368894dd1ffbf07d
-   data.tar.gz: 42f3961fb4a896fb3f25c3f113672a0ac8894d74458270bfce2fef82f5e50e11
+   metadata.gz: bd1d3d49ce47faac98cf22d674adab3d37d0afc1a0b7b55ecba36fe5cde3bac3
+   data.tar.gz: 3b6572e53a4268c2e8def8dd80e8605768e425dd05570827822b80ce0448a100
  SHA512:
-   metadata.gz: 197ec5fb775f93c61c1207392bb2b0a136c09b61572327d2855b9f0c73bf1dc94f552b6665d842a757c20e06248692001ff013c23407e3bc94ed7a024ca5dbd0
-   data.tar.gz: 88c0c3d4106abf305d0ac7a4196a6612f40745bf2632ffadab32673d3fb60f04800a2d4de44027fa9fb4a3c8c03db206881dccc91505fb15e48253e56e85952c
+   metadata.gz: a7f35b9d4c35f638b5baac0fa384f50e44db4ef1a20cfc119c74c184b1d900c1207704c1503094fb85c63ea84c9acf283c4cb33e8e7efa7ae13aa5d18e5d5d67
+   data.tar.gz: 6b5c325393774c7f926891445c09f82eda11fbdb1b5528942e31284f2c5c946ceada56aabe82911a8167956a933ffce1ec17fb599a71ace3853e24dcbec059a9
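The two digest families recorded in `checksums.yaml` can be recomputed locally to confirm that a downloaded gem matches the published values. A minimal sketch, assuming the `.gem` archive has already been unpacked so that `metadata.gz` and `data.tar.gz` are available as byte strings; the helper names `digests_for` and `checksum_matches?` are hypothetical and not part of RubyGems:

```ruby
require "digest"

# Hypothetical helper: returns the SHA256 and SHA512 hex digests of a byte string,
# in the same lowercase-hex form that checksums.yaml uses.
def digests_for(bytes)
  {
    "SHA256" => Digest::SHA256.hexdigest(bytes),
    "SHA512" => Digest::SHA512.hexdigest(bytes)
  }
end

# Compare a freshly computed digest against a value copied from checksums.yaml.
def checksum_matches?(bytes, algorithm, expected_hex)
  digests_for(bytes).fetch(algorithm) == expected_hex.strip.downcase
end
```

For a file on disk, `File.binread(path)` yields the byte string to pass as `bytes`.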
data/.rubocop.yml ADDED
@@ -0,0 +1,13 @@
+ AllCops:
+   TargetRubyVersion: 2.6
+
+ Style/StringLiterals:
+   Enabled: true
+   EnforcedStyle: double_quotes
+
+ Style/StringLiteralsInInterpolation:
+   Enabled: true
+   EnforcedStyle: double_quotes
+
+ Layout/LineLength:
+   Max: 140
data/CHANGELOG.md ADDED
@@ -0,0 +1,5 @@
+ ## [Unreleased]
+
+ ## [0.1.0] - 2023-12-06
+
+ - Initial release
data/CODE_OF_CONDUCT.md ADDED
@@ -0,0 +1,84 @@
1
+ # Contributor Covenant Code of Conduct
2
+
3
+ ## Our Pledge
4
+
5
+ We as members, contributors, and leaders pledge to make participation in our community a harassment-free experience for everyone, regardless of age, body size, visible or invisible disability, ethnicity, sex characteristics, gender identity and expression, level of experience, education, socio-economic status, nationality, personal appearance, race, religion, or sexual identity and orientation.
6
+
7
+ We pledge to act and interact in ways that contribute to an open, welcoming, diverse, inclusive, and healthy community.
8
+
9
+ ## Our Standards
10
+
11
+ Examples of behavior that contributes to a positive environment for our community include:
12
+
13
+ * Demonstrating empathy and kindness toward other people
14
+ * Being respectful of differing opinions, viewpoints, and experiences
15
+ * Giving and gracefully accepting constructive feedback
16
+ * Accepting responsibility and apologizing to those affected by our mistakes, and learning from the experience
17
+ * Focusing on what is best not just for us as individuals, but for the overall community
18
+
19
+ Examples of unacceptable behavior include:
20
+
21
+ * The use of sexualized language or imagery, and sexual attention or
22
+ advances of any kind
23
+ * Trolling, insulting or derogatory comments, and personal or political attacks
24
+ * Public or private harassment
25
+ * Publishing others' private information, such as a physical or email
26
+ address, without their explicit permission
27
+ * Other conduct which could reasonably be considered inappropriate in a
28
+ professional setting
29
+
30
+ ## Enforcement Responsibilities
31
+
32
+ Community leaders are responsible for clarifying and enforcing our standards of acceptable behavior and will take appropriate and fair corrective action in response to any behavior that they deem inappropriate, threatening, offensive, or harmful.
33
+
34
+ Community leaders have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, and will communicate reasons for moderation decisions when appropriate.
35
+
36
+ ## Scope
37
+
38
+ This Code of Conduct applies within all community spaces, and also applies when an individual is officially representing the community in public spaces. Examples of representing our community include using an official e-mail address, posting via an official social media account, or acting as an appointed representative at an online or offline event.
39
+
40
+ ## Enforcement
41
+
42
+ Instances of abusive, harassing, or otherwise unacceptable behavior may be reported to the community leaders responsible for enforcement at visconti373@gmail.com. All complaints will be reviewed and investigated promptly and fairly.
43
+
44
+ All community leaders are obligated to respect the privacy and security of the reporter of any incident.
45
+
46
+ ## Enforcement Guidelines
47
+
48
+ Community leaders will follow these Community Impact Guidelines in determining the consequences for any action they deem in violation of this Code of Conduct:
49
+
50
+ ### 1. Correction
51
+
52
+ **Community Impact**: Use of inappropriate language or other behavior deemed unprofessional or unwelcome in the community.
53
+
54
+ **Consequence**: A private, written warning from community leaders, providing clarity around the nature of the violation and an explanation of why the behavior was inappropriate. A public apology may be requested.
55
+
56
+ ### 2. Warning
57
+
58
+ **Community Impact**: A violation through a single incident or series of actions.
59
+
60
+ **Consequence**: A warning with consequences for continued behavior. No interaction with the people involved, including unsolicited interaction with those enforcing the Code of Conduct, for a specified period of time. This includes avoiding interactions in community spaces as well as external channels like social media. Violating these terms may lead to a temporary or permanent ban.
61
+
62
+ ### 3. Temporary Ban
63
+
64
+ **Community Impact**: A serious violation of community standards, including sustained inappropriate behavior.
65
+
66
+ **Consequence**: A temporary ban from any sort of interaction or public communication with the community for a specified period of time. No public or private interaction with the people involved, including unsolicited interaction with those enforcing the Code of Conduct, is allowed during this period. Violating these terms may lead to a permanent ban.
67
+
68
+ ### 4. Permanent Ban
69
+
70
+ **Community Impact**: Demonstrating a pattern of violation of community standards, including sustained inappropriate behavior, harassment of an individual, or aggression toward or disparagement of classes of individuals.
71
+
72
+ **Consequence**: A permanent ban from any sort of public interaction within the community.
73
+
74
+ ## Attribution
75
+
76
+ This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 2.0,
77
+ available at https://www.contributor-covenant.org/version/2/0/code_of_conduct.html.
78
+
79
+ Community Impact Guidelines were inspired by [Mozilla's code of conduct enforcement ladder](https://github.com/mozilla/diversity).
80
+
81
+ [homepage]: https://www.contributor-covenant.org
82
+
83
+ For answers to common questions about this code of conduct, see the FAQ at
84
+ https://www.contributor-covenant.org/faq. Translations are available at https://www.contributor-covenant.org/translations.
data/LICENSE.txt ADDED
@@ -0,0 +1,21 @@
1
+ The MIT License (MIT)
2
+
3
+ Copyright (c) 2023 Visco01
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in
13
+ all copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
21
+ THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,39 @@
1
+ # GtfsStopsClustering
2
+
3
+ TODO: Delete this and the text below, and describe your gem
4
+
5
+ Welcome to your new gem! In this directory, you'll find the files you need to be able to package up your Ruby library into a gem. Put your Ruby code in the file `lib/gtfs_stops_clustering`. To experiment with that code, run `bin/console` for an interactive prompt.
6
+
7
+ ## Installation
8
+
9
+ TODO: Replace `UPDATE_WITH_YOUR_GEM_NAME_PRIOR_TO_RELEASE_TO_RUBYGEMS_ORG` with your gem name right after releasing it to RubyGems.org. Please do not do it earlier due to security reasons. Alternatively, replace this section with instructions to install your gem from git if you don't plan to release to RubyGems.org.
10
+
11
+ Install the gem and add to the application's Gemfile by executing:
12
+
13
+ $ bundle add UPDATE_WITH_YOUR_GEM_NAME_PRIOR_TO_RELEASE_TO_RUBYGEMS_ORG
14
+
15
+ If bundler is not being used to manage dependencies, install the gem by executing:
16
+
17
+ $ gem install UPDATE_WITH_YOUR_GEM_NAME_PRIOR_TO_RELEASE_TO_RUBYGEMS_ORG
18
+
19
+ ## Usage
20
+
21
+ TODO: Write usage instructions here
22
+
23
+ ## Development
24
+
25
+ After checking out the repo, run `bin/setup` to install dependencies. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
26
+
27
+ To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and the created tag, and push the `.gem` file to [rubygems.org](https://rubygems.org).
28
+
29
+ ## Contributing
30
+
31
+ Bug reports and pull requests are welcome on GitHub at https://github.com/[USERNAME]/gtfs_stops_clustering. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the [code of conduct](https://github.com/[USERNAME]/gtfs_stops_clustering/blob/main/CODE_OF_CONDUCT.md).
32
+
33
+ ## License
34
+
35
+ The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).
36
+
37
+ ## Code of Conduct
38
+
39
+ Everyone interacting in the GtfsStopsClustering project's codebases, issue trackers, chat rooms and mailing lists is expected to follow the [code of conduct](https://github.com/[USERNAME]/gtfs_stops_clustering/blob/main/CODE_OF_CONDUCT.md).
data/Rakefile ADDED
@@ -0,0 +1,8 @@
+ # frozen_string_literal: true
+
+ require "bundler/gem_tasks"
+ require "rubocop/rake_task"
+
+ RuboCop::RakeTask.new
+
+ task default: :rubocop
data/gtfs_stops_clustering.gemspec ADDED
@@ -0,0 +1,54 @@
1
+ # frozen_string_literal: true
2
+
3
+ require_relative "lib/gtfs_stops_clustering/version"
4
+
5
+ Gem::Specification.new do |spec|
6
+ spec.name = "gtfs_stops_clustering"
7
+ spec.version = GtfsStopsClustering::VERSION
8
+ spec.authors = ["Pietro Visconti"]
9
+ spec.email = ["pietro.visconti2001@gmail.com"]
10
+
11
+ spec.summary = "A gem to read GTFS stops data and create clusters based on coordinates and stop names' similarities."
12
+ spec.description = "A gem to read GTFS stops data and create clusters based on coordinates and stop names' similarities."
13
+ spec.homepage = "https://github.com/Visco01/gtfs_stops_clustering"
14
+ spec.license = "MIT"
15
+ spec.required_ruby_version = ">= 2.6.0"
16
+
17
+ # spec.metadata["allowed_push_host"] = "TODO: Set to your gem server 'https://example.com'"
18
+
19
+ spec.metadata["homepage_uri"] = spec.homepage
20
+ spec.metadata["source_code_uri"] = spec.homepage
21
+ spec.metadata["changelog_uri"] = "https://github.com/Visco01/gtfs_stops_clustering/blob/main/CHANGELOG.md"
22
+
23
+ # Specify which files should be added to the gem when it is released.
24
+ # The `git ls-files -z` loads the files in the RubyGem that have been added into git.
25
+ spec.files = Dir.chdir(__dir__) do
26
+ `git ls-files -z`.split("\x0").reject do |f|
27
+ (File.expand_path(f) == __FILE__) ||
28
+ f.start_with?(*%w[bin/ test/ spec/ features/ .git .github appveyor Gemfile])
29
+ end
30
+ end
31
+ spec.bindir = "exe"
32
+ spec.executables = spec.files.grep(%r{\Aexe/}) { |f| File.basename(f) }
33
+ spec.require_paths = ["lib"]
34
+
35
+ # spec.files = ["lib/gtfs_stops_clustering.rb", "lib/gtfs_stops_clustering/data_import.rb", "lib/gtfs_stops_clustering/dbscan.rb",
36
+ # "lib/gtfs_stops_clustering/redis_geodata.rb", "lib/gtfs_stops_clustering/version.rb",
37
+ # "lib/gtfs_stops_clustering/input_consistency_checks.rb"]
38
+
39
+ spec.add_runtime_dependency "csv", "~> 3.2", ">= 3.2.8"
40
+ spec.add_runtime_dependency "distance_measures", "~> 0.0.6"
41
+ spec.add_runtime_dependency "geocoder", "~> 1.8", ">= 1.8.2"
42
+ spec.add_runtime_dependency "gtfs", "~> 0.4.1"
43
+ spec.add_runtime_dependency "redis", "~> 5.0", ">= 5.0.8"
44
+ spec.add_runtime_dependency "text", "~> 1.3", ">= 1.3.1"
45
+
46
+ spec.add_development_dependency "minitest", "~> 5.20"
47
+ spec.add_development_dependency "pry", "~> 0.14.2"
48
+
49
+ # Uncomment to register a new dependency of your gem
50
+ # spec.add_dependency "example-gem", "~> 1.0"
51
+
52
+ # For more information and examples about making a new gem, check out our
53
+ # guide at: https://bundler.io/guides/creating_gem.html
54
+ end
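Several runtime dependencies above pair a pessimistic constraint (`~>`) with an explicit lower bound (`>=`). A short sketch of how such a pair is evaluated, using `Gem::Requirement` and `Gem::Version` from RubyGems; the helper names `csv_requirement` and `allowed?` exist only for this illustration:

```ruby
require "rubygems"

# The same pair of constraints the gemspec declares for the csv dependency.
def csv_requirement
  Gem::Requirement.new("~> 3.2", ">= 3.2.8")
end

# Illustrative helper: true when the version string satisfies every constraint.
def allowed?(requirement, version)
  requirement.satisfied_by?(Gem::Version.new(version))
end

allowed?(csv_requirement, "3.2.8") # inside both bounds
allowed?(csv_requirement, "3.2.7") # below the >= 3.2.8 floor
allowed?(csv_requirement, "4.0.0") # rejected by ~> 3.2, which stops before 4.0
```

A three-segment constraint such as `~> 0.4.1` is narrower: it admits 0.4.x releases from 0.4.1 upward and excludes 0.5.0.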
data/lib/gtfs_stops_clustering/data_import.rb CHANGED
@@ -1,12 +1,15 @@
1
+ # frozen_string_literal: true
2
+
1
3
  # lib/data_import.rb
2
4
 
3
- require 'csv'
4
- require 'gtfs'
5
+ require "csv"
6
+ require "gtfs"
5
7
 
8
+ # DataImport module
6
9
  module DataImport
7
- VERSION='0.0.1'
8
10
  attr_accessor :data_import
9
11
 
12
+ # DataImport class
10
13
  class DataImport
11
14
  attr_accessor :stops, :stops_config_file, :stops_names, :stops_corner_cases, :stops_data, :stops_redis_geodata
12
15
 
@@ -22,13 +25,13 @@ module DataImport
22
25
  end
23
26
 
24
27
  def import_stops_corner_cases
25
- if File.exist?(@stops_config_file)
26
- CSV.foreach(@stops_config_file, headers: true) do |row|
27
- stop_name = row['stop_name']
28
- cluster_name = row['cluster_name']
28
+ return unless File.exist?(@stops_config_file)
29
+
30
+ CSV.foreach(@stops_config_file, headers: true) do |row|
31
+ stop_name = row["stop_name"]
32
+ cluster_name = row["cluster_name"]
29
33
 
30
- stops_corner_cases << { stop_name: stop_name, cluster_name: cluster_name }
31
- end
34
+ stops_corner_cases << { stop_name: stop_name, cluster_name: cluster_name }
32
35
  end
33
36
  end
34
37
 
@@ -38,13 +41,20 @@ module DataImport
38
41
  longitude = row.lon
39
42
  stop_name = row.name
40
43
 
41
- stop_name = @stops_corner_cases.find { |entry| entry[:stop_name] == stop_name }[:cluster_name] if stops_corner_cases.find { |entry| entry[:stop_name] == stop_name }
44
+ stop_name = stop_name_from_corner_cases(stop_name)
42
45
 
43
46
  @stops_names << stop_name
44
47
  @stops_data << [latitude, longitude]
45
48
  @stops_redis_geodata << [longitude, latitude, "#{longitude},#{latitude}"]
46
49
  end
47
50
  end
51
+
52
+ def stop_name_from_corner_cases(stop_name)
53
+ csv_entry = @stops_corner_cases.find do |entry|
54
+ entry[:stop_name] == stop_name
55
+ end
56
+ csv_entry.nil? ? stop_name : csv_entry[:cluster_name]
57
+ end
48
58
  end
49
59
 
50
60
  def import_stops_data(*args)
@@ -56,5 +66,3 @@ module DataImport
56
66
  }
57
67
  end
58
68
  end
59
-
60
- include DataImport
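The corner-case file maps a raw `stop_name` to the `cluster_name` that should replace it before clustering. A minimal sketch of that lookup, assuming the two-column CSV layout with a header row; it parses an in-memory string instead of reading `@stops_config_file`, the method names are illustrative, and the sample rows are invented:

```ruby
require "csv"

# Parse corner cases from CSV text into an array of { stop_name:, cluster_name: } hashes.
def parse_corner_cases(csv_text)
  CSV.parse(csv_text, headers: true).map do |row|
    { stop_name: row["stop_name"], cluster_name: row["cluster_name"] }
  end
end

# Return the configured cluster name for a stop, or the stop name itself when no entry matches.
def resolve_stop_name(corner_cases, stop_name)
  entry = corner_cases.find { |e| e[:stop_name] == stop_name }
  entry.nil? ? stop_name : entry[:cluster_name]
end

corner_cases = parse_corner_cases("stop_name,cluster_name\nCentral Stn,Central Station\n")
resolve_stop_name(corner_cases, "Central Stn")  # a listed stop takes its cluster name
resolve_stop_name(corner_cases, "Harbour Road") # an unlisted stop keeps its own name
```

Extracting the lookup into one method, as the 0.1.4 change does, also avoids running the same `find` twice per stop.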
data/lib/gtfs_stops_clustering/dbscan.rb CHANGED
@@ -1,33 +1,45 @@
1
- ## https://github.com/shiguodong/dbscan (fork)
1
+ # lib/gtfs_stops_clustering/dbscan.rb
2
2
 
3
- require 'distance_measures'
4
- require 'text'
5
- require 'geocoder'
6
- require_relative 'redis_geodata'
3
+ require "distance_measures"
4
+ require "text"
5
+ require "geocoder"
6
+ require_relative "redis_geodata"
7
7
 
8
+ # Array class
8
9
  class Array
9
- def haversine_distance2(n)
10
- Geocoder::Calculations.distance_between(self, n)
10
+ def haversine_distance2(other)
11
+ Geocoder::Calculations.distance_between(self, other)
11
12
  end
12
13
  end
13
14
 
15
+ # DBSCAN module
14
16
  module DBSCAN
17
+ # Clusterer class
15
18
  class Clusterer
19
+ include RedisGeodata
16
20
  attr_accessor :points, :options, :clusters
17
21
 
18
22
  def initialize(points, stops_redis_geodata, options = {})
19
23
  options[:distance] = :euclidean_distance unless options[:distance]
20
24
  options[:labels] = [] unless options[:labels]
21
25
 
22
- c = 0
23
26
  redis_geodata_import(stops_redis_geodata, options[:epsilon])
24
- @points = points.map { |e| po = Point.new(e, options[:labels][c]); c +=1; po }
25
27
  @options = options
26
- @clusters = {-1 => []}
28
+ init_points(points)
29
+ @clusters = { -1 => [] }
27
30
 
28
31
  clusterize!
29
32
  end
30
33
 
34
+ def init_points(points)
35
+ c = 0
36
+ @points = points.map do |e|
37
+ po = Point.new(e, @options[:labels][c])
38
+ c += 1
39
+ po
40
+ end
41
+ end
42
+
31
43
  def clusterize!
32
44
  current_cluster = -1
33
45
  @points.each do |point|
@@ -49,10 +61,10 @@ module DBSCAN
49
61
  # Get Cluster Position
50
62
  cluster_pos = find_cluster_position(clusters[current_cluster])
51
63
 
52
- clusters[current_cluster].each { |e|
64
+ clusters[current_cluster].each do |e|
53
65
  e.cluster_name = cluster_name
54
66
  e.cluster_pos = cluster_pos
55
- }
67
+ end
56
68
  else
57
69
  clusters[-1].push(point)
58
70
  end
@@ -91,9 +103,11 @@ module DBSCAN
91
103
  neighbors = []
92
104
  geosearch_results = geosearch(point.items[1], point.items[0])
93
105
  geosearch_results.each do |neighbor_pos|
94
- coordinates = neighbor_pos.split(',')
95
- neighbor = @points.find { |point| point.items[0] == coordinates[1] &&
96
- point.items[1] == coordinates[0] }
106
+ coordinates = neighbor_pos.split(",")
107
+ neighbor = @points.find do |elem|
108
+ elem.items[0] == coordinates[1] &&
109
+ elem.items[1] == coordinates[0]
110
+ end
97
111
  next unless neighbor
98
112
 
99
113
  string_distance = Text::Levenshtein.distance(point.label.downcase, neighbor.label.downcase)
@@ -112,9 +126,7 @@ module DBSCAN
112
126
 
113
127
  if new_points.size >= options[:min_points]
114
128
  new_points.each do |p|
115
- unless neighbors.include?(p)
116
- neighbors.push(p)
117
- end
129
+ neighbors.push(p) unless neighbors.include?(p)
118
130
  end
119
131
  end
120
132
  end
@@ -127,32 +139,31 @@ module DBSCAN
127
139
 
128
140
  cluster_points
129
141
  end
130
- end
131
142
 
132
- def find_cluster_name(labels)
133
- words = labels.map { |label| label.strip.split }
134
- common_title = ''
143
+ def find_cluster_name(labels)
144
+ words = labels.map { |label| label.strip.split }
145
+ common_title = ""
135
146
 
136
- # Loop through each word index starting from the first
137
- (0...words.first.length).each do |i|
138
- words_at_index = words.map { |word_list| word_list[i] }
147
+ # Loop through each word index starting from the first
148
+ (0...words.first.length).each do |i|
149
+ words_at_index = words.map { |word_list| word_list[i] }
139
150
 
140
- break unless words_at_index.uniq.length == 1
151
+ break unless words_at_index.uniq.length == 1
141
152
 
142
- common_title += " #{words_at_index.first.capitalize}"
143
- end
144
-
145
- common_title.strip! ? common_title : labels.first
146
- end
153
+ common_title += " #{words_at_index.first.capitalize}"
154
+ end
147
155
 
148
- def find_cluster_position(cluster)
149
- total_lat = cluster.map { |e| e.items[0].to_f }.sum
150
- total_lon = cluster.map { |e| e.items[1].to_f }.sum
151
- avg_lat = total_lat / cluster.size
152
- avg_lon = total_lon / cluster.size
153
- [avg_lat, avg_lon]
156
+ common_title.strip! ? common_title : labels.first
157
+ end
158
+ def find_cluster_position(cluster)
159
+ total_lat = cluster.map { |e| e.items[0].to_f }.sum
160
+ total_lon = cluster.map { |e| e.items[1].to_f }.sum
161
+ avg_lat = total_lat / cluster.size
162
+ avg_lon = total_lon / cluster.size
163
+ [avg_lat, avg_lon]
164
+ end
154
165
  end
155
-
166
+ # Point class
156
167
  class Point
157
168
  attr_accessor :items, :cluster, :visited, :label, :cluster_name, :cluster_pos
158
169
 
@@ -176,5 +187,3 @@ module DBSCAN
176
187
  clusterer.labeled_results
177
188
  end
178
189
  end
179
-
180
- include DBSCAN
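`find_cluster_name` keeps the leading words shared by every label in a cluster, and `find_cluster_position` averages the member coordinates. A standalone sketch of both ideas, written as free functions on plain arrays rather than on the gem's `Point` objects; the function names and any sample inputs are illustrative:

```ruby
# Longest run of leading words common to all labels, capitalized;
# falls back to the first label when the labels share no leading word.
def common_leading_words(labels)
  word_lists = labels.map { |label| label.strip.split }
  shared = []
  word_lists.first.each_index do |i|
    words_at_index = word_lists.map { |words| words[i] }
    break unless words_at_index.uniq.length == 1

    shared << words_at_index.first.capitalize
  end
  shared.empty? ? labels.first : shared.join(" ")
end

# Arithmetic mean of [lat, lon] pairs; values may be strings, as read from GTFS files.
def centroid(coordinates)
  lat_sum = coordinates.sum { |lat, _lon| lat.to_f }
  lon_sum = coordinates.sum { |_lat, lon| lon.to_f }
  [lat_sum / coordinates.size, lon_sum / coordinates.size]
end
```

The word comparison is case-sensitive, so "main" and "Main" count as different words. A plain average of coordinates is reasonable for stops that lie close together, though it would misbehave for a cluster spanning the antimeridian.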
data/lib/gtfs_stops_clustering/input_consistency_checks.rb ADDED
@@ -0,0 +1,52 @@
+ # frozen_string_literal: true
+
+ # lib/gtfs_stops_clustering/input_consistency_checks.rb
+
+ # InputConsistencyChecks module
+ module InputConsistencyChecks
+   # InputConsistencyChecks class
+   class InputConsistencyChecks
+     attr_accessor :gtfs_paths, :epsilon, :min_points, :names_similarity, :stops_config_path
+
+     def initialize(gtfs_paths, epsilon, min_points, names_similarity, stops_config_path)
+       @gtfs_paths = gtfs_paths
+       @stops_config_path = stops_config_path
+       @epsilon = epsilon
+       @min_points = min_points
+       @names_similarity = names_similarity
+       input_consistency_checks
+     end
+
+     def input_consistency_checks
+       gtfs_paths_check
+       epsilon_check
+       min_points_check
+       names_similarity_check
+     end
+
+     def gtfs_paths_check
+       raise ArgumentError, "gtfs_paths cannot be nil" if @gtfs_paths.nil?
+       raise ArgumentError, "gtfs_paths must be an Array" unless @gtfs_paths.is_a?(Array)
+       raise ArgumentError, "gtfs_paths must not be empty" if @gtfs_paths.empty?
+     end
+
+     def epsilon_check
+       raise ArgumentError, "epsilon must be a Float" unless @epsilon.is_a?(Float)
+       raise ArgumentError, "epsilon must be greater than 0" unless @epsilon.positive?
+     end
+
+     def min_points_check
+       raise ArgumentError, "min_points must be an Integer" unless @min_points.is_a?(Integer)
+       raise ArgumentError, "min_points must be greater than 0" unless @min_points.positive?
+     end
+
+     def names_similarity_check
+       raise ArgumentError, "names_similarity must be a Float" unless @names_similarity.is_a?(Float)
+       raise ArgumentError, "names_similarity must be between 0 and 1" if @names_similarity.negative? || @names_similarity > 1
+     end
+   end
+
+   def input_consistency_checks(gtfs_paths, epsilon, min_points, names_similarity, stop_config_path)
+     @input_consistency_checks = InputConsistencyChecks.new(gtfs_paths, epsilon, min_points, names_similarity, stop_config_path)
+   end
+ end
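The checks above are strict about types: `is_a?(Float)` rejects an Integer such as `1`, even though it is a perfectly valid number. An alternative sketch, not the gem's code, that accepts both Integer and Float values while validating comparable ranges; the method names `real_number?` and `validate_clustering_params!` are hypothetical:

```ruby
# True for the plain numeric types a caller is likely to pass.
def real_number?(value)
  value.is_a?(Integer) || value.is_a?(Float)
end

# Hypothetical validator: raises ArgumentError on bad input, returns true otherwise.
def validate_clustering_params!(gtfs_paths:, epsilon:, min_points:, names_similarity:)
  unless gtfs_paths.is_a?(Array) && !gtfs_paths.empty?
    raise ArgumentError, "gtfs_paths must be a non-empty Array"
  end
  raise ArgumentError, "epsilon must be a positive number" unless real_number?(epsilon) && epsilon.positive?
  unless min_points.is_a?(Integer) && min_points.positive?
    raise ArgumentError, "min_points must be a positive Integer"
  end
  unless real_number?(names_similarity) && names_similarity.between?(0, 1)
    raise ArgumentError, "names_similarity must be between 0 and 1"
  end

  true
end
```

Collecting the checks in one keyword-argument method keeps the call site self-describing, at the cost of one longer method.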
data/lib/gtfs_stops_clustering/redis_geodata.rb CHANGED
@@ -1,29 +1,38 @@
1
- # lib/redis_geodata.rb
2
- require 'redis'
1
+ # frozen_string_literal: true
3
2
 
3
+ # lib/gtfs_stops_clustering/redis_geodata.rb
4
+
5
+ require "redis"
6
+
7
+ # RedisGeodata module
4
8
  module RedisGeodata
5
- VERSION='0.0.1'
6
9
  attr_accessor :redis
7
10
 
11
+ # RedisGeodata class
8
12
  class RedisGeodata
9
13
  attr_accessor :stops, :key, :redis, :epsilon
10
14
 
11
15
  def initialize(stops, epsilon)
12
- @redis = Redis.new(url: 'redis://127.0.0.1:6379')
16
+ begin
17
+ @redis = Redis.new(url: "redis://127.0.0.1:6379")
18
+ rescue Redis::CannotConnectError => e
19
+ raise RuntimeError, "Error occurred while connecting to Redis: #{e.message}"
20
+ end
13
21
  @stops = stops
14
- @key = 'stops'
22
+ @key = "stops"
15
23
  @epsilon = epsilon
16
24
  geoadd
17
25
  end
18
26
 
19
27
  def geoadd
28
+ @redis.del(@key)
20
29
  @redis.geoadd(@key, *@stops)
21
30
  @redis.expire(@key, 100_000_0)
22
31
  end
23
32
 
24
33
  def geosearch(longitude, latitude)
25
- list = @redis.georadius(@key, longitude, latitude, @epsilon, 'km')
26
- list.reject! { |point| point == longitude.to_s + "," + latitude.to_s }
34
+ list = @redis.georadius(@key, longitude, latitude, @epsilon, "km")
35
+ list.reject! { |point| point == "#{longitude},#{latitude}" }
27
36
  list
28
37
  end
29
38
  end
@@ -36,5 +45,3 @@ module RedisGeodata
36
45
  @redis.geosearch(*args)
37
46
  end
38
47
  end
39
-
40
- include RedisGeodata
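`geosearch` asks Redis for every member within `epsilon` kilometres of a point and then drops the query point itself, which is stored under the member name `"longitude,latitude"`. The in-memory sketch below imitates that behavior without a Redis server. It assumes numeric coordinates and uses the haversine formula with an assumed mean Earth radius of 6371 km, so it is a stand-in for illustration only and its distances may differ slightly from those Redis computes. The method names are hypothetical:

```ruby
# Great-circle distance in kilometres between two points (haversine formula).
def haversine_km(lon1, lat1, lon2, lat2, radius_km = 6371.0)
  to_rad = Math::PI / 180.0
  d_lat = (lat2 - lat1) * to_rad
  d_lon = (lon2 - lon1) * to_rad
  a = Math.sin(d_lat / 2)**2 +
      Math.cos(lat1 * to_rad) * Math.cos(lat2 * to_rad) * Math.sin(d_lon / 2)**2
  # Clamp to 1.0 so rounding error can never push asin out of its domain.
  2 * radius_km * Math.asin(Math.sqrt([a, 1.0].min))
end

# In-memory stand-in for the radius query. `stops` holds [longitude, latitude, member]
# triples, the same shape the gem builds for its geoadd call.
def radius_search(stops, longitude, latitude, epsilon_km)
  own_member = "#{longitude},#{latitude}"
  stops.select { |lon, lat, _member| haversine_km(longitude, latitude, lon, lat) <= epsilon_km }
       .map { |_lon, _lat, member| member }
       .reject { |member| member == own_member }
end
```

Because members are encoded as strings, two stops with identical coordinates collapse into a single member, both in this sketch's self-exclusion step and in a Redis geo set.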
data/lib/gtfs_stops_clustering/version.rb CHANGED
@@ -1,5 +1,5 @@
  # frozen_string_literal: true

  module GtfsStopsClustering
-   VERSION = "0.1.2"
+   VERSION = "0.1.4"
  end
data/lib/gtfs_stops_clustering.rb CHANGED
@@ -1,47 +1,66 @@
1
- #!/usr/bin/env ruby
1
+ # frozen_string_literal: true
2
+
2
3
  # lib/gtfs_stops_clustering.rb
3
- require 'rubygems'
4
- require 'bundler/setup'
4
+
5
+ require "rubygems"
6
+ require "bundler/setup"
5
7
  require_relative "gtfs_stops_clustering/version"
6
- require 'gtfs'
7
- require 'csv'
8
- require_relative './gtfs_stops_clustering/data_import'
9
- require_relative './gtfs_stops_clustering/dbscan'
8
+ require "gtfs"
9
+ require "csv"
10
+ require_relative "./gtfs_stops_clustering/data_import"
11
+ require_relative "./gtfs_stops_clustering/dbscan"
12
+ require_relative "./gtfs_stops_clustering/input_consistency_checks"
10
13
 
14
+ # GtfsStopsClustering module
11
15
  module GtfsStopsClustering
12
- VERSION = GtfsStopsClustering::VERSION
13
16
  attr_accessor :gtfs_stops_clustering
14
17
 
18
+ # GtfsStopsClustering class
15
19
  class GtfsStopsClustering
16
- attr_accessor :clusters, :gtfs_urls, :gtfs_stops, :stops_config_path, :epsilon, :min_points, :names_similarity
20
+ include InputConsistencyChecks
21
+ include DataImport
22
+ include DBSCAN
23
+ attr_accessor :clusters, :gtfs_paths, :gtfs_stops, :stops_config_path, :epsilon, :min_points, :names_similarity
17
24
 
18
- def initialize(gtfs_urls, epsilon, min_points, names_similarity, stops_config_path)
25
+ def initialize(gtfs_paths, epsilon, min_points, names_similarity, stops_config_path)
26
+ @gtfs_paths = gtfs_paths
27
+ @stops_config_path = stops_config_path
28
+ @epsilon = epsilon
29
+ @min_points = min_points
30
+ @names_similarity = names_similarity
31
+ input_consistency_checks(@gtfs_paths, @epsilon, @min_points, @names_similarity, @stops_config_path)
19
32
  @clusters = []
20
- unless gtfs_urls.empty?
21
- @gtfs_paths = gtfs_urls
22
- @stops_config_path = stops_config_path
23
- @epsilon = epsilon
24
- @min_points = min_points
25
- @names_similarity = names_similarity
26
- @gtfs_stops = create_stops_merged
27
- clusterize_stops_csv(@gtfs_stops)
28
- end
33
+ @gtfs_stops = create_stops_merged
34
+ clusterize_stops
29
35
  end
30
36
 
31
37
  def create_stops_merged
32
38
  gtfs_stops = []
33
39
  @gtfs_paths.each do |gtfs_path|
34
- gtfs = GTFS::Source.build(gtfs_path)
35
- gtfs_stops << gtfs.stops
40
+ begin
41
+ gtfs = GTFS::Source.build(gtfs_path)
42
+ gtfs_stops << gtfs.stops
43
+ rescue GTFS::InvalidSourceException => e
44
+ raise IOError, "Error occurred while building GTFS from #{gtfs_path}: #{e.message}"
45
+ end
36
46
  end
37
47
  gtfs_stops.flatten
38
48
  end
39
49
 
40
- def clusterize_stops_csv(stops_merged)
41
- data = import_stops_data(stops_merged, @stops_config_path)
42
- @clusters = DBSCAN( data[:stops_data], data[:stops_redis_geodata], :epsilon => @epsilon, :min_points => @min_points, :similarity => @names_similarity, :distance => :haversine_distance2, :labels => data[:stops_names] )
50
+ def clusterize_stops
51
+ data = import_stops_data(@gtfs_stops, @stops_config_path)
52
+ @clusters = DBSCAN(data[:stops_data],
53
+ data[:stops_redis_geodata],
54
+ epsilon: @epsilon,
55
+ min_points: @min_points,
56
+ similarity: @names_similarity,
57
+ distance: :haversine_distance2,
58
+ labels: data[:stops_names])
59
+ map_clustered_stops
60
+ end
43
61
 
44
- @clusters.each do |cluster_id, cluster|
62
+ def map_clustered_stops
63
+ @clusters.each_value do |cluster|
45
64
  cluster.each do |stop|
46
65
  gtfs_stop = @gtfs_stops.find { |e| e.lat == stop[:stop_lat] && e.lon == stop[:stop_lon] }
47
66
  stop[:stop_id] = gtfs_stop.id
@@ -49,24 +68,11 @@ module GtfsStopsClustering
49
68
  stop[:parent_station] = gtfs_stop.parent_station
50
69
  end
51
70
  end
52
-
53
- output_path = 'stop_clusters.txt'
54
- File.open(output_path, 'w') do |file|
55
- @clusters.each do |cluster_id, cluster |
56
- file.puts "Cluster #{cluster_id}"
57
- cluster.each do |point|
58
- file.puts point.inspect
59
- end
60
- file.puts
61
- end
62
- end
63
71
  end
64
72
  end
65
73
 
66
- def build(gtfs_urls, epsilon, min_points, names_similarity = 1, stop_config_path = '')
67
- @gtfs_stops_clustering = GtfsStopsClustering.new(gtfs_urls, epsilon, min_points, names_similarity, stop_config_path)
74
+ def build_clusters(gtfs_paths, epsilon, min_points, names_similarity = 1.0, stop_config_path = "")
75
+ @gtfs_stops_clustering = GtfsStopsClustering.new(gtfs_paths, epsilon, min_points, names_similarity, stop_config_path)
68
76
  @gtfs_stops_clustering.clusters
69
77
  end
70
78
  end
71
-
72
- include GtfsStopsClustering
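`map_clustered_stops` finds the original GTFS stop for each clustered entry by scanning `@gtfs_stops` for matching coordinates. A sketch of the same join using a hash keyed on `[lat, lon]`, which avoids rescanning the stop list for every entry. It assumes stop objects that respond to `id`, `lat`, `lon` and `parent_station`, and that coordinates have the same type on both sides. The method names are hypothetical, and unlike the code in the diff it skips entries that have no matching stop:

```ruby
# Index stops by coordinate pair; the first stop wins when several share
# the same coordinates, which mirrors what Array#find would return.
def index_stops_by_position(gtfs_stops)
  gtfs_stops.each_with_object({}) do |stop, index|
    index[[stop.lat, stop.lon]] ||= stop
  end
end

# Copy stop_id and parent_station onto each clustered entry whose coordinates match a known stop.
def attach_stop_ids(clusters, gtfs_stops)
  index = index_stops_by_position(gtfs_stops)
  clusters.each_value do |cluster|
    cluster.each do |entry|
      stop = index[[entry[:stop_lat], entry[:stop_lon]]]
      next unless stop

      entry[:stop_id] = stop.id
      entry[:parent_station] = stop.parent_station
    end
  end
  clusters
end
```

Any object with the four readers works as a stop here, for example instances of `Struct.new(:id, :lat, :lon, :parent_station)`.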
data/lib/stops_corner_cases.txt ADDED
@@ -0,0 +1,3 @@
+ stop_name,cluster_name
+ Example Stop Name,Corrected Stop Name
+ Placeholder,Corrected Placeholder
data/sig/gtfs_stops_clustering.rbs ADDED
@@ -0,0 +1,4 @@
+ module GtfsStopsClustering
+   VERSION: String
+   # See the writing guide of rbs: https://github.com/ruby/rbs#guides
+ end
metadata CHANGED
@@ -1,29 +1,35 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: gtfs_stops_clustering
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.1.2
4
+ version: 0.1.4
5
5
  platform: ruby
6
6
  authors:
7
- - Visco01
7
+ - Pietro Visconti
8
8
  autorequire:
9
9
  bindir: exe
10
10
  cert_chain: []
11
- date: 2023-12-07 00:00:00.000000000 Z
11
+ date: 2023-12-08 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
- name: gtfs
14
+ name: csv
15
15
  requirement: !ruby/object:Gem::Requirement
16
16
  requirements:
17
17
  - - "~>"
18
18
  - !ruby/object:Gem::Version
19
- version: 0.4.1
19
+ version: '3.2'
20
+ - - ">="
21
+ - !ruby/object:Gem::Version
22
+ version: 3.2.8
20
23
  type: :runtime
21
24
  prerelease: false
22
25
  version_requirements: !ruby/object:Gem::Requirement
23
26
  requirements:
24
27
  - - "~>"
25
28
  - !ruby/object:Gem::Version
26
- version: 0.4.1
29
+ version: '3.2'
30
+ - - ">="
31
+ - !ruby/object:Gem::Version
32
+ version: 3.2.8
27
33
  - !ruby/object:Gem::Dependency
28
34
  name: distance_measures
29
35
  requirement: !ruby/object:Gem::Requirement
@@ -39,85 +45,107 @@ dependencies:
39
45
  - !ruby/object:Gem::Version
40
46
  version: 0.0.6
41
47
  - !ruby/object:Gem::Dependency
42
- name: text
48
+ name: geocoder
43
49
  requirement: !ruby/object:Gem::Requirement
44
50
  requirements:
45
51
  - - "~>"
46
52
  - !ruby/object:Gem::Version
47
- version: '1.3'
53
+ version: '1.8'
48
54
  - - ">="
49
55
  - !ruby/object:Gem::Version
50
- version: 1.3.1
56
+ version: 1.8.2
51
57
  type: :runtime
52
58
  prerelease: false
53
59
  version_requirements: !ruby/object:Gem::Requirement
54
60
  requirements:
55
61
  - - "~>"
56
62
  - !ruby/object:Gem::Version
57
- version: '1.3'
63
+ version: '1.8'
58
64
  - - ">="
59
65
  - !ruby/object:Gem::Version
60
- version: 1.3.1
66
+ version: 1.8.2
61
67
  - !ruby/object:Gem::Dependency
62
- name: geocoder
68
+ name: gtfs
63
69
  requirement: !ruby/object:Gem::Requirement
64
70
  requirements:
65
71
  - - "~>"
66
72
  - !ruby/object:Gem::Version
67
- version: '1.8'
68
- - - ">="
69
- - !ruby/object:Gem::Version
70
- version: 1.8.2
73
+ version: 0.4.1
71
74
  type: :runtime
72
75
  prerelease: false
73
76
  version_requirements: !ruby/object:Gem::Requirement
74
77
  requirements:
75
78
  - - "~>"
76
79
  - !ruby/object:Gem::Version
77
- version: '1.8'
78
- - - ">="
79
- - !ruby/object:Gem::Version
80
- version: 1.8.2
80
+ version: 0.4.1
81
81
  - !ruby/object:Gem::Dependency
82
- name: csv
82
+ name: redis
83
83
  requirement: !ruby/object:Gem::Requirement
84
84
  requirements:
85
85
  - - "~>"
86
86
  - !ruby/object:Gem::Version
87
- version: '3.2'
87
+ version: '5.0'
88
88
  - - ">="
89
89
  - !ruby/object:Gem::Version
90
- version: 3.2.8
90
+ version: 5.0.8
91
91
  type: :runtime
92
92
  prerelease: false
93
93
  version_requirements: !ruby/object:Gem::Requirement
94
94
  requirements:
95
95
  - - "~>"
96
96
  - !ruby/object:Gem::Version
97
- version: '3.2'
97
+ version: '5.0'
98
98
  - - ">="
99
99
  - !ruby/object:Gem::Version
100
- version: 3.2.8
100
+ version: 5.0.8
101
101
  - !ruby/object:Gem::Dependency
102
- name: redis
102
+ name: text
103
103
  requirement: !ruby/object:Gem::Requirement
104
104
  requirements:
105
105
  - - "~>"
106
106
  - !ruby/object:Gem::Version
107
- version: '5.0'
107
+ version: '1.3'
108
108
  - - ">="
109
109
  - !ruby/object:Gem::Version
110
- version: 5.0.8
110
+ version: 1.3.1
111
111
  type: :runtime
112
112
  prerelease: false
113
113
  version_requirements: !ruby/object:Gem::Requirement
114
114
  requirements:
115
115
  - - "~>"
116
116
  - !ruby/object:Gem::Version
117
- version: '5.0'
117
+ version: '1.3'
118
118
  - - ">="
119
119
  - !ruby/object:Gem::Version
120
- version: 5.0.8
120
+ version: 1.3.1
121
+ - !ruby/object:Gem::Dependency
122
+ name: minitest
123
+ requirement: !ruby/object:Gem::Requirement
124
+ requirements:
125
+ - - "~>"
126
+ - !ruby/object:Gem::Version
127
+ version: '5.20'
128
+ type: :development
129
+ prerelease: false
130
+ version_requirements: !ruby/object:Gem::Requirement
131
+ requirements:
132
+ - - "~>"
133
+ - !ruby/object:Gem::Version
134
+ version: '5.20'
135
+ - !ruby/object:Gem::Dependency
136
+ name: pry
137
+ requirement: !ruby/object:Gem::Requirement
138
+ requirements:
139
+ - - "~>"
140
+ - !ruby/object:Gem::Version
141
+ version: 0.14.2
142
+ type: :development
143
+ prerelease: false
144
+ version_requirements: !ruby/object:Gem::Requirement
145
+ requirements:
146
+ - - "~>"
147
+ - !ruby/object:Gem::Version
148
+ version: 0.14.2
121
149
  description: A gem to read GTFS stops data and create clusters based on coordinates
122
150
  and stop names' similarities.
123
151
  email:
@@ -126,11 +154,21 @@ executables: []
126
154
  extensions: []
127
155
  extra_rdoc_files: []
128
156
  files:
157
+ - ".rubocop.yml"
158
+ - CHANGELOG.md
159
+ - CODE_OF_CONDUCT.md
160
+ - LICENSE.txt
161
+ - README.md
162
+ - Rakefile
163
+ - gtfs_stops_clustering.gemspec
129
164
  - lib/gtfs_stops_clustering.rb
130
165
  - lib/gtfs_stops_clustering/data_import.rb
131
166
  - lib/gtfs_stops_clustering/dbscan.rb
167
+ - lib/gtfs_stops_clustering/input_consistency_checks.rb
132
168
  - lib/gtfs_stops_clustering/redis_geodata.rb
133
169
  - lib/gtfs_stops_clustering/version.rb
170
+ - lib/stops_corner_cases.txt
171
+ - sig/gtfs_stops_clustering.rbs
134
172
  homepage: https://github.com/Visco01/gtfs_stops_clustering
135
173
  licenses:
136
174
  - MIT