gtfs_stops_clustering 0.1.5 → 0.1.7

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: bf75afd13663d9ebf254d37a24763336b5f411c174f3e8b9743af32c0f3843e9
- data.tar.gz: 16d82b24fbd8b2170917f87745b68874802f95a151ebe500bf51cba408edd7b1
+ metadata.gz: 3e2e3f607f0439e34962d10515cab56ba2a456113300cbb5819e87d75f6ce039
+ data.tar.gz: 488c0844cf9b633feab97c702c8ba33004bc72bdf0fb3d09246e7727bb5c7813
  SHA512:
- metadata.gz: 5efc4d7c36b869092e9147f5b350af0178cf968d08e9df1fe94705e7277f19b6d3bccdd91e94ba4c23735a827125263a2ac7dec5dee4768da6314fe49326ea6b
- data.tar.gz: 01e7f5fa34685099706249c1f40e64f40986fce2fc64a78f8696d6be967a1bbafdc2c5706830d7b5b423d984df3b232e3fe96acc02507a10d5fbb2f23eaa6792
+ metadata.gz: fd6543aa275867c1c711ebb5002b326d3d5b72aab36724d60ec75243e20407ec3a1258ec24895a1087cf23de1b749ec88b6e132938991cc2c03b447f584b3522
+ data.tar.gz: 5acd76e6925e49f2da4bb8bd46d3d7255fa27f9a0d13ee650d9f303bb88b2b9fd14b6d07416f0ff5c62890608684512ae888c29d7d6ba25eea048f168eb57dd6
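The SHA256 and SHA512 entries above are digests of the `metadata.gz` and `data.tar.gz` members of the packaged gem. A `.gem` file is a plain tar archive, so running `tar -xf gtfs_stops_clustering-0.1.7.gem` unpacks those members (plus `checksums.yaml.gz`). A minimal sketch, assuming the members have already been extracted, that compares them with the published 0.1.7 SHA256 values using Ruby's standard `digest` library; `checksum_matches?` and `verify_gem_members` are hypothetical helpers, not part of any gem:

```ruby
require "digest"

# Published 0.1.7 SHA256 digests, copied from checksums.yaml above.
EXPECTED_SHA256 = {
  "metadata.gz" => "3e2e3f607f0439e34962d10515cab56ba2a456113300cbb5819e87d75f6ce039",
  "data.tar.gz" => "488c0844cf9b633feab97c702c8ba33004bc72bdf0fb3d09246e7727bb5c7813"
}.freeze

# True when the file's SHA256 hex digest equals the expected value.
def checksum_matches?(path, expected_hex)
  Digest::SHA256.file(path).hexdigest == expected_hex
end

# Hypothetical helper: returns the names of members in +dir+ whose digest differs.
# A missing member raises Errno::ENOENT instead of being reported.
def verify_gem_members(dir, expected = EXPECTED_SHA256)
  expected.reject { |name, hex| checksum_matches?(File.join(dir, name), hex) }.keys
end
```

After extracting the gem into the current directory, `verify_gem_members(".")` returns an empty array when both digests match.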
data/CHANGELOG.md CHANGED
@@ -1,5 +1,9 @@
- ## [Unreleased]
+ ## 0.1.6
 
- ## [0.1.0] - 2023-12-06
+ ## [0.1.6] - 2023-12-19
+
+ - Clean Redis stops data after performing the clustering algorithm
+
+ ## [0.1.5] - 2023-12-10
 
  - Initial release
data/README.md CHANGED
@@ -1,24 +1,89 @@
- # GtfsStopsClustering
+ # GTFS Stops clustering
+ [![Gem Version](https://badge.fury.io/rb/gtfs_stops_clustering.svg)](https://badge.fury.io/rb/gtfs_stops_clustering)
 
- TODO: Delete this and the text below, and describe your gem
+ GTFS Stops Clustering is a Ruby Gem designed to read [GTFS](https://gtfs.org) (General Transit Feed Specification) stops data and create clusters based on the following parameters:
 
- Welcome to your new gem! In this directory, you'll find the files you need to be able to package up your Ruby library into a gem. Put your Ruby code in the file `lib/gtfs_stops_clustering`. To experiment with that code, run `bin/console` for an interactive prompt.
+ - `GTFS paths` [Required]: an array of paths to GTFS zip files whose stops are combined in the clustering algorithm
+ - `Epsilon` [Required]: the maximum distance (in km) between two stops for them to be considered neighbors of one another (e.g. 0.01, 0.5, 2)
+ - `Min Points` [Required]: the minimum number of neighbors a point needs to have to be considered a core point (e.g. 3, 5, 10)
+ - `Names Similarity` [Optional]: Besides geographical proximity, the algorithm also considers how similar stop names are, using string similarity measures. This enhances the clustering by placing stops with similar names in the same cluster. Accepted values range from 0 to 1: the closer the value is to 1, the more similar two stop names must be for the stops to be considered points of the same cluster. The default value is 1; to create clusters based only on stop positions, set it to 0.
+ - `Stop config file` (CSV file path) [Optional]: This file handles cases where stop names need to be altered or mapped to different names before the clustering algorithm runs. Each entry consists of two columns:
+ **stop_name**: This column contains the original name of the stop that requires modification or mapping to another name. **cluster_name**: This column specifies the name to which the original stop name should be changed or mapped during the clustering process.
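The README says only that string similarity measures are used, without naming one. As an illustration, a minimal sketch assuming a normalized Levenshtein similarity (1 minus the edit distance divided by the longer name's length); the gem itself may use a different measure, and `levenshtein`, `name_similarity` and `similar_names?` are hypothetical names. Two stops pass the name check when the score reaches the `Names Similarity` threshold, so a threshold of 0 accepts any pair and 1 requires identical names:

```ruby
# Dynamic-programming Levenshtein edit distance, keeping one row at a time.
def levenshtein(a, b)
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i]
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      # deletion, insertion, substitution
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
    end
    prev = curr
  end
  prev.last
end

# Similarity in [0, 1]: 1.0 for identical names, lower as more edits are needed.
def name_similarity(a, b)
  longest = [a.length, b.length].max
  return 1.0 if longest.zero?

  1.0 - levenshtein(a, b).to_f / longest
end

# Hypothetical name check applied alongside the distance check.
def similar_names?(a, b, threshold)
  name_similarity(a, b) >= threshold
end
```

For example, "Nye County Airport A1" and "Nye County Airport A2" differ by one edit out of 21 characters, a score of about 0.95, which passes the 0.85 threshold used in the usage example.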
+
+ It uses the density-based [DBSCAN](https://en.wikipedia.org/wiki/DBSCAN) algorithm to perform clustering. I based my core algorithm on the [Dbscan](https://github.com/matiasinsaurralde/dbscan) gem.
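To make the role of `Epsilon` and `Min Points` concrete, a minimal in-memory DBSCAN sketch in plain Ruby. It assumes haversine great-circle distance in km, counts neighbors excluding the stop itself, and labels noise as -1 (as the sample output further down does). It is not the gem's implementation, which delegates the neighbor search to Redis geospatial queries; `GeoPoint`, `haversine_km`, `region_query` and `dbscan` are hypothetical names:

```ruby
# Sketch only: plain in-memory DBSCAN over stop coordinates.
EARTH_RADIUS_KM = 6371.0

GeoPoint = Struct.new(:name, :lat, :lon)

# Great-circle distance in km between two points (haversine formula).
def haversine_km(p, q)
  rad = Math::PI / 180
  dlat = (q.lat - p.lat) * rad
  dlon = (q.lon - p.lon) * rad
  a = Math.sin(dlat / 2)**2 +
      Math.cos(p.lat * rad) * Math.cos(q.lat * rad) * Math.sin(dlon / 2)**2
  2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a.clamp(0.0, 1.0)))
end

# Indexes of all other points within epsilon km of points[i] (linear scan).
def region_query(points, i, epsilon)
  points.each_index.select { |j| j != i && haversine_km(points[i], points[j]) <= epsilon }
end

# Returns { label => [points] }: -1 holds noise, clusters are numbered from 0.
def dbscan(points, epsilon, min_points)
  labels = Array.new(points.size) # nil means "not visited yet"
  cluster = -1
  points.each_index do |i|
    next unless labels[i].nil?

    neighbors = region_query(points, i, epsilon)
    if neighbors.size < min_points
      labels[i] = -1 # noise for now; may later be claimed as a border point
      next
    end

    cluster += 1
    labels[i] = cluster
    queue = neighbors.dup
    until queue.empty?
      j = queue.shift
      labels[j] = cluster if labels[j] == -1 # border point: joins but is not expanded
      next unless labels[j].nil?

      labels[j] = cluster
      reachable = region_query(points, j, epsilon)
      queue.concat(reachable) if reachable.size >= min_points # j is a core point
    end
  end
  points.each_index.group_by { |i| labels[i] }
        .transform_values { |idxs| idxs.map { |i| points[i] } }
end
```

The linear `region_query` makes this O(n²); replacing exactly that step with an indexed radius query is where a spatial store such as Redis helps. A name-similarity check could be added to the `select` condition in `region_query`.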
+
+ ### Stops config file example
+
+ Here is an example of a stops_config CSV file:
+
+ ```csv
+ stop_name,cluster_name
+ Stop Name To Be Changed,Actual Name
+ Amargosa Valley (Demo),Amargosa Valley
+ E Main St / S Irving St (Demo),E Main St / S Irving St
+ ```
+
+ If this CSV file is passed to the clustering algorithm, **Amargosa Valley (Demo)** is renamed **Amargosa Valley**, and likewise for every other entry. I needed this feature because the GTFS feed I was working with provided badly spelled stop names (typos) by default.
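Applying the config amounts to a lookup from `stop_name` to `cluster_name`. A minimal sketch using Ruby's standard `csv` library, assuming the header row shown in the example; `load_stop_name_map` and `canonical_stop_name` are hypothetical helper names, not the gem's API:

```ruby
require "csv"

# Hypothetical helper: builds { original stop name => cluster name } from a
# stops config CSV whose header row is "stop_name,cluster_name".
def load_stop_name_map(csv_text)
  CSV.parse(csv_text, headers: true).each_with_object({}) do |row, map|
    map[row["stop_name"]] = row["cluster_name"]
  end
end

# Returns the configured replacement name, or the original name when none is listed.
def canonical_stop_name(name, name_map)
  name_map.fetch(name, name)
end
```

With the example file, `canonical_stop_name("Amargosa Valley (Demo)", load_stop_name_map(File.read("stops_config.csv")))` returns "Amargosa Valley", while unlisted names pass through unchanged.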
+
+ ## Requirements
+
+ It is essential to have a **Redis server instance running locally (on the default port 6379)**, because the algorithm relies on Redis geospatial queries for efficient spatial operations.
+ Redis handles the proximity lookups required during clustering, which keeps those computations efficient.
+ Please make sure that a Redis server is installed and running on your local machine before using the gem.
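Since the gem expects Redis on `localhost:6379`, it can help to check that the server is reachable before clustering. A minimal sketch using only Ruby's standard `socket` library and Redis's inline `PING` command, to which a running server replies `+PONG`; `redis_available?` is a hypothetical helper, not part of the gem. Running `redis-cli ping` from a shell performs the same check.

```ruby
require "socket"

# True if a Redis server answers PING at host:port within +timeout+ seconds.
def redis_available?(host = "127.0.0.1", port = 6379, timeout: 1)
  Socket.tcp(host, port, connect_timeout: timeout) do |sock|
    sock.write("PING\r\n") # inline command form accepted by Redis
    return false unless IO.select([sock], nil, nil, timeout)

    sock.gets&.start_with?("+PONG") || false
  end
rescue SystemCallError, SocketError, IOError
  false # refused, timed out, unresolvable host, or closed connection
end
```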
 
  ## Installation
 
- TODO: Replace `UPDATE_WITH_YOUR_GEM_NAME_PRIOR_TO_RELEASE_TO_RUBYGEMS_ORG` with your gem name right after releasing it to RubyGems.org. Please do not do it earlier due to security reasons. Alternatively, replace this section with instructions to install your gem from git if you don't plan to release to RubyGems.org.
+ Add this line to your application's Gemfile:
 
- Install the gem and add to the application's Gemfile by executing:
+ ```ruby
+ gem 'gtfs_stops_clustering', '~> 0.1.5'
+ ```
+ Then run the following command:
 
- $ bundle add UPDATE_WITH_YOUR_GEM_NAME_PRIOR_TO_RELEASE_TO_RUBYGEMS_ORG
+ ```bash
+ $ bundle install
+ ```
 
  If bundler is not being used to manage dependencies, install the gem by executing:
 
- $ gem install UPDATE_WITH_YOUR_GEM_NAME_PRIOR_TO_RELEASE_TO_RUBYGEMS_ORG
+ ```bash
+ $ gem install gtfs_stops_clustering
+ ```
 
  ## Usage
 
- TODO: Write usage instructions here
+ ```ruby
+ require 'gtfs_stops_clustering'
+ include GtfsStopsClustering
+
+ gtfs_paths = ["path/to/gtfs/zip"]
+
+ clusters = build_clusters(gtfs_paths, 0.3, 1, 0.85)
+
+ clusters.each do |index, cluster|
+   puts index
+   cluster.each do |stop|
+     puts stop.inspect
+   end
+ end
+ ```
+
+ Here is the output for the GTFS file located at `test/fixtures/sample-feed-2.zip` (the sample feed provided by Google, slightly modified to create "clusterable" stops, since the original stops were all too far apart to be clustered). In this case I omitted the optional `stops config` parameter.
+
+ ```
+ -1
+ {:stop_id=>"4", :stop_code=>nil, :cluster_name=>nil, :cluster_pos=>[], :stop_name=>"Stagecoach Hotel & Casino (Demo)", :stop_lat=>"36.915682", :stop_lon=>"-116.751677", :parent_station=>nil}
+ {:stop_id=>"6", :stop_code=>nil, :cluster_name=>nil, :cluster_pos=>[], :stop_name=>"Alone stop (sad)", :stop_lat=>"36.914944", :stop_lon=>"-116.761472", :parent_station=>nil}
+ {:stop_id=>"8", :stop_code=>nil, :cluster_name=>nil, :cluster_pos=>[], :stop_name=>"E Main St / S Irving St (Demo)", :stop_lat=>"36.905697", :stop_lon=>"-116.76218", :parent_station=>nil}
+ {:stop_id=>"9", :stop_code=>nil, :cluster_name=>nil, :cluster_pos=>[], :stop_name=>"Amargosa Valley (Demo)", :stop_lat=>"36.641496", :stop_lon=>"-116.40094", :parent_station=>nil}
+ 0
+ {:stop_id=>"1", :stop_code=>nil, :cluster_name=>"Awesome Stop Name", :cluster_pos=>[36.425286, -117.133156], :stop_name=>"Awesome stop name 1", :stop_lat=>"36.425288", :stop_lon=>"-117.133162", :parent_station=>nil}
+ {:stop_id=>"5", :stop_code=>nil, :cluster_name=>"Awesome Stop Name", :cluster_pos=>[36.425286, -117.133156], :stop_name=>"Awesome stop name 2", :stop_lat=>"36.425284", :stop_lon=>"-117.133150", :parent_station=>nil}
+ 1
+ {:stop_id=>"2", :stop_code=>nil, :cluster_name=>"Nye County Airport", :cluster_pos=>[36.868429, -116.78467699999999], :stop_name=>"Nye County Airport A1", :stop_lat=>"36.868446", :stop_lon=>"-116.784582", :parent_station=>nil}
+ {:stop_id=>"3", :stop_code=>nil, :cluster_name=>"Nye County Airport", :cluster_pos=>[36.868429, -116.78467699999999], :stop_name=>"Nye County Airport A2", :stop_lat=>"36.868417", :stop_lon=>"-116.784352", :parent_station=>nil}
+ {:stop_id=>"7", :stop_code=>nil, :cluster_name=>"Nye County Airport", :cluster_pos=>[36.868429, -116.78467699999999], :stop_name=>"Nye County Airport A5", :stop_lat=>"36.868424", :stop_lon=>"-116.785097", :parent_station=>nil}
+ ```
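In the sample output, each `cluster_pos` equals the arithmetic mean of its members' coordinates: for cluster 0, ((36.425288 + 36.425284) / 2, (-117.133162 + -117.133150) / 2) = (36.425286, -117.133156). A minimal sketch that post-processes the hash returned by `build_clusters`, assuming the structure printed above (integer keys, -1 holding stops that joined no cluster, and string `:stop_lat`/`:stop_lon` values); `centroid` and `summarize_clusters` are hypothetical helpers:

```ruby
# Mean latitude/longitude of a cluster's stops (coordinates arrive as strings).
def centroid(stops)
  lats = stops.map { |s| Float(s[:stop_lat]) }
  lons = stops.map { |s| Float(s[:stop_lon]) }
  [lats.sum / lats.size, lons.sum / lons.size]
end

# One summary per real cluster; the -1 bucket of unclustered stops is skipped.
def summarize_clusters(clusters)
  clusters.reject { |index, _| index == -1 }.map do |index, stops|
    {
      index: index,
      name: stops.first[:cluster_name],
      size: stops.size,
      centroid: centroid(stops),
      stop_ids: stops.map { |s| s[:stop_id] }
    }
  end
end
```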
 
  ## Development
 
@@ -22,6 +22,7 @@ module RedisGeodata
  @key = "stops"
  @epsilon = epsilon
  geoadd
+ ObjectSpace.define_finalizer(self, proc { @redis.del(@key) })
  end
 
  def geoadd
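The added line registers the Redis cleanup as a finalizer. One Ruby caveat applies: a proc created inside an instance method captures `self` (here also through `@redis` and `@key`), so the object stays reachable from its own finalizer and is not garbage-collected; the cleanup then typically runs only at interpreter shutdown. A common pattern is to build the proc in a class method so it closes over only the values it needs, and to offer an explicit cleanup method for deterministic deletion. A minimal sketch, with `RedisGeodataSketch`, `cleanup_proc` and `cleanup!` as hypothetical names and any object responding to `del(key)` standing in for the Redis client:

```ruby
class RedisGeodataSketch
  attr_reader :key

  def initialize(redis, key = "stops")
    @redis = redis
    @key = key
    # The proc comes from a class method, so it holds no reference to this instance.
    ObjectSpace.define_finalizer(self, self.class.cleanup_proc(redis, key))
  end

  # Finalizers receive the object id; only the client and key are captured.
  def self.cleanup_proc(redis, key)
    proc { |_object_id| redis.del(key) }
  end

  # Deterministic cleanup: delete the key now and drop the pending finalizer.
  def cleanup!
    @redis.del(@key)
    ObjectSpace.undefine_finalizer(self)
  end
end
```

Calling `cleanup!` right after clustering (for example in an `ensure` block) deletes the key when intended instead of depending on garbage collection.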
@@ -1,5 +1,5 @@
  # frozen_string_literal: true
 
  module GtfsStopsClustering
- VERSION = "0.1.5"
+ VERSION = "0.1.7"
  end
@@ -61,6 +61,7 @@ module GtfsStopsClustering
  @clusters.each_value do |cluster|
  cluster.each do |stop|
  gtfs_stop = @gtfs_stops.find { |e| e.lat == stop[:stop_lat] && e.lon == stop[:stop_lon] }
+ stop[:stop_name] = gtfs_stop.name
  stop[:stop_id] = gtfs_stop.id
  stop[:stop_code] = gtfs_stop.code
  stop[:parent_station] = gtfs_stop.parent_station
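The loop above resolves each clustered stop with `Array#find`, scanning `@gtfs_stops` once per stop and matching on exact coordinate equality, so two GTFS stops at identical coordinates both resolve to the first one. If the scan becomes slow on large feeds, a hash keyed on the same `[lat, lon]` pair gives constant-time lookups; keeping only the first stop per key preserves `find`'s first-match behavior. A minimal sketch, with `GtfsStop`, `index_by_position` and `enrich_stops` as hypothetical stand-ins (only `id`, `name`, `lat`, `lon` are assumed on the GTFS stop objects):

```ruby
GtfsStop = Struct.new(:id, :name, :lat, :lon, keyword_init: true)

# Builds { [lat, lon] => first stop with those coordinates }, mirroring find's first match.
def index_by_position(gtfs_stops)
  gtfs_stops.each_with_object({}) do |stop, index|
    index[[stop.lat, stop.lon]] ||= stop
  end
end

# Copies GTFS attributes onto clustered stop hashes using the index.
# Unlike the loop above, a stop with no match is skipped instead of raising on nil.
def enrich_stops(clusters, index)
  clusters.each_value do |cluster|
    cluster.each do |stop|
      gtfs_stop = index[[stop[:stop_lat], stop[:stop_lon]]]
      next unless gtfs_stop

      stop[:stop_name] = gtfs_stop.name
      stop[:stop_id] = gtfs_stop.id
    end
  end
end
```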
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: gtfs_stops_clustering
  version: !ruby/object:Gem::Version
- version: 0.1.5
+ version: 0.1.7
  platform: ruby
  authors:
  - Pietro Visconti
  autorequire:
  bindir: exe
  cert_chain: []
- date: 2023-12-09 00:00:00.000000000 Z
+ date: 2023-12-21 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: csv