gtfs_stops_clustering 0.1.5 → 0.1.6
- checksums.yaml +4 -4
- data/CHANGELOG.md +6 -2
- data/README.md +73 -8
- data/lib/gtfs_stops_clustering/redis_geodata.rb +1 -0
- data/lib/gtfs_stops_clustering/version.rb +1 -1
- metadata +2 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 5babddb5a5a80c3afcdd55dca8b93738633c3e7420392683240665c8c375d4c9
+  data.tar.gz: c056da474b4bc656e83f6ba76c14e7ea7868225b1075818447da74eb68bc52a6
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 2ad48d54a269da348ffde78fbddb40697b7ed268b66e2ae5f22cae4fd834edccea9e360eb65fecd7a2fb3c22c5f3dfca0e297e2e7adbd61eb75ac106b4b181ef
+  data.tar.gz: 6c553ba442c11e604fb7740be93c19ff4ea013e3b5fa67c58745c4c704246eace595e147d20792ba5c36b54e640821a3f83688530fdb101d2ed214b47c8f7bce
data/CHANGELOG.md
CHANGED
data/README.md
CHANGED
@@ -1,24 +1,89 @@
-#
+# GTFS Stops clustering
+[![Gem Version](https://badge.fury.io/rb/gtfs_stops_clustering.svg)](https://badge.fury.io/rb/gtfs_stops_clustering)
 
-
+GTFS Stops Clustering is a Ruby Gem designed to read [GTFS](https://gtfs.org) (General Transit Feed Specification) stops data and create clusters based on the following parameters:
 
-
+- `GTFS paths` [Required]: array of GTFS zip file paths whose stops will be combined by the clustering algorithm
+- `Epsilon` [Required]: the maximum distance (in km) between 2 stops for them to be considered neighbors of one another (e.g.: 0.01, 0.5, 2 etc.)
+- `Min Points` [Required]: the minimum number of neighbors a point needs to have to be considered a core point (e.g.: 3, 5, 10 etc.)
+- `Names Similarity` [Optional]: Besides geographical proximity, the algorithm also considers the similarity between stop names, using string similarity measures, so that stops with similar names fall within the same cluster. Accepted values range from 0 to 1: the closer the value is to 1, the more similar two stop names must be for the stops to be considered points of the same cluster. The default value is 1; to create clusters based only on stop positions, set this to 0.
+- `Stop config file` (CSV file path) [Optional]: This file handles cases where stop names need to be altered or mapped to different names before running the clustering algorithm. Each entry consists of two columns:
+**stop_name**: This column contains the original name of the stop that requires modification or mapping to another name. **cluster_name**: This column specifies the name to which the original stop name should be changed or mapped during the clustering process.
+
+It uses the [DBSCAN](https://en.wikipedia.org/wiki/DBSCAN) density-based algorithm to perform clustering. I based my core algorithm on the [Dbscan](https://github.com/matiasinsaurralde/dbscan) gem.
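To make the roles of `Epsilon` and `Min Points` concrete, here is a minimal pure-Ruby sketch of a DBSCAN pass over stop coordinates. It is not the gem's implementation: the helper names (`haversine_km`, `neighbors_of`, `dbscan_sketch`) are hypothetical, the neighbor search is a brute-force scan rather than a Redis query, name similarity is ignored, and it assumes that a point's neighbors do not include the point itself. The coordinates are taken from the sample output shown in the README.

```ruby
EARTH_RADIUS_KM = 6371.0

# Great-circle distance in km between two [lat, lon] pairs (haversine formula).
def haversine_km(a, b)
  lat1, lon1, lat2, lon2 = [a[0], a[1], b[0], b[1]].map { |deg| deg * Math::PI / 180 }
  h = Math.sin((lat2 - lat1) / 2)**2 +
      Math.cos(lat1) * Math.cos(lat2) * Math.sin((lon2 - lon1) / 2)**2
  2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(h))
end

# Indices of all other points within epsilon km of points[i].
def neighbors_of(points, i, epsilon)
  points.each_index.select { |j| j != i && haversine_km(points[i], points[j]) <= epsilon }
end

# Returns one label per point: -1 for noise, then 0, 1, 2... for clusters.
def dbscan_sketch(points, epsilon, min_points)
  labels = Array.new(points.size) # nil means "not visited yet"
  cluster = -1
  points.each_index do |i|
    next unless labels[i].nil?

    seeds = neighbors_of(points, i, epsilon)
    if seeds.size < min_points
      labels[i] = -1 # not a core point; may still join a cluster later as a border point
      next
    end

    cluster += 1
    labels[i] = cluster
    until seeds.empty?
      j = seeds.shift
      labels[j] = cluster if labels[j] == -1 # former noise becomes a border point
      next unless labels[j].nil?

      labels[j] = cluster
      reachable = neighbors_of(points, j, epsilon)
      seeds.concat(reachable) if reachable.size >= min_points # expand only from core points
    end
  end
  labels
end

points = [
  [36.425288, -117.133162], # Awesome stop name 1
  [36.425284, -117.133150], # Awesome stop name 2
  [36.915682, -116.751677], # Stagecoach Hotel & Casino (Demo)
  [36.868446, -116.784582], # Nye County Airport A1
  [36.868417, -116.784352], # Nye County Airport A2
  [36.868424, -116.785097]  # Nye County Airport A5
]
labels = dbscan_sketch(points, 0.3, 1)
```

With an epsilon of 0.3 km and a minimum of 1 neighbor, the two "Awesome" stops and the three airport stops form two clusters, while the isolated hotel stop is labeled -1.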
+
+### Stops config file example
+
+Here is an example of a stops_config CSV file:
+
+```csv
+stop_name,cluster_name
+Stop Name To Be Changed,Actual Name
+Amargosa Valley (Demo),Amargosa Valley
+E Main St / S Irving St (Demo),E Main St / S Irving St
+```
+
+When this CSV file is passed to the clustering algorithm, **Amargosa Valley (Demo)** is renamed **Amargosa Valley**, and so on for every entry provided. I implemented this feature because the GTFS feed I was working on shipped with bad stop names (typos).
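The renaming step described above can be pictured as a simple lookup table built from the two CSV columns. The following is a sketch using Ruby's standard `csv` library; `load_stop_name_mapping` is a hypothetical helper, not a function exposed by the gem, and the assumption that unlisted stops keep their original name is mine.

```ruby
require "csv"

# Hypothetical helper: builds a { stop_name => cluster_name } lookup from CSV text.
def load_stop_name_mapping(csv_text)
  CSV.parse(csv_text, headers: true).each_with_object({}) do |row, mapping|
    mapping[row["stop_name"]] = row["cluster_name"]
  end
end

csv_text = <<~CSV
  stop_name,cluster_name
  Amargosa Valley (Demo),Amargosa Valley
  E Main St / S Irving St (Demo),E Main St / S Irving St
CSV

mapping = load_stop_name_mapping(csv_text)

# Assumption: stops without an entry in the file keep their original name.
renamed = ["Amargosa Valley (Demo)", "Nye County Airport A1"].map do |name|
  mapping.fetch(name, name)
end
```

Here `renamed` holds the mapped name for the first stop and the untouched name for the second.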
+
+## Requirements
+
+It is essential to have a **Redis server instance running locally (on default port 6379)** because the algorithm leverages Redis geospatial queries for efficient spatial operations.
+The Redis server is utilized to optimize geospatial queries, allowing the clustering algorithm to efficiently process proximity-related computations required during the clustering process.
+Please ensure that a Redis server is installed and running on your local machine to utilize the gem effectively.
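Since the gem depends on a local Redis instance, it can be handy to check that one is reachable before starting a clustering run. The sketch below uses only Ruby's standard `socket` library and relies on Redis accepting an inline `PING` command and answering `+PONG`. `redis_reachable?` is a hypothetical helper, not part of the gem, and a server that requires authentication would answer with an error and be reported as unreachable here.

```ruby
require "socket"

# Hypothetical helper: true if a Redis-like server answers PING on host:port.
def redis_reachable?(host = "127.0.0.1", port = 6379, timeout = 1)
  Socket.tcp(host, port, connect_timeout: timeout) do |sock|
    sock.write("PING\r\n") # Redis accepts inline commands terminated by CRLF
    # Wait up to `timeout` seconds for a reply before giving up.
    return false unless IO.select([sock], nil, nil, timeout)

    sock.readpartial(64).start_with?("+PONG")
  end
rescue SystemCallError, SocketError, IOError
  # Connection refused, timed out, unresolved host, or closed early.
  false
end

reachable = redis_reachable?
puts(reachable ? "Redis is up on 127.0.0.1:6379" : "No Redis server answered on 127.0.0.1:6379")
```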
 
 ## Installation
 
-
+Add this line to your application's Gemfile:
 
-
+```ruby
+gem 'gtfs_stops_clustering', '~> 0.1.5'
+```
+And run the following command
 
-
+```bash
+$ bundle install
+```
 
 If bundler is not being used to manage dependencies, install the gem by executing:
 
-
+```bash
+$ gem install gtfs_stops_clustering
+```
 
 ## Usage
 
-
+```ruby
+require 'gtfs_stops_clustering'
+include GtfsStopsClustering
+
+gtfs_paths = ["path/to/gtfs/zip"]
+
+clusters = build_clusters(gtfs_paths, 0.3, 1, 0.85)
+
+clusters.each do |index, cluster|
+  puts index
+  cluster.each do |stop|
+    puts stop.inspect
+  end
+end
+```
+
+The output below refers to the GTFS file located in `test/fixtures/sample-feed-2.zip` (the sample feed provided by Google, modified slightly to create "clusterable" stops, since the original stops were all too far apart to be clustered). The optional `stops config` parameter is omitted here.
+
+```
+-1
+{:stop_id=>"4", :stop_code=>nil, :cluster_name=>nil, :cluster_pos=>[], :stop_name=>"Stagecoach Hotel & Casino (Demo)", :stop_lat=>"36.915682", :stop_lon=>"-116.751677", :parent_station=>nil}
+{:stop_id=>"6", :stop_code=>nil, :cluster_name=>nil, :cluster_pos=>[], :stop_name=>"Alone stop (sad)", :stop_lat=>"36.914944", :stop_lon=>"-116.761472", :parent_station=>nil}
+{:stop_id=>"8", :stop_code=>nil, :cluster_name=>nil, :cluster_pos=>[], :stop_name=>"E Main St / S Irving St (Demo)", :stop_lat=>"36.905697", :stop_lon=>"-116.76218", :parent_station=>nil}
+{:stop_id=>"9", :stop_code=>nil, :cluster_name=>nil, :cluster_pos=>[], :stop_name=>"Amargosa Valley (Demo)", :stop_lat=>"36.641496", :stop_lon=>"-116.40094", :parent_station=>nil}
+0
+{:stop_id=>"1", :stop_code=>nil, :cluster_name=>"Awesome Stop Name", :cluster_pos=>[36.425286, -117.133156], :stop_name=>"Awesome stop name 1", :stop_lat=>"36.425288", :stop_lon=>"-117.133162", :parent_station=>nil}
+{:stop_id=>"5", :stop_code=>nil, :cluster_name=>"Awesome Stop Name", :cluster_pos=>[36.425286, -117.133156], :stop_name=>"Awesome stop name 2", :stop_lat=>"36.425284", :stop_lon=>"-117.133150", :parent_station=>nil}
+1
+{:stop_id=>"2", :stop_code=>nil, :cluster_name=>"Nye County Airport", :cluster_pos=>[36.868429, -116.78467699999999], :stop_name=>"Nye County Airport A1", :stop_lat=>"36.868446", :stop_lon=>"-116.784582", :parent_station=>nil}
+{:stop_id=>"3", :stop_code=>nil, :cluster_name=>"Nye County Airport", :cluster_pos=>[36.868429, -116.78467699999999], :stop_name=>"Nye County Airport A2", :stop_lat=>"36.868417", :stop_lon=>"-116.784352", :parent_station=>nil}
+{:stop_id=>"7", :stop_code=>nil, :cluster_name=>"Nye County Airport", :cluster_pos=>[36.868429, -116.78467699999999], :stop_name=>"Nye County Airport A5", :stop_lat=>"36.868424", :stop_lon=>"-116.785097", :parent_station=>nil}
+```
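In the sample output, the group under index -1 appears to hold the stops that were not clustered (their `cluster_name` is `nil` and `cluster_pos` is empty), and each `cluster_pos` is consistent with the arithmetic mean of its members' coordinates. The sketch below summarizes a result of that shape. It assumes `build_clusters` returns a Hash keyed by cluster index, which the README's `each do |index, cluster|` loop suggests but does not state; `centroid` is a hypothetical helper, and the data is a trimmed copy of the sample output.

```ruby
# Trimmed copy of the README sample output (only the keys used here are kept).
clusters = {
  -1 => [
    { stop_id: "6", cluster_name: nil, stop_lat: "36.914944", stop_lon: "-116.761472" }
  ],
  0 => [
    { stop_id: "1", cluster_name: "Awesome Stop Name", stop_lat: "36.425288", stop_lon: "-117.133162" },
    { stop_id: "5", cluster_name: "Awesome Stop Name", stop_lat: "36.425284", stop_lon: "-117.133150" }
  ]
}

# Hypothetical helper: mean latitude/longitude of a list of stops, rounded to 6 decimals.
def centroid(stops)
  lat = stops.sum { |s| s[:stop_lat].to_f } / stops.size
  lon = stops.sum { |s| s[:stop_lon].to_f } / stops.size
  [lat.round(6), lon.round(6)]
end

# Stops that ended up in no cluster.
noise = clusters.fetch(-1, [])

# One summary row per real cluster.
summary = clusters.reject { |index, _stops| index == -1 }.map do |index, stops|
  { index: index, name: stops.first[:cluster_name], size: stops.size, centroid: centroid(stops) }
end

puts "#{noise.size} unclustered stop(s)"
summary.each { |row| puts row.inspect }
```

For cluster 0, the computed centroid matches the `cluster_pos` value shown in the sample output.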
 
 ## Development
 
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: gtfs_stops_clustering
 version: !ruby/object:Gem::Version
-  version: 0.1.
+  version: 0.1.6
 platform: ruby
 authors:
 - Pietro Visconti
 autorequire:
 bindir: exe
 cert_chain: []
-date: 2023-12-
+date: 2023-12-19 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: csv