gtfs_df 0.7.0 → 0.9.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/CHANGELOG.md +34 -0
- data/README.md +43 -13
- data/examples/split-by-agency/Gemfile.lock +9 -7
- data/lib/gtfs_df/base_gtfs_table.rb +4 -0
- data/lib/gtfs_df/feed.rb +44 -5
- data/lib/gtfs_df/graph.rb +18 -8
- data/lib/gtfs_df/reader.rb +1 -2
- data/lib/gtfs_df/version.rb +1 -1
- metadata +12 -6
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-metadata.gz:
-data.tar.gz:
+metadata.gz: b04b659c98fd64e3a9ccbe37390dbd1904b37efba1e001ddb3bfab6ada41e912
+data.tar.gz: 5c003f2cdc866a1232c8988604040dbf0523c9605a96d5bea2138253011cea92
 SHA512:
-metadata.gz:
-data.tar.gz:
+metadata.gz: 7d7b6ef1ed68ffd05273a01c6e17a900a757b2c933c8d0c199378e1b94d79b1d7650d8094e22e2f9e2f56555bb7082418f8b2765675a48a4a0ec2d5ef7f0f6d9
+data.tar.gz: 9d1c54c97ca7f1480693a736b44dcb656f664291e46f04af94bdd25027ea9d976772ce8ba2d30c28f470ecb4f570991784157e7492db20c8320f5cb7a2db480f
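`checksums.yaml` records SHA256 and SHA512 digests of the two archives packed inside the `.gem` file, `metadata.gz` and `data.tar.gz`. Below is a minimal sketch of checking an extracted archive against the new digests with Ruby's standard `Digest` library. It assumes the `.gem` has already been unpacked (it is a plain tar archive), and the helper name `checksum_matches?` is hypothetical.

```ruby
require "digest"

# SHA256 digests copied from the 0.9.0 checksums.yaml above.
EXPECTED_SHA256 = {
  "metadata.gz" => "b04b659c98fd64e3a9ccbe37390dbd1904b37efba1e001ddb3bfab6ada41e912",
  "data.tar.gz" => "5c003f2cdc866a1232c8988604040dbf0523c9605a96d5bea2138253011cea92"
}.freeze

# Hypothetical helper: true when the bytes hash to the expected hex digest.
# Pass algorithm: Digest::SHA512 to check the SHA512 entries instead.
def checksum_matches?(bytes, expected_hex, algorithm: Digest::SHA256)
  algorithm.hexdigest(bytes) == expected_hex.downcase
end

# Usage after extracting the .gem (paths are illustrative):
#   bytes = File.binread("metadata.gz")
#   checksum_matches?(bytes, EXPECTED_SHA256["metadata.gz"])
```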
data/CHANGELOG.md
CHANGED
@@ -1,3 +1,36 @@
+## [0.9.0] - 2026-02-17
+
+### 🚀 Features
+
+- Add helper utilities
+
+### 🐛 Bug Fixes
+
+- [**breaking**] Bump rubyzip min version to 3.0
+## [0.8.0] - 2026-01-09
+
+### 🐛 Bug Fixes
+
+- Ignore extra newlines when parsing csv
+- Bump minimum ruby version to 3.2.0
+- Fix fare_attributes filtering
+- Fix exceptions edge case
+- Replace dynamic graph traversal with bidirectional graph option
+
+### 📚 Documentation
+
+- Document dev environment
+- Clarify the actions made by the bump-version script
+- Update example transitive dependencies
+
+### ⚙️ Miscellaneous Tasks
+
+- Reduce the test run frequency
+- Update dependabot schedule
+- Consolidate test fixtures
+- Add test for additional fares case
+- Update readme
+- Bump version to 0.8.0
 ## [0.7.0] - 2025-12-30
 
 ### 🚀 Features
@@ -22,6 +55,7 @@
 ### ⚙️ Miscellaneous Tasks
 
 - Include the util helpers in the console and the test spec
+- Bump version to 0.7.0
 ## [0.6.2] - 2025-12-15
 
 ### 🐛 Bug Fixes
data/README.md
CHANGED
@@ -35,6 +35,9 @@ feed = GtfsDf::Reader.load_from_zip('path/to/gtfs.zip')
 # Or, load from a directory
 feed = GtfsDf::Reader.load_from_dir('path/to/gtfs_dir')
 
+# Parse times as seconds since midnight instead of string
+feed = GtfsDf::Reader.load_from_dir('path/to/gtfs_dir', parse_times: true)
+
 # Access dataframes for each GTFS file
 puts feed.agency.head
 puts feed.routes.head
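The README now documents `parse_times: true`, which loads GTFS times as seconds since midnight rather than as strings. Below is a minimal pure-Ruby sketch of that conversion for a single value. It is not gtfs_df's implementation, which works on Polars columns, and the helper `gtfs_time_to_seconds` is hypothetical. The key detail is that GTFS times may run past `24:00:00` for trips that continue after midnight on the same service day, so the hour must not wrap.

```ruby
# Hypothetical helper (not gtfs_df's implementation): converts a GTFS time
# string ("HH:MM:SS" or "H:MM:SS") to seconds since midnight of the service day.
GTFS_TIME = /\A(\d+):([0-5]\d):([0-5]\d)\z/

def gtfs_time_to_seconds(value)
  # Non-timepoint stop_times rows may leave the time empty.
  return nil if value.nil? || value.strip.empty?

  match = GTFS_TIME.match(value.strip)
  raise ArgumentError, "not a GTFS time: #{value.inspect}" unless match

  hours, minutes, seconds = match.captures.map(&:to_i)
  # Hours are not wrapped at 24: "25:10:00" is 1:10 AM on the next calendar day.
  hours * 3600 + minutes * 60 + seconds
end

p gtfs_time_to_seconds("08:30:00") # → 30600
p gtfs_time_to_seconds("25:10:00") # → 90600
p gtfs_time_to_seconds("")         # → nil
```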
@@ -71,11 +74,25 @@ When you filter by a field, the library automatically:
 
 For example, filtering by `agency_id` will automatically filter routes, trips, stop_times, and stops to only include data for that agency.
 
+By default gtfs_df treats trips as the atomic unit of GTFS. Therefore, if we
+filter to one stop referenced by TripA, we will preserve _all stops_ referenced
+by TripA.
+
+To avoid this behavior, you can pass the `filter_only_children` param. In this case, only the children of the specified filter will be pruned and trip integrity will not be maintained. In the below example, stop 1 and related stop_times will be pruned.
+
+```ruby
+filtered_feed = feed.filter({ 'stop' => { 'stop_id' => ['1'] } }, filter_only_children: true)
+```
+
+
 ### Writing filtered feeds
 
 ```ruby
 # Write to a new zip file
 GtfsDf::Writer.write_to_zip(filtered_feed, 'output/filtered_gtfs.zip')
+
+# Write to a directory
+GtfsDf::Writer.write_to_dir(filtered_feed, 'output/filtered_gtfs')
 ```
 
 ### Example: Split feed by agency
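The README describes two filtering modes: trip-preserving (the default) and `filter_only_children: true`. As we read it, the difference can be modelled with plain Ruby arrays standing in for the `stop_times` table. This sketch only mirrors the behaviour the README describes; it is not gtfs_df's graph traversal, and the rows and method names are hypothetical.

```ruby
# Hypothetical stop_times rows: TripA visits stops 1, 2, 3; TripB visits 4, 5.
STOP_TIMES = [
  { "trip_id" => "A", "stop_id" => "1" },
  { "trip_id" => "A", "stop_id" => "2" },
  { "trip_id" => "A", "stop_id" => "3" },
  { "trip_id" => "B", "stop_id" => "4" },
  { "trip_id" => "B", "stop_id" => "5" }
].freeze

# Default mode (trip as the atomic unit): find every trip touching a selected
# stop, then keep *all* stop_times of those trips.
def keep_whole_trips(stop_times, stop_ids)
  touched = stop_times.select { |row| stop_ids.include?(row["stop_id"]) }
  trip_ids = touched.map { |row| row["trip_id"] }.uniq
  stop_times.select { |row| trip_ids.include?(row["trip_id"]) }
end

# Children-only mode: keep just the rows that reference a selected stop;
# the rest of each trip is dropped, so trip integrity is not maintained.
def keep_only_children(stop_times, stop_ids)
  stop_times.select { |row| stop_ids.include?(row["stop_id"]) }
end

p keep_whole_trips(STOP_TIMES, ["1"]).map { |row| row["stop_id"] }   # → ["1", "2", "3"]
p keep_only_children(STOP_TIMES, ["1"]).map { |row| row["stop_id"] } # → ["1"]
```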
@@ -84,30 +101,41 @@ See [examples/split-by-agency](examples/split-by-agency) for a complete example
 
 ## Development
 
-
+### Environment
+
+This project manages its development environment with nix, specifically [devenv].
+
+After checking out the repo:
 
-
+- Install devenv: https://devenv.sh/getting-started/
+
+- To enable the environment you can either:
+- Use [direnv] to enable the environment as soon as you enter the project's path.
+- Enable it only when you needed by running: `devenv shell`
+
+- Run `bin/setup` to install the gem dependencies.
+
+### Tests
+
+Run `rake spec` to run the tests.
+
+### REPL
+
+You can also run `bin/console` for an interactive prompt that will allow you to experiment.
 
 ## Release process
 
 1. `bin/bump-version`
 
--
--
--
--
+- Bumps the version in `lib/gtfs_df/version.rb`
+- Updates the `CHANGELOG.md` using the git log since the last version
+- Creates and push a new release branch with those changes
+- Creates a PR for that release
 
 2. `bin/create-tag`
 
 Creates and pushes the git tag for the release. That will trigger the GitHub action: `.github/workflows/publish.yml` to publish to RubyGems.
 
-## TODO
-
-- [ ] Time parsing
-
-Just like partridge, we should parse Time as seconds since midnight. There's a draft in `lib/gtfs_df/utils.rb` but it's not used anywhere.
-I haven't figured out how to properly implement that with Polars.
-
 ## Contributing
 
 Bug reports and pull requests are welcome on GitHub at https://github.com/davidmh/ruby-gtfs_df.
@@ -120,3 +148,5 @@ The gem is available as open source under the terms of the [MIT License](https:/
 [Polars]: https://pola.rs/
 [ruby-polars]: https://github.com/ankane/ruby-polars
 [partridge]: https://github.com/remix/partridge
+[devenv]: https://devenv.sh
+[direnv]: https://direnv.net
data/examples/split-by-agency/Gemfile.lock
CHANGED
@@ -1,20 +1,21 @@
 PATH
 remote: ../..
 specs:
-gtfs_df (0.
+gtfs_df (0.9.0)
 networkx (~> 0.4)
 polars-df (~> 0.22)
-rubyzip (
+rubyzip (>= 3.0, < 4.0)
 
 GEM
 remote: https://gem.coop/
 specs:
-bigdecimal (
+bigdecimal (4.0.1)
+json (2.18.0)
 matrix (0.4.3)
 networkx (0.4.0)
 matrix (~> 0.4)
 rb_heap (~> 1.0)
-optparse (0.8.
+optparse (0.8.1)
 polars-df (0.23.0-aarch64-linux)
 bigdecimal
 polars-df (0.23.0-aarch64-linux-musl)
@@ -28,11 +29,12 @@ GEM
 polars-df (0.23.0-x86_64-linux-musl)
 bigdecimal
 rb_heap (1.1.0)
-rubyzip (2.
+rubyzip (3.2.2)
 unicode-display_width (3.2.0)
 unicode-emoji (~> 4.1)
-unicode-emoji (4.
-whirly (0.
+unicode-emoji (4.2.0)
+whirly (0.4.0)
+json
 unicode-display_width (>= 1.1)
 
 PLATFORMS
data/lib/gtfs_df/base_gtfs_table.rb
CHANGED
@@ -15,6 +15,10 @@ module GtfsDf
 df = Polars.read_csv(input, infer_schema_length: 0, encoding: "utf8-lossy")
 .rename(->(col) { col.strip })
 
+# Strip out empty lines. Unfortunately read_csv does not support the drop_empty_rows
+# option right now.
+df = df.filter(Polars.all_horizontal(Polars.all.is_null).is_not)
+
 dtypes = self.class::SCHEMA.slice(*df.columns)
 df
 .with_columns(dtypes.keys.map do |col|
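The new filter keeps a row unless *every* column is null. That is how a blank line in the CSV shows up once parsed, while rows with only some empty optional fields are kept. The plain-Ruby analogue of `Polars.all_horizontal(Polars.all.is_null).is_not` below uses hashes in place of dataframe rows. The sample rows and the `drop_empty_rows` name are hypothetical, and the sketch does not show how Polars itself materialises blank lines.

```ruby
# Hypothetical parsed rows: the blank line in the source file surfaced as a
# row whose every column is nil.
ROWS = [
  { "stop_id" => "1", "stop_name" => "Main St" },
  { "stop_id" => nil, "stop_name" => nil },
  { "stop_id" => "2", "stop_name" => nil }
].freeze

# Drop a row only when all of its values are nil; partially empty rows stay.
def drop_empty_rows(rows)
  rows.reject { |row| row.values.all?(&:nil?) }
end

p drop_empty_rows(ROWS).map { |row| row["stop_id"] } # → ["1", "2"]
```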
data/lib/gtfs_df/feed.rb
CHANGED
@@ -148,6 +148,32 @@ module GtfsDf
 self.class.new(filtered, parse_times: @parse_times)
 end
 
+# Utility method that returns a hash of dataframes by file name
+#
+# @return [{file_name => dataframe}]
+def by_dataframe_name
+GTFS_FILES.filter_map do |file|
+dataframe = send(file)
+dataframe ? [file, dataframe] : nil
+end.to_h
+end
+
+# Utility method for getting a dataframe, e.g. feed['agency']
+#
+# @param [string] file name
+# @return [dataframe]
+def [](file_name)
+send(file_name)
+end
+
+# Utility method for setting a dataframe, e.g. feed['agency'] = new_dataframe
+#
+# @param [string] file name
+# @value [dataframe] the new dataframe
+def []=(file_name, value)
+send("#{file_name}=", value)
+end
+
 private
 
 def filter!(file, filters, filtered, filter_only_children: false)
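The three helpers added to `Feed` use `send` to dispatch a file name to the matching reader or writer. Below is a sketch of the same pattern on a stand-in class. `MiniFeed`, its three-file `GTFS_FILES` list, and the arrays used in place of Polars dataframes are all assumptions for illustration. The `known_file!` guard is an addition not present in the diff: a bare `send` would also reach unrelated, even private, methods for an arbitrary name.

```ruby
# Hypothetical stand-in for GtfsDf::Feed: one accessor per GTFS file, with
# plain arrays in place of Polars dataframes.
class MiniFeed
  GTFS_FILES = %w[agency routes trips].freeze

  attr_accessor(*GTFS_FILES)

  def initialize(tables = {})
    tables.each { |name, table| self[name] = table }
  end

  # Same shape as Feed#by_dataframe_name: only files that are loaded.
  def by_dataframe_name
    GTFS_FILES.filter_map do |file|
      table = send(file)
      table ? [file, table] : nil
    end.to_h
  end

  # feed["agency"] calls the agency reader.
  def [](file_name)
    send(known_file!(file_name))
  end

  # feed["agency"] = table calls the agency= writer.
  def []=(file_name, value)
    send("#{known_file!(file_name)}=", value)
  end

  private

  # Extra guard (not in the diff): reject names outside GTFS_FILES.
  def known_file!(file_name)
    name = file_name.to_s
    raise ArgumentError, "unknown GTFS file: #{name}" unless GTFS_FILES.include?(name)

    name
  end
end

feed = MiniFeed.new("agency" => [{ "agency_id" => "A1" }])
feed["routes"] = [{ "route_id" => "R1", "agency_id" => "A1" }]
p feed.by_dataframe_name.keys # → ["agency", "routes"]
p feed["routes"].size         # → 1
```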
@@ -172,17 +198,17 @@ module GtfsDf
 filtered
 end
 
-# Traverses the
+# Traverses the graph to prune unreferenced entities from child dataframes
 # based on parent relationships. See GtfsDf::Graph::STOP_NODES
 def prune!(root, filtered, filter_only_children: false)
 seen_edges = Set.new
-
+rerooted_graph = Graph.build(bidirectional: !filter_only_children)
 
 queue = [root]
 
 while queue.length > 0
 parent_node_id = queue.shift
-
+rerooted_graph.adj[parent_node_id].each do |child_node_id, attrs|
 edge = edge_id(parent_node_id, child_node_id)
 
 next if seen_edges.include?(edge)
@@ -209,6 +235,13 @@ module GtfsDf
 
 queue << child_node_id
 
+# If the edge is weak (e.g. reverse edge of an optional relationship),
+# we traverse to ensure connectivity but do NOT apply the filter.
+if attrs[:type] == :weak
+# puts "Skipping weak filter: #{edge}"
+next
+end
+
 attrs[:dependencies].each do |dep|
 parent_col = dep[parent_node_id]
 child_col = dep[child_node_id]
@@ -220,6 +253,13 @@ module GtfsDf
 # Get valid values from parent
 valid_values = parent_df[parent_col].to_a.uniq.compact
 
+# Annoying special case to make sure that if we have a calendar with exceptions,
+# the calendar_dates file doesn't end up pruning other files
+if parent_node_id == "calendar_dates" && parent_col == "service_id" &&
+filtered["calendar"]
+valid_values = (valid_values + calendar["service_id"].to_a).uniq
+end
+
 # Filter child to only include rows that reference valid parent values
 before = child_df.height
 filter = Polars.col(child_col).is_in(valid_values)
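The `calendar_dates` special case widens the set of valid `service_id`s whenever a `calendar` table is present. Without it, a feed whose `calendar_dates` lists only exceptions would prune away trips whose services are defined in `calendar`. Below is a simplified plain-Ruby model of that union. The tables and the `valid_service_ids` name are hypothetical, and the real code runs inside `prune!` on Polars columns.

```ruby
# Hypothetical tables: "WKDY" is defined only in calendar, while
# calendar_dates carries an added-service exception for "HOLIDAY".
CALENDAR = [{ "service_id" => "WKDY" }].freeze
CALENDAR_DATES = [{ "service_id" => "HOLIDAY", "date" => "20260101", "exception_type" => "1" }].freeze
TRIPS = [
  { "trip_id" => "T1", "service_id" => "WKDY" },
  { "trip_id" => "T2", "service_id" => "HOLIDAY" },
  { "trip_id" => "T3", "service_id" => "GONE" }
].freeze

# service_ids that survive when trips are pruned from calendar_dates: the ids
# in calendar_dates plus, if a calendar table exists, the ids in calendar.
def valid_service_ids(calendar_dates, calendar)
  ids = calendar_dates.map { |row| row["service_id"] }.compact.uniq
  ids = (ids + calendar.map { |row| row["service_id"] }).uniq if calendar
  ids
end

valid = valid_service_ids(CALENDAR_DATES, CALENDAR)
p TRIPS.select { |trip| valid.include?(trip["service_id"]) }.map { |trip| trip["trip_id"] } # → ["T1", "T2"]
```

Passing `nil` for the calendar yields only `["HOLIDAY"]`, which would drop `T1`: that is the behaviour the special case avoids.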
@@ -243,8 +283,7 @@ module GtfsDf
 end
 
 def edge_id(parent, child)
-
-[parent, child].sort.join("-")
+[parent, child].join("-")
 end
 end
 end
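`edge_id` no longer sorts its arguments. Now that the bidirectional graph contains reverse edges, a sorted key would give `parent→child` and `child→parent` the same entry in `seen_edges`, so the `next if seen_edges.include?(edge)` check would skip the second direction. Below is a small comparison of both keys; `sorted_edge_id` is a hypothetical name for the removed version.

```ruby
require "set"

# Removed version: order-insensitive key.
def sorted_edge_id(parent, child)
  [parent, child].sort.join("-")
end

# New version from the diff: directional key.
def edge_id(parent, child)
  [parent, child].join("-")
end

seen_sorted = Set.new([sorted_edge_id("routes", "trips")])
seen_directed = Set.new([edge_id("routes", "trips")])

# Is the reverse traversal trips -> routes treated as already seen?
p seen_sorted.include?(sorted_edge_id("trips", "routes")) # → true (it would be skipped)
p seen_directed.include?(edge_id("trips", "routes"))      # → false (it is still visited)
```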
data/lib/gtfs_df/graph.rb
CHANGED
@@ -41,7 +41,7 @@ module GtfsDf
 NODES = STANDARD_FILE_NODES.merge(STOP_NODES).freeze
 
 # Returns a directed graph of GTFS file dependencies
-def self.build
+def self.build(bidirectional: false)
 g = NetworkX::DiGraph.new
 NODES.keys.each { |node| g.add_node(node) }
 
@@ -53,15 +53,15 @@ module GtfsDf
 ]}],
 ["agency", "fare_attributes", {dependencies: [
 {"fare_attributes" => "agency_id",
-"agency" => "agency_id"}
-]}],
+"agency" => "agency_id", :allow_null => true}
+], optional: true}],
 ["fare_attributes", "fare_rules", {dependencies: [
 {"fare_attributes" => "fare_id",
 "fare_rules" => "fare_id"}
 ]}],
 ["routes", "fare_rules", {dependencies: [
 {"fare_rules" => "route_id", "routes" => "route_id", :allow_null => true}
-]}],
+], optional: true}],
 ["routes", "trips", {dependencies: [
 {"routes" => "route_id", "trips" => "route_id"}
 ]}],
@@ -73,12 +73,12 @@ module GtfsDf
 ]}],
 # Self-referential edge: stops can reference parent stations (location_type=1)
 ["parent_stations", "stops", {dependencies: [
-{"stops" => "parent_station", "parent_stations" => "stop_id"}
+{"stops" => "parent_station", "parent_stations" => "stop_id", :allow_null => true}
 ]}],
 ["stops", "transfers", {dependencies: [
 {"stops" => "stop_id", "transfers" => "from_stop_id"},
 {"stops" => "stop_id", "transfers" => "to_stop_id"}
-]}],
+], optional: true}],
 ["calendar", "trips", {dependencies: [
 {"trips" => "service_id", "calendar" => "service_id"}
 ]}],
@@ -86,11 +86,11 @@ module GtfsDf
 {"trips" => "service_id", "calendar_dates" => "service_id"}
 ]}],
 ["shapes", "trips", {dependencies: [
-{"trips" => "shape_id", "shapes" => "shape_id"}
+{"trips" => "shape_id", "shapes" => "shape_id", :allow_null => true}
 ]}],
 ["trips", "frequencies", {dependencies: [
 {"trips" => "trip_id", "frequencies" => "trip_id"}
-]}],
+], optional: true}],
 
 # --- GTFS Extensions ---
 ["stops", "fare_leg_join_rules",
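Several dependencies now carry `:allow_null => true`, for example `stops.parent_station` and `trips.shape_id`, both optional in GTFS. The diff does not show how `prune!` consumes this flag, so the following is only a guess at the intended semantics. `valid_values` is built with `.uniq.compact`, so a null key can never match it, and `allow_null` would keep such rows instead of dropping them. `filter_child` and the sample rows are hypothetical.

```ruby
# Hypothetical stops rows: only stop "2" references an existing parent station.
STOPS = [
  { "stop_id" => "1", "parent_station" => nil },
  { "stop_id" => "2", "parent_station" => "S1" },
  { "stop_id" => "3", "parent_station" => "S9" }
].freeze

# Keep child rows whose key is among the parent's values; with allow_null,
# also keep rows whose key is nil (the optional reference is simply absent).
def filter_child(rows, column, valid_values, allow_null: false)
  valid = valid_values.uniq.compact # mirrors `.to_a.uniq.compact` in prune!
  rows.select do |row|
    value = row[column]
    valid.include?(value) || (allow_null && value.nil?)
  end
end

p filter_child(STOPS, "parent_station", ["S1"]).map { |row| row["stop_id"] }                   # → ["2"]
p filter_child(STOPS, "parent_station", ["S1"], allow_null: true).map { |row| row["stop_id"] } # → ["1", "2"]
```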
@@ -163,6 +163,16 @@ module GtfsDf
 
 edges.each do |from, to, attrs|
 g.add_edge(from, to, **attrs)
+if bidirectional
+# When adding the reverse edge, if the relationship is optional (child is not required),
+# mark the reverse edge as weak. This prevents empty child tables (e.g. fare_rules)
+# from filtering parent tables (e.g. routes) into emptiness.
+reverse_attrs = attrs.dup
+if attrs[:optional]
+reverse_attrs[:type] = :weak
+end
+g.add_edge(to, from, **reverse_attrs)
+end
 end
 g
 end
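Taken together, `Graph.build(bidirectional: true)` adds a reverse copy of every edge, tagging the copy `:weak` when the forward edge is `optional: true`. `prune!` then still walks weak edges but skips their filter. Below is a sketch of that idea using a plain adjacency hash instead of `NetworkX::DiGraph`. The three-edge list is a hypothetical subset of the real table, and `plan_traversal` only records which edges would filter rather than filtering data.

```ruby
# Hypothetical subset of the edge list: [from, to, attrs].
EDGES = [
  ["routes", "trips", { dependencies: [{ "routes" => "route_id", "trips" => "route_id" }] }],
  ["routes", "fare_rules", { dependencies: [{ "routes" => "route_id", "fare_rules" => "route_id" }], optional: true }],
  ["trips", "stop_times", { dependencies: [{ "trips" => "trip_id", "stop_times" => "trip_id" }] }]
].freeze

# Adjacency hash { node => { neighbour => attrs } }, standing in for g.adj.
def build_graph(edges, bidirectional: false)
  adj = Hash.new { |hash, node| hash[node] = {} }
  edges.each do |from, to, attrs|
    adj[from][to] = attrs
    next unless bidirectional

    # Reverse edges of optional relationships are marked weak.
    reverse_attrs = attrs.dup
    reverse_attrs[:type] = :weak if attrs[:optional]
    adj[to][from] = reverse_attrs
  end
  adj
end

# Breadth-first walk from root. Returns [edges whose filter would apply,
# weak edges that are only traversed for connectivity].
def plan_traversal(adj, root)
  seen = {}
  applied = []
  traversed_only = []
  queue = [root]
  until queue.empty?
    parent = queue.shift
    adj[parent].each do |child, attrs|
      edge = "#{parent}-#{child}" # directional key, as in the new edge_id
      next if seen[edge]

      seen[edge] = true
      queue << child
      (attrs[:type] == :weak ? traversed_only : applied) << edge
    end
  end
  [applied, traversed_only]
end

p plan_traversal(build_graph(EDGES), "trips") # → [["trips-stop_times"], []]
both_ways = build_graph(EDGES, bidirectional: true)
p both_ways["fare_rules"]["routes"][:type]    # → :weak
p plan_traversal(both_ways, "trips").last     # → ["fare_rules-routes"]
```

In directed mode, starting from `trips` reaches only its children. In bidirectional mode the walk also climbs back to `routes`, and the optional `fare_rules` edge is traversed without filtering `routes`.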
data/lib/gtfs_df/reader.rb
CHANGED
data/lib/gtfs_df/version.rb
CHANGED
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: gtfs_df
 version: !ruby/object:Gem::Version
-version: 0.
+version: 0.9.0
 platform: ruby
 authors:
 - David Mejorado
@@ -41,16 +41,22 @@ dependencies:
 name: rubyzip
 requirement: !ruby/object:Gem::Requirement
 requirements:
-- - "
+- - ">="
+- !ruby/object:Gem::Version
+version: '3.0'
+- - "<"
 - !ruby/object:Gem::Version
-version: '
+version: '4.0'
 type: :runtime
 prerelease: false
 version_requirements: !ruby/object:Gem::Requirement
 requirements:
-- - "
+- - ">="
+- !ruby/object:Gem::Version
+version: '3.0'
+- - "<"
 - !ruby/object:Gem::Version
-version: '
+version: '4.0'
 description: 'A Ruby gem to load, filter, and manipulate GTFS (General Transit Feed
 Specification) feeds using DataFrames powered by Polars. Supports cascading filters
 that maintain referential integrity across related tables. NOTE: This gem is not
@@ -137,7 +143,7 @@ required_ruby_version: !ruby/object:Gem::Requirement
 requirements:
 - - ">="
 - !ruby/object:Gem::Version
-version: 3.
+version: 3.2.0
 required_rubygems_version: !ruby/object:Gem::Requirement
 requirements:
 - - ">="
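The 0.9.0 gemspec constrains rubyzip to `>= 3.0, < 4.0` (the breaking change in the CHANGELOG) and Ruby to `>= 3.2.0`. RubyGems' own `Gem::Requirement` and `Gem::Version` classes can check whether a given version satisfies these constraints. Apart from `3.2.2`, which is the rubyzip version locked in the example app above, the versions tested below are arbitrary examples.

```ruby
require "rubygems"

# Constraints from the 0.9.0 gemspec metadata above.
RUBYZIP_REQUIREMENT = Gem::Requirement.new(">= 3.0", "< 4.0")
RUBY_REQUIREMENT = Gem::Requirement.new(">= 3.2.0")

def satisfies?(requirement, version)
  requirement.satisfied_by?(Gem::Version.new(version))
end

p satisfies?(RUBYZIP_REQUIREMENT, "3.2.2") # → true
p satisfies?(RUBYZIP_REQUIREMENT, "2.4.1") # → false (any 2.x is now rejected)
p satisfies?(RUBYZIP_REQUIREMENT, "4.0.0") # → false
p satisfies?(RUBY_REQUIREMENT, "3.1.6")    # → false
p satisfies?(RUBY_REQUIREMENT, RUBY_VERSION)
```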