gtfs_df 0.7.0 → 0.9.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 0b218f60e33e005576eb278a74ba89e6164fddccfe99e9f8e5665d105d9fa471
- data.tar.gz: cb4023192418d5ba89f08b605b2ec8ac79073b28d29f6161f252d97df3291dff
+ metadata.gz: b04b659c98fd64e3a9ccbe37390dbd1904b37efba1e001ddb3bfab6ada41e912
+ data.tar.gz: 5c003f2cdc866a1232c8988604040dbf0523c9605a96d5bea2138253011cea92
  SHA512:
- metadata.gz: cefa279582b0579a637e4ebc39fe4c3d90d56582aa1cd0d00ffda041b92f681831334d46aeae95dc1c291c1295758c500a0c7bfe32876f226605f5b150fc0a9c
- data.tar.gz: e4b5bbe129674304121c61cb732a8261421158f206370289e67d3e6d514bbb089bc0c35f88fde6e46012a18afd3700b86eb4c2a2b457b9880270ef3a4f74cf79
+ metadata.gz: 7d7b6ef1ed68ffd05273a01c6e17a900a757b2c933c8d0c199378e1b94d79b1d7650d8094e22e2f9e2f56555bb7082418f8b2765675a48a4a0ec2d5ef7f0f6d9
+ data.tar.gz: 9d1c54c97ca7f1480693a736b44dcb656f664291e46f04af94bdd25027ea9d976772ce8ba2d30c28f470ecb4f570991784157e7492db20c8320f5cb7a2db480f
data/CHANGELOG.md CHANGED
@@ -1,3 +1,36 @@
+ ## [0.9.0] - 2026-02-17
+
+ ### 🚀 Features
+
+ - Add helper utilities
+
+ ### 🐛 Bug Fixes
+
+ - [**breaking**] Bump rubyzip min version to 3.0
+ ## [0.8.0] - 2026-01-09
+
+ ### 🐛 Bug Fixes
+
+ - Ignore extra newlines when parsing csv
+ - Bump minimum ruby version to 3.2.0
+ - Fix fare_attributes filtering
+ - Fix exceptions edge case
+ - Replace dynamic graph traversal with bidirectional graph option
+
+ ### 📚 Documentation
+
+ - Document dev environment
+ - Clarify the actions made by the bump-version script
+ - Update example transitive dependencies
+
+ ### ⚙️ Miscellaneous Tasks
+
+ - Reduce the test run frequency
+ - Update dependabot schedule
+ - Consolidate test fixtures
+ - Add test for additional fares case
+ - Update readme
+ - Bump version to 0.8.0
  ## [0.7.0] - 2025-12-30
 
  ### 🚀 Features
@@ -22,6 +55,7 @@
  ### ⚙️ Miscellaneous Tasks
 
  - Include the util helpers in the console and the test spec
+ - Bump version to 0.7.0
  ## [0.6.2] - 2025-12-15
 
  ### 🐛 Bug Fixes
data/README.md CHANGED
@@ -35,6 +35,9 @@ feed = GtfsDf::Reader.load_from_zip('path/to/gtfs.zip')
  # Or, load from a directory
  feed = GtfsDf::Reader.load_from_dir('path/to/gtfs_dir')
 
+ # Parse times as seconds since midnight instead of string
+ feed = GtfsDf::Reader.load_from_dir('path/to/gtfs_dir', parse_times: true)
+
  # Access dataframes for each GTFS file
  puts feed.agency.head
  puts feed.routes.head
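The `parse_times: true` option stores GTFS times as seconds since midnight rather than as strings. A minimal stdlib-only sketch of that conversion, assuming the GTFS `H:MM:SS`/`HH:MM:SS` format where hours may exceed 23 for service running past midnight; `gtfs_time_to_seconds` is a hypothetical helper, not part of the gem's API.

```ruby
# Hypothetical helper: converts a GTFS time string ("HH:MM:SS", where HH may
# exceed 23 for service that runs past midnight) into seconds since midnight.
# Returns nil for blank values, mirroring an empty optional CSV field.
def gtfs_time_to_seconds(value)
  return nil if value.nil? || value.strip.empty?

  match = /\A\s*(\d+):([0-5]\d):([0-5]\d)\s*\z/.match(value)
  raise ArgumentError, "invalid GTFS time: #{value.inspect}" unless match

  hours, minutes, seconds = match.captures.map(&:to_i)
  (hours * 3600) + (minutes * 60) + seconds
end

puts gtfs_time_to_seconds("08:30:00") # prints 30600
puts gtfs_time_to_seconds("25:10:05") # prints 90605 (1:10:05 AM the next day)
```

Keeping times past 24:00:00 as plain integers, rather than wrapping them into clock times, preserves their ordering within a service day.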
@@ -71,11 +74,25 @@ When you filter by a field, the library automatically:
 
  For example, filtering by `agency_id` will automatically filter routes, trips, stop_times, and stops to only include data for that agency.
 
+ By default gtfs_df treats trips as the atomic unit of GTFS. Therefore, if we
+ filter to one stop referenced by TripA, we will preserve _all stops_ referenced
+ by TripA.
+
+ To avoid this behavior, you can pass the `filter_only_children` param. In this case, only the children of the specified filter will be pruned and trip integrity will not be maintained. In the below example, stop 1 and related stop_times will be pruned.
+
+ ```ruby
+ filtered_feed = feed.filter({ 'stop' => { 'stop_id' => ['1'] } }, filter_only_children: true)
+ ```
+
+
  ### Writing filtered feeds
 
  ```ruby
  # Write to a new zip file
  GtfsDf::Writer.write_to_zip(filtered_feed, 'output/filtered_gtfs.zip')
+
+ # Write to a directory
+ GtfsDf::Writer.write_to_dir(filtered_feed, 'output/filtered_gtfs')
  ```
 
  ### Example: Split feed by agency
@@ -84,30 +101,41 @@ See [examples/split-by-agency](examples/split-by-agency) for a complete example
 
  ## Development
 
- After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake spec` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
+ ### Environment
+
+ This project manages its development environment with nix, specifically [devenv].
+
+ After checking out the repo:
 
- To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and the created tag, and push the `.gem` file to [rubygems.org](https://rubygems.org).
+ - Install devenv: https://devenv.sh/getting-started/
+
+ - To enable the environment you can either:
+ - Use [direnv] to enable the environment as soon as you enter the project's path.
+ - Enable it only when you needed by running: `devenv shell`
+
+ - Run `bin/setup` to install the gem dependencies.
+
+ ### Tests
+
+ Run `rake spec` to run the tests.
+
+ ### REPL
+
+ You can also run `bin/console` for an interactive prompt that will allow you to experiment.
 
  ## Release process
 
  1. `bin/bump-version`
 
- - Bump the version in `lib/gtfs_df/version.rb`
- - Update the `CHANGELOG.md` using the git log since the last version
- - Create and push a new release branch with those changes
- - Create a PR for that release
+ - Bumps the version in `lib/gtfs_df/version.rb`
+ - Updates the `CHANGELOG.md` using the git log since the last version
+ - Creates and push a new release branch with those changes
+ - Creates a PR for that release
 
  2. `bin/create-tag`
 
  Creates and pushes the git tag for the release. That will trigger the GitHub action: `.github/workflows/publish.yml` to publish to RubyGems.
 
- ## TODO
-
- - [ ] Time parsing
-
- Just like partridge, we should parse Time as seconds since midnight. There's a draft in `lib/gtfs_df/utils.rb` but it's not used anywhere.
- I haven't figured out how to properly implement that with Polars.
-
  ## Contributing
 
  Bug reports and pull requests are welcome on GitHub at https://github.com/davidmh/ruby-gtfs_df.
@@ -120,3 +148,5 @@ The gem is available as open source under the terms of the [MIT License](https:/
  [Polars]: https://pola.rs/
  [ruby-polars]: https://github.com/ankane/ruby-polars
  [partridge]: https://github.com/remix/partridge
+ [devenv]: https://devenv.sh
+ [direnv]: https://direnv.net
@@ -1,20 +1,21 @@
  PATH
  remote: ../..
  specs:
- gtfs_df (0.7.0)
+ gtfs_df (0.9.0)
  networkx (~> 0.4)
  polars-df (~> 0.22)
- rubyzip (~> 2.3)
+ rubyzip (>= 3.0, < 4.0)
 
  GEM
  remote: https://gem.coop/
  specs:
- bigdecimal (3.3.1)
+ bigdecimal (4.0.1)
+ json (2.18.0)
  matrix (0.4.3)
  networkx (0.4.0)
  matrix (~> 0.4)
  rb_heap (~> 1.0)
- optparse (0.8.0)
+ optparse (0.8.1)
  polars-df (0.23.0-aarch64-linux)
  bigdecimal
  polars-df (0.23.0-aarch64-linux-musl)
@@ -28,11 +29,12 @@ GEM
  polars-df (0.23.0-x86_64-linux-musl)
  bigdecimal
  rb_heap (1.1.0)
- rubyzip (2.4.1)
+ rubyzip (3.2.2)
  unicode-display_width (3.2.0)
  unicode-emoji (~> 4.1)
- unicode-emoji (4.1.0)
- whirly (0.3.0)
+ unicode-emoji (4.2.0)
+ whirly (0.4.0)
+ json
  unicode-display_width (>= 1.1)
 
  PLATFORMS
@@ -15,6 +15,10 @@ module GtfsDf
  df = Polars.read_csv(input, infer_schema_length: 0, encoding: "utf8-lossy")
  .rename(->(col) { col.strip })
 
+ # Strip out empty lines. Unfortunately read_csv does not support the drop_empty_rows
+ # option right now.
+ df = df.filter(Polars.all_horizontal(Polars.all.is_null).is_not)
+
  dtypes = self.class::SCHEMA.slice(*df.columns)
  df
  .with_columns(dtypes.keys.map do |col|
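The new `df.filter(Polars.all_horizontal(Polars.all.is_null).is_not)` line keeps a row only if at least one of its columns is non-null, which is what blank lines turn into. A sketch of the same predicate using Ruby's standard `csv` library as a stand-in for Polars; this is illustrative only and is not how the gem reads files, and `parse_without_empty_rows` is a hypothetical name.

```ruby
require "csv"

# Parses a GTFS-style CSV string and drops rows in which every field is nil,
# which is what blank lines (or lines made only of commas) turn into. This
# mirrors the "keep a row unless all columns are null" predicate above.
def parse_without_empty_rows(text)
  table = CSV.parse(text, headers: true)
  table.reject { |row| row.fields.all?(&:nil?) }
end

csv = "stop_id,stop_name\n1,Main St\n\n2,Oak Ave\n,\n"
rows = parse_without_empty_rows(csv)
puts rows.map { |row| row["stop_id"] }.inspect # prints ["1", "2"]
```

A row that is only partly empty, such as `,Oak Ave`, is kept, just as the Polars expression keeps rows with any non-null column.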
data/lib/gtfs_df/feed.rb CHANGED
@@ -148,6 +148,32 @@ module GtfsDf
  self.class.new(filtered, parse_times: @parse_times)
  end
 
+ # Utility method that returns a hash of dataframes by file name
+ #
+ # @return [{file_name => dataframe}]
+ def by_dataframe_name
+ GTFS_FILES.filter_map do |file|
+ dataframe = send(file)
+ dataframe ? [file, dataframe] : nil
+ end.to_h
+ end
+
+ # Utility method for getting a dataframe, e.g. feed['agency']
+ #
+ # @param [string] file name
+ # @return [dataframe]
+ def [](file_name)
+ send(file_name)
+ end
+
+ # Utility method for setting a dataframe, e.g. feed['agency'] = new_dataframe
+ #
+ # @param [string] file name
+ # @value [dataframe] the new dataframe
+ def []=(file_name, value)
+ send("#{file_name}=", value)
+ end
+
  private
 
  def filter!(file, filters, filtered, filter_only_children: false)
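The new `by_dataframe_name`, `[]`, and `[]=` helpers dispatch to the feed's existing per-file readers and writers through `send`. A dependency-free sketch of the same pattern, using a hypothetical `MiniFeed` class that holds arrays of hashes in place of Polars dataframes. Unlike the diff, it checks the name against the known file list before dispatching; that guard is an added assumption so that `feed["anything"]` cannot reach arbitrary methods.

```ruby
# Hypothetical stand-in for GtfsDf::Feed: arrays of hashes replace dataframes.
class MiniFeed
  GTFS_FILES = %w[agency routes trips stop_times stops].freeze

  attr_accessor(*GTFS_FILES)

  def initialize(tables = {})
    tables.each { |name, rows| self[name] = rows }
  end

  # Same shape as by_dataframe_name: only the files that are present.
  def by_dataframe_name
    GTFS_FILES.filter_map do |file|
      table = public_send(file)
      table ? [file, table] : nil
    end.to_h
  end

  def [](file_name)
    public_send(checked(file_name))
  end

  def []=(file_name, value)
    public_send("#{checked(file_name)}=", value)
  end

  private

  # Extra guard (not in the diff): reject names outside the GTFS file list.
  def checked(file_name)
    name = file_name.to_s
    raise ArgumentError, "unknown GTFS file: #{name}" unless GTFS_FILES.include?(name)

    name
  end
end

feed = MiniFeed.new("agency" => [{ "agency_id" => "A1" }])
feed["routes"] = [{ "route_id" => "R1", "agency_id" => "A1" }]
puts feed.by_dataframe_name.keys.inspect # prints ["agency", "routes"]
```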
@@ -172,17 +198,17 @@ module GtfsDf
  filtered
  end
 
- # Traverses the grah to prune unreferenced entities from child dataframes
+ # Traverses the graph to prune unreferenced entities from child dataframes
  # based on parent relationships. See GtfsDf::Graph::STOP_NODES
  def prune!(root, filtered, filter_only_children: false)
  seen_edges = Set.new
- maybe_digraph = filter_only_children ? graph : graph.to_undirected
+ rerooted_graph = Graph.build(bidirectional: !filter_only_children)
 
  queue = [root]
 
  while queue.length > 0
  parent_node_id = queue.shift
- maybe_digraph.adj[parent_node_id].each do |child_node_id, attrs|
+ rerooted_graph.adj[parent_node_id].each do |child_node_id, attrs|
  edge = edge_id(parent_node_id, child_node_id)
 
  next if seen_edges.include?(edge)
@@ -209,6 +235,13 @@ module GtfsDf
 
  queue << child_node_id
 
+ # If the edge is weak (e.g. reverse edge of an optional relationship),
+ # we traverse to ensure connectivity but do NOT apply the filter.
+ if attrs[:type] == :weak
+ # puts "Skipping weak filter: #{edge}"
+ next
+ end
+
  attrs[:dependencies].each do |dep|
  parent_col = dep[parent_node_id]
  child_col = dep[child_node_id]
@@ -220,6 +253,13 @@ module GtfsDf
  # Get valid values from parent
  valid_values = parent_df[parent_col].to_a.uniq.compact
 
+ # Annoying special case to make sure that if we have a calendar with exceptions,
+ # the calendar_dates file doesn't end up pruning other files
+ if parent_node_id == "calendar_dates" && parent_col == "service_id" &&
+ filtered["calendar"]
+ valid_values = (valid_values + calendar["service_id"].to_a).uniq
+ end
+
  # Filter child to only include rows that reference valid parent values
  before = child_df.height
  filter = Polars.col(child_col).is_in(valid_values)
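The special case widens the `service_id`s that `calendar_dates` considers valid with those from `calendar`, so a trip whose service is defined only in `calendar.txt` is not dropped just because it has no exception rows. A plain-Ruby sketch of that set logic with made-up service ids; `valid_service_ids` is a hypothetical helper working on arrays of hashes rather than dataframes.

```ruby
# Hypothetical helper mirroring the special case above: when pruning from
# calendar_dates, service_ids that appear only in calendar stay valid.
def valid_service_ids(calendar_dates, calendar)
  ids = calendar_dates.map { |row| row["service_id"] }.compact.uniq
  return ids if calendar.nil?

  (ids + calendar.map { |row| row["service_id"] }).compact.uniq
end

calendar = [{ "service_id" => "WEEKDAY" }, { "service_id" => "WEEKEND" }]
calendar_dates = [{ "service_id" => "WEEKDAY", "date" => "20260101", "exception_type" => "2" }]
trips = [
  { "trip_id" => "T1", "service_id" => "WEEKDAY" },
  { "trip_id" => "T2", "service_id" => "WEEKEND" }
]

valid = valid_service_ids(calendar_dates, calendar)
kept = trips.select { |trip| valid.include?(trip["service_id"]) }
puts kept.map { |trip| trip["trip_id"] }.inspect # prints ["T1", "T2"]
```

Without the union only `WEEKDAY` would be valid and `T2` would be pruned. The sketch takes the calendar ids from the table passed in; in the diff the guard checks `filtered["calendar"]` while the ids are read from `calendar`, so which table supplies them is worth confirming against the full source.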
@@ -243,8 +283,7 @@ module GtfsDf
  end
 
  def edge_id(parent, child)
- # Alphabetize to make sure this works with undirected graph
- [parent, child].sort.join("-")
+ [parent, child].join("-")
  end
  end
  end
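Taken together, the `prune!` changes walk breadth-first from the filtered root, skip directed edges already seen (since `edge_id` no longer sorts, `routes-trips` and `trips-routes` are distinct), and on `:weak` edges keep traversing without filtering. A simplified sketch over in-memory arrays of hashes, assuming single-column dependencies; the `ADJ` layout, `prune`, and `sample_tables` are hypothetical and not the gem's internals.

```ruby
# Hypothetical bidirectional adjacency:
# node => [[neighbour, node_col, neighbour_col, type], ...].
# routes -> fare_rules is optional, so its reverse edge is :weak.
ADJ = {
  "agency" => [["routes", "agency_id", "agency_id", :strong]],
  "routes" => [
    ["agency", "agency_id", "agency_id", :strong],
    ["trips", "route_id", "route_id", :strong],
    ["fare_rules", "route_id", "route_id", :strong]
  ],
  "trips" => [["routes", "route_id", "route_id", :strong]],
  "fare_rules" => [["routes", "route_id", "route_id", :weak]]
}.freeze

# Breadth-first prune starting from an already-filtered root table.
def prune(tables, root, adj)
  seen_edges = {}
  queue = [root]
  until queue.empty?
    parent = queue.shift
    adj.fetch(parent, []).each do |child, parent_col, child_col, type|
      edge = "#{parent}-#{child}" # directional, like the new edge_id
      next if seen_edges[edge]

      seen_edges[edge] = true
      queue << child
      next if type == :weak # keep walking, but do not filter through this edge

      valid = tables[parent].map { |row| row[parent_col] }.compact.uniq
      tables[child] = tables[child].select { |row| valid.include?(row[child_col]) }
    end
  end
  tables
end

def sample_tables
  {
    "agency" => [{ "agency_id" => "A1" }, { "agency_id" => "A2" }],
    "routes" => [{ "route_id" => "R1", "agency_id" => "A1" },
                 { "route_id" => "R2", "agency_id" => "A2" }],
    "trips" => [{ "trip_id" => "T1", "route_id" => "R1" }], # already filtered
    "fare_rules" => [] # optional file that is present but empty
  }
end

result = prune(sample_tables, "trips", ADJ)
puts result["routes"].map { |r| r["route_id"] }.inspect # prints ["R1"]
puts result["agency"].map { |a| a["agency_id"] }.inspect # prints ["A1"]
```

Retagging the `fare_rules -> routes` edge as `:strong` lets the empty `fare_rules` table empty `routes`, which is the failure the graph.rb comment on weak edges describes.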
data/lib/gtfs_df/graph.rb CHANGED
@@ -41,7 +41,7 @@ module GtfsDf
  NODES = STANDARD_FILE_NODES.merge(STOP_NODES).freeze
 
  # Returns a directed graph of GTFS file dependencies
- def self.build
+ def self.build(bidirectional: false)
  g = NetworkX::DiGraph.new
  NODES.keys.each { |node| g.add_node(node) }
 
@@ -53,15 +53,15 @@ module GtfsDf
  ]}],
  ["agency", "fare_attributes", {dependencies: [
  {"fare_attributes" => "agency_id",
- "agency" => "agency_id"}
- ]}],
+ "agency" => "agency_id", :allow_null => true}
+ ], optional: true}],
  ["fare_attributes", "fare_rules", {dependencies: [
  {"fare_attributes" => "fare_id",
  "fare_rules" => "fare_id"}
  ]}],
  ["routes", "fare_rules", {dependencies: [
  {"fare_rules" => "route_id", "routes" => "route_id", :allow_null => true}
- ]}],
+ ], optional: true}],
  ["routes", "trips", {dependencies: [
  {"routes" => "route_id", "trips" => "route_id"}
  ]}],
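Several dependencies now carry `:allow_null => true` (`fare_attributes.agency_id` here, and `stops.parent_station` and `trips.shape_id` further down), all fields that GTFS lets feeds leave empty. The diff does not show how the flag is consumed; a plausible reading, sketched here purely as an assumption, is that child rows with an empty foreign key survive pruning instead of being dropped. `filter_child` is hypothetical.

```ruby
# Hypothetical filter step (assumed semantics of :allow_null): keep child rows
# whose foreign key is a valid parent value and, when allow_null is set,
# also rows whose key is nil or an empty string.
def filter_child(rows, child_col, valid_values, allow_null: false)
  rows.select do |row|
    value = row[child_col]
    valid_values.include?(value) || (allow_null && value.to_s.empty?)
  end
end

trips = [
  { "trip_id" => "T1", "shape_id" => "S1" },
  { "trip_id" => "T2", "shape_id" => nil },
  { "trip_id" => "T3", "shape_id" => "S9" }
]
kept = filter_child(trips, "shape_id", ["S1"], allow_null: true)
puts kept.map { |t| t["trip_id"] }.inspect # prints ["T1", "T2"]
```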
@@ -73,12 +73,12 @@ module GtfsDf
  ]}],
  # Self-referential edge: stops can reference parent stations (location_type=1)
  ["parent_stations", "stops", {dependencies: [
- {"stops" => "parent_station", "parent_stations" => "stop_id"}
+ {"stops" => "parent_station", "parent_stations" => "stop_id", :allow_null => true}
  ]}],
  ["stops", "transfers", {dependencies: [
  {"stops" => "stop_id", "transfers" => "from_stop_id"},
  {"stops" => "stop_id", "transfers" => "to_stop_id"}
- ]}],
+ ], optional: true}],
  ["calendar", "trips", {dependencies: [
  {"trips" => "service_id", "calendar" => "service_id"}
  ]}],
@@ -86,11 +86,11 @@ module GtfsDf
  {"trips" => "service_id", "calendar_dates" => "service_id"}
  ]}],
  ["shapes", "trips", {dependencies: [
- {"trips" => "shape_id", "shapes" => "shape_id"}
+ {"trips" => "shape_id", "shapes" => "shape_id", :allow_null => true}
  ]}],
  ["trips", "frequencies", {dependencies: [
  {"trips" => "trip_id", "frequencies" => "trip_id"}
- ]}],
+ ], optional: true}],
 
  # --- GTFS Extensions ---
  ["stops", "fare_leg_join_rules",
@@ -163,6 +163,16 @@ module GtfsDf
 
  edges.each do |from, to, attrs|
  g.add_edge(from, to, **attrs)
+ if bidirectional
+ # When adding the reverse edge, if the relationship is optional (child is not required),
+ # mark the reverse edge as weak. This prevents empty child tables (e.g. fare_rules)
+ # from filtering parent tables (e.g. routes) into emptiness.
+ reverse_attrs = attrs.dup
+ if attrs[:optional]
+ reverse_attrs[:type] = :weak
+ end
+ g.add_edge(to, from, **reverse_attrs)
+ end
  end
  g
  end
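`Graph.build(bidirectional: true)` now adds a reverse edge for every dependency and tags it `type: :weak` when the forward edge is marked `optional: true`. A stdlib-only sketch of that construction using a nested hash in place of `NetworkX::DiGraph`; `EDGES` is a small hypothetical subset of the real edge list, and `build_graph` is not the gem's method.

```ruby
# Hypothetical subset of the edge list: [from, to, attrs].
EDGES = [
  ["agency", "routes", { optional: false }],
  ["routes", "trips", { optional: false }],
  ["routes", "fare_rules", { optional: true }],
  ["trips", "frequencies", { optional: true }]
].freeze

# Returns { node => { neighbour => attrs } }, loosely mirroring a DiGraph's adj.
def build_graph(edges, bidirectional: false)
  adj = Hash.new { |hash, key| hash[key] = {} }
  edges.each do |from, to, attrs|
    adj[from][to] = attrs
    next unless bidirectional

    # dup so the :weak tag lands only on the reverse edge
    reverse_attrs = attrs.dup
    reverse_attrs[:type] = :weak if attrs[:optional]
    adj[to][from] = reverse_attrs
  end
  adj
end

graph = build_graph(EDGES, bidirectional: true)
puts graph["fare_rules"]["routes"][:type].inspect # prints :weak
puts graph["trips"]["routes"][:type].inspect      # prints nil
```

Marking the reverse edge rather than omitting it keeps the optional tables reachable during traversal while preventing them from filtering their parents.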
@@ -14,8 +14,7 @@ module GtfsDf
  Zip::File.open(zip_path) do |zip_file|
  zip_file.each do |entry|
  next unless entry.file?
- out_path = File.join(tmpdir, entry.name)
- entry.extract(out_path)
+ entry.extract(destination_directory: tmpdir)
  end
  end
 
@@ -1,5 +1,5 @@
  # frozen_string_literal: true
 
  module GtfsDf
- VERSION = "0.7.0"
+ VERSION = "0.9.0"
  end
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: gtfs_df
  version: !ruby/object:Gem::Version
- version: 0.7.0
+ version: 0.9.0
  platform: ruby
  authors:
  - David Mejorado
@@ -41,16 +41,22 @@ dependencies:
  name: rubyzip
  requirement: !ruby/object:Gem::Requirement
  requirements:
- - - "~>"
+ - - ">="
+ - !ruby/object:Gem::Version
+ version: '3.0'
+ - - "<"
  - !ruby/object:Gem::Version
- version: '2.3'
+ version: '4.0'
  type: :runtime
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
  requirements:
- - - "~>"
+ - - ">="
+ - !ruby/object:Gem::Version
+ version: '3.0'
+ - - "<"
  - !ruby/object:Gem::Version
- version: '2.3'
+ version: '4.0'
  description: 'A Ruby gem to load, filter, and manipulate GTFS (General Transit Feed
  Specification) feeds using DataFrames powered by Polars. Supports cascading filters
  that maintain referential integrity across related tables. NOTE: This gem is not
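The rubyzip constraint moves from `~> 2.3` to `>= 3.0, < 4.0`. RubyGems' own `Gem::Requirement` can show which releases each constraint accepts; the versions checked include the ones from the lockfile change above (2.4.1 and 3.2.2). The pessimistic form `~> 3.0` would express the same range.

```ruby
require "rubygems"

old_req = Gem::Requirement.new("~> 2.3")            # >= 2.3, < 3.0
new_req = Gem::Requirement.new(">= 3.0", "< 4.0")   # as in the new gemspec
pessimistic = Gem::Requirement.new("~> 3.0")        # equivalent shorthand

%w[2.4.1 3.0.0 3.2.2 4.0.0].each do |v|
  version = Gem::Version.new(v)
  puts "#{v}: old=#{old_req.satisfied_by?(version)} " \
       "new=#{new_req.satisfied_by?(version)} " \
       "~>3.0=#{pessimistic.satisfied_by?(version)}"
end
```

Under the old constraint the locked 2.4.1 resolves and 3.x is rejected; under the new one 3.0.0 and 3.2.2 resolve and 2.4.1 and 4.0.0 are rejected, which is why the changelog marks the bump as breaking.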
@@ -137,7 +143,7 @@ required_ruby_version: !ruby/object:Gem::Requirement
  requirements:
  - - ">="
  - !ruby/object:Gem::Version
- version: 3.1.0
+ version: 3.2.0
  required_rubygems_version: !ruby/object:Gem::Requirement
  requirements:
  - - ">="