gtfs_df 0.7.0 → 0.8.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 0b218f60e33e005576eb278a74ba89e6164fddccfe99e9f8e5665d105d9fa471
- data.tar.gz: cb4023192418d5ba89f08b605b2ec8ac79073b28d29f6161f252d97df3291dff
+ metadata.gz: 27b47041852a25bf7c06f1553a1138f0f1e91fdbfa4619535374c749a561be97
+ data.tar.gz: e474c040a3fed6cefc3a36163d90133a3f44be3a91d0701854e36b7ba078fee5
  SHA512:
- metadata.gz: cefa279582b0579a637e4ebc39fe4c3d90d56582aa1cd0d00ffda041b92f681831334d46aeae95dc1c291c1295758c500a0c7bfe32876f226605f5b150fc0a9c
- data.tar.gz: e4b5bbe129674304121c61cb732a8261421158f206370289e67d3e6d514bbb089bc0c35f88fde6e46012a18afd3700b86eb4c2a2b457b9880270ef3a4f74cf79
+ metadata.gz: a6a306fd0e248619dbe518b3a573ca86c740bca0bcddeb86e014160aeabdd01b42345509db8b65b469d32bebd3767b1f34bd50ad99e0a602f7066a4859c346bc
+ data.tar.gz: dd02fc068a03da457beaa377441dd1fbe8f70c779bde47dfee7911b513743827a5fb0ac708ef793dec3bb4bab45110b54a690898cc2f6c19d6f9b77d85963f42
data/CHANGELOG.md CHANGED
@@ -1,3 +1,26 @@
+ ## [0.8.0] - 2026-01-09
+
+ ### 🐛 Bug Fixes
+
+ - Ignore extra newlines when parsing csv
+ - Bump minimum ruby version to 3.2.0
+ - Fix fare_attributes filtering
+ - Fix exceptions edge case
+ - Replace dynamic graph traversal with bidirectional graph option
+
+ ### 📚 Documentation
+
+ - Document dev environment
+ - Clarify the actions made by the bump-version script
+ - Update example transitive dependencies
+
+ ### ⚙️ Miscellaneous Tasks
+
+ - Reduce the test run frequency
+ - Update dependabot schedule
+ - Consolidate test fixtures
+ - Add test for additional fares case
+ - Update readme
  ## [0.7.0] - 2025-12-30

  ### 🚀 Features
@@ -22,6 +45,7 @@
  ### ⚙️ Miscellaneous Tasks

  - Include the util helpers in the console and the test spec
+ - Bump version to 0.7.0
  ## [0.6.2] - 2025-12-15

  ### 🐛 Bug Fixes
data/README.md CHANGED
@@ -35,6 +35,9 @@ feed = GtfsDf::Reader.load_from_zip('path/to/gtfs.zip')
  # Or, load from a directory
  feed = GtfsDf::Reader.load_from_dir('path/to/gtfs_dir')

+ # Parse times as seconds since midnight instead of string
+ feed = GtfsDf::Reader.load_from_dir('path/to/gtfs_dir', parse_times: true)
+
  # Access dataframes for each GTFS file
  puts feed.agency.head
  puts feed.routes.head
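The new `parse_times: true` option is described as parsing times as seconds since midnight. The diff does not show how the gem does this, so the following is only a minimal sketch of the conversion itself, assuming GTFS-style `H:MM:SS` strings whose hour may exceed 24 for service that runs past midnight. `gtfs_time_to_seconds` is a hypothetical helper name, not part of gtfs_df.

```ruby
# Hypothetical helper (not gtfs_df's API): converts a GTFS "H:MM:SS" or
# "HH:MM:SS" string into seconds since midnight. Hours may exceed 24.
def gtfs_time_to_seconds(value)
  # Treat missing or blank cells as null rather than raising.
  return nil if value.nil? || value.strip.empty?

  match = /\A(\d{1,3}):([0-5]\d):([0-5]\d)\z/.match(value.strip)
  raise ArgumentError, "invalid GTFS time: #{value.inspect}" unless match

  hours, minutes, seconds = match.captures.map(&:to_i)
  hours * 3600 + minutes * 60 + seconds
end
```

Keeping the hour unbounded matters because a trip that ends at `25:10:05` should still sort after one that ends at `23:59:00` on the same service day.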
@@ -71,11 +74,25 @@ When you filter by a field, the library automatically:

  For example, filtering by `agency_id` will automatically filter routes, trips, stop_times, and stops to only include data for that agency.

+ By default gtfs_df treats trips as the atomic unit of GTFS. Therefore, if we
+ filter to one stop referenced by TripA, we will preserve _all stops_ referenced
+ by TripA.
+
+ To avoid this behavior, you can pass the `filter_only_children` param. In this case, only the children of the specified filter will be pruned and trip integrity will not be maintained. In the below example, stop 1 and related stop_times will be pruned.
+
+ ```ruby
+ filtered_feed = feed.filter({ 'stop' => { 'stop_id' => ['1'] } }, filter_only_children: true)
+ ```
+
+
  ### Writing filtered feeds

  ```ruby
  # Write to a new zip file
  GtfsDf::Writer.write_to_zip(filtered_feed, 'output/filtered_gtfs.zip')
+
+ # Write to a directory
+ GtfsDf::Writer.write_to_dir(filtered_feed, 'output/filtered_gtfs')
  ```

  ### Example: Split feed by agency
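The README text added in this hunk contrasts the default behavior (trips are the atomic unit, so all stops of a matching trip survive) with `filter_only_children`. The toy sketch below illustrates that difference on plain Ruby hashes. It is not the gem's implementation; the row layout and both method names are assumptions made for illustration.

```ruby
# Toy model (not gtfs_df's data model): each row links a trip to a stop.

# Default-style behavior: find the trips that touch the stop, then keep
# every stop those trips reference, so each trip stays intact.
def stops_keeping_trip_integrity(stop_times, stop_id)
  trip_ids = stop_times
    .select { |row| row[:stop_id] == stop_id }
    .map { |row| row[:trip_id] }
    .uniq
  stop_times
    .select { |row| trip_ids.include?(row[:trip_id]) }
    .map { |row| row[:stop_id] }
    .uniq
end

# Children-only behavior: keep just the rows that reference the stop
# directly; the rest of the trip is not preserved.
def stop_times_children_only(stop_times, stop_id)
  stop_times.select { |row| row[:stop_id] == stop_id }
end
```

With rows for TripA visiting stops 1 and 2, the first method returns both stops for a filter on stop 1, while the second returns only the single row for stop 1.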
@@ -84,30 +101,41 @@ See [examples/split-by-agency](examples/split-by-agency) for a complete example

  ## Development

- After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake spec` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
+ ### Environment
+
+ This project manages its development environment with nix, specifically [devenv].
+
+ After checking out the repo:

- To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and the created tag, and push the `.gem` file to [rubygems.org](https://rubygems.org).
+ - Install devenv: https://devenv.sh/getting-started/
+
+ - To enable the environment you can either:
+ - Use [direnv] to enable the environment as soon as you enter the project's path.
+ - Enable it only when you needed by running: `devenv shell`
+
+ - Run `bin/setup` to install the gem dependencies.
+
+ ### Tests
+
+ Run `rake spec` to run the tests.
+
+ ### REPL
+
+ You can also run `bin/console` for an interactive prompt that will allow you to experiment.

  ## Release process

  1. `bin/bump-version`

- - Bump the version in `lib/gtfs_df/version.rb`
- - Update the `CHANGELOG.md` using the git log since the last version
- - Create and push a new release branch with those changes
- - Create a PR for that release
+ - Bumps the version in `lib/gtfs_df/version.rb`
+ - Updates the `CHANGELOG.md` using the git log since the last version
+ - Creates and push a new release branch with those changes
+ - Creates a PR for that release

  2. `bin/create-tag`

  Creates and pushes the git tag for the release. That will trigger the GitHub action: `.github/workflows/publish.yml` to publish to RubyGems.

- ## TODO
-
- - [ ] Time parsing
-
- Just like partridge, we should parse Time as seconds since midnight. There's a draft in `lib/gtfs_df/utils.rb` but it's not used anywhere.
- I haven't figured out how to properly implement that with Polars.
-
  ## Contributing

  Bug reports and pull requests are welcome on GitHub at https://github.com/davidmh/ruby-gtfs_df.
@@ -120,3 +148,5 @@ The gem is available as open source under the terms of the [MIT License](https:/
  [Polars]: https://pola.rs/
  [ruby-polars]: https://github.com/ankane/ruby-polars
  [partridge]: https://github.com/remix/partridge
+ [devenv]: https://devenv.sh
+ [direnv]: https://direnv.net
@@ -1,20 +1,21 @@
  PATH
  remote: ../..
  specs:
- gtfs_df (0.7.0)
+ gtfs_df (0.8.0)
  networkx (~> 0.4)
  polars-df (~> 0.22)
- rubyzip (~> 2.3)
+ rubyzip (>= 2.3, < 4.0)

  GEM
  remote: https://gem.coop/
  specs:
- bigdecimal (3.3.1)
+ bigdecimal (4.0.1)
+ json (2.18.0)
  matrix (0.4.3)
  networkx (0.4.0)
  matrix (~> 0.4)
  rb_heap (~> 1.0)
- optparse (0.8.0)
+ optparse (0.8.1)
  polars-df (0.23.0-aarch64-linux)
  bigdecimal
  polars-df (0.23.0-aarch64-linux-musl)
@@ -28,11 +29,12 @@ GEM
  polars-df (0.23.0-x86_64-linux-musl)
  bigdecimal
  rb_heap (1.1.0)
- rubyzip (2.4.1)
+ rubyzip (3.2.2)
  unicode-display_width (3.2.0)
  unicode-emoji (~> 4.1)
- unicode-emoji (4.1.0)
- whirly (0.3.0)
+ unicode-emoji (4.2.0)
+ whirly (0.4.0)
+ json
  unicode-display_width (>= 1.1)

  PLATFORMS
@@ -15,6 +15,10 @@ module GtfsDf
  df = Polars.read_csv(input, infer_schema_length: 0, encoding: "utf8-lossy")
  .rename(->(col) { col.strip })

+ # Strip out empty lines. Unfortunately read_csv does not support the drop_empty_rows
+ # option right now.
+ df = df.filter(Polars.all_horizontal(Polars.all.is_null).is_not)
+
  dtypes = self.class::SCHEMA.slice(*df.columns)
  df
  .with_columns(dtypes.keys.map do |col|
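The filter added in this hunk keeps a row unless every column in it is null, which is how blank CSV lines are dropped after parsing. As a rough illustration of that predicate without Polars, the sketch below applies the same rule to an array of hashes. `drop_empty_rows` is a hypothetical name, and `nil` stands in for a null cell.

```ruby
# Hypothetical stand-in for the Polars expression in the hunk above:
# a row is dropped only when all of its cells are nil.
def drop_empty_rows(rows)
  rows.reject { |row| row.values.all?(&:nil?) }
end
```

A row with at least one non-null cell is kept, so partially filled records are unaffected.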
data/lib/gtfs_df/feed.rb CHANGED
@@ -172,17 +172,17 @@ module GtfsDf
  filtered
  end

- # Traverses the grah to prune unreferenced entities from child dataframes
+ # Traverses the graph to prune unreferenced entities from child dataframes
  # based on parent relationships. See GtfsDf::Graph::STOP_NODES
  def prune!(root, filtered, filter_only_children: false)
  seen_edges = Set.new
- maybe_digraph = filter_only_children ? graph : graph.to_undirected
+ rerooted_graph = Graph.build(bidirectional: !filter_only_children)

  queue = [root]

  while queue.length > 0
  parent_node_id = queue.shift
- maybe_digraph.adj[parent_node_id].each do |child_node_id, attrs|
+ rerooted_graph.adj[parent_node_id].each do |child_node_id, attrs|
  edge = edge_id(parent_node_id, child_node_id)

  next if seen_edges.include?(edge)
@@ -209,6 +209,13 @@ module GtfsDf

  queue << child_node_id

+ # If the edge is weak (e.g. reverse edge of an optional relationship),
+ # we traverse to ensure connectivity but do NOT apply the filter.
+ if attrs[:type] == :weak
+ # puts "Skipping weak filter: #{edge}"
+ next
+ end
+
  attrs[:dependencies].each do |dep|
  parent_col = dep[parent_node_id]
  child_col = dep[child_node_id]
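The weak-edge check in this hunk means a weak edge is still walked (its target is enqueued) but no filter is applied across it. The sketch below reproduces that control flow on a plain nested Hash instead of the NetworkX graph. It is a simplified approximation: the lines of `prune!` between the hunks are not shown in this diff, and `filtered_edges` is a hypothetical name.

```ruby
require "set"

# Breadth-first walk over adj[from][to] = attrs. Returns the directed edges
# on which a filter would be applied; weak edges are traversed but skipped.
def filtered_edges(adj, root)
  seen_edges = Set.new
  queue = [root]
  applied = []

  until queue.empty?
    parent = queue.shift
    (adj[parent] || {}).each do |child, attrs|
      # Directed edge id, matching the unsorted join in the new edge_id.
      edge = [parent, child].join("-")
      next if seen_edges.include?(edge)

      seen_edges << edge
      queue << child
      # Weak edge: keep walking for connectivity, but do not filter.
      next if attrs[:type] == :weak

      applied << edge
    end
  end

  applied
end
```

Because edge ids are directed here, `routes-trips` and `trips-routes` are tracked separately, which is what lets the reverse edge carry different attributes from the forward one.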
@@ -220,6 +227,13 @@ module GtfsDf
  # Get valid values from parent
  valid_values = parent_df[parent_col].to_a.uniq.compact

+ # Annoying special case to make sure that if we have a calendar with exceptions,
+ # the calendar_dates file doesn't end up pruning other files
+ if parent_node_id == "calendar_dates" && parent_col == "service_id" &&
+ filtered["calendar"]
+ valid_values = (valid_values + calendar["service_id"].to_a).uniq
+ end
+
  # Filter child to only include rows that reference valid parent values
  before = child_df.height
  filter = Polars.col(child_col).is_in(valid_values)
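The special case in this hunk widens the set of valid `service_id` values so that `calendar_dates`, which may list only exceptions, does not prune rows whose service is defined in `calendar`. A toy illustration on plain arrays, with hypothetical names and made-up sample values, might look like this:

```ruby
# Hypothetical helper: keeps rows whose column value is in valid_values,
# mirroring the is_in filter applied to the child dataframe.
def keep_referenced(rows, column, valid_values)
  rows.select { |row| valid_values.include?(row[column]) }
end

# Union of the service ids from both files, as in the special case above.
def merged_service_ids(calendar_dates_ids, calendar_ids)
  (calendar_dates_ids + calendar_ids).uniq
end
```

Filtering trips with only the `calendar_dates` ids would drop every trip whose service appears solely in `calendar`; filtering with the merged list keeps them.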
@@ -243,8 +257,7 @@ module GtfsDf
  end

  def edge_id(parent, child)
- # Alphabetize to make sure this works with undirected graph
- [parent, child].sort.join("-")
+ [parent, child].join("-")
  end
  end
  end
data/lib/gtfs_df/graph.rb CHANGED
@@ -41,7 +41,7 @@ module GtfsDf
  NODES = STANDARD_FILE_NODES.merge(STOP_NODES).freeze

  # Returns a directed graph of GTFS file dependencies
- def self.build
+ def self.build(bidirectional: false)
  g = NetworkX::DiGraph.new
  NODES.keys.each { |node| g.add_node(node) }

@@ -53,15 +53,15 @@ module GtfsDf
  ]}],
  ["agency", "fare_attributes", {dependencies: [
  {"fare_attributes" => "agency_id",
- "agency" => "agency_id"}
- ]}],
+ "agency" => "agency_id", :allow_null => true}
+ ], optional: true}],
  ["fare_attributes", "fare_rules", {dependencies: [
  {"fare_attributes" => "fare_id",
  "fare_rules" => "fare_id"}
  ]}],
  ["routes", "fare_rules", {dependencies: [
  {"fare_rules" => "route_id", "routes" => "route_id", :allow_null => true}
- ]}],
+ ], optional: true}],
  ["routes", "trips", {dependencies: [
  {"routes" => "route_id", "trips" => "route_id"}
  ]}],
@@ -73,12 +73,12 @@ module GtfsDf
  ]}],
  # Self-referential edge: stops can reference parent stations (location_type=1)
  ["parent_stations", "stops", {dependencies: [
- {"stops" => "parent_station", "parent_stations" => "stop_id"}
+ {"stops" => "parent_station", "parent_stations" => "stop_id", :allow_null => true}
  ]}],
  ["stops", "transfers", {dependencies: [
  {"stops" => "stop_id", "transfers" => "from_stop_id"},
  {"stops" => "stop_id", "transfers" => "to_stop_id"}
- ]}],
+ ], optional: true}],
  ["calendar", "trips", {dependencies: [
  {"trips" => "service_id", "calendar" => "service_id"}
  ]}],
@@ -86,11 +86,11 @@ module GtfsDf
  {"trips" => "service_id", "calendar_dates" => "service_id"}
  ]}],
  ["shapes", "trips", {dependencies: [
- {"trips" => "shape_id", "shapes" => "shape_id"}
+ {"trips" => "shape_id", "shapes" => "shape_id", :allow_null => true}
  ]}],
  ["trips", "frequencies", {dependencies: [
  {"trips" => "trip_id", "frequencies" => "trip_id"}
- ]}],
+ ], optional: true}],

  # --- GTFS Extensions ---
  ["stops", "fare_leg_join_rules",
@@ -163,6 +163,16 @@ module GtfsDf

  edges.each do |from, to, attrs|
  g.add_edge(from, to, **attrs)
+ if bidirectional
+ # When adding the reverse edge, if the relationship is optional (child is not required),
+ # mark the reverse edge as weak. This prevents empty child tables (e.g. fare_rules)
+ # from filtering parent tables (e.g. routes) into emptiness.
+ reverse_attrs = attrs.dup
+ if attrs[:optional]
+ reverse_attrs[:type] = :weak
+ end
+ g.add_edge(to, from, **reverse_attrs)
+ end
  end
  g
  end
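The `bidirectional` option in this hunk adds a reverse edge for every forward edge and tags the reverse of an `optional` relationship as `:weak`. The sketch below mirrors that construction with a nested Hash in place of `NetworkX::DiGraph`, so it runs without the gem's dependencies. `build_adjacency` and the two-edge sample in the usage note are assumptions for illustration only.

```ruby
# Hypothetical stand-in for Graph.build: adjacency is stored as
# adj[from][to] = attrs instead of a NetworkX::DiGraph.
def build_adjacency(edges, bidirectional: false)
  adj = Hash.new { |hash, key| hash[key] = {} }

  edges.each do |from, to, attrs|
    adj[from][to] = attrs
    next unless bidirectional

    # Copy the attributes so the forward edge is not mutated, then mark the
    # reverse of an optional relationship as weak.
    reverse_attrs = attrs.dup
    reverse_attrs[:type] = :weak if attrs[:optional]
    adj[to][from] = reverse_attrs
  end

  adj
end
```

For example, given the edges `["routes", "trips", {dependencies: []}]` and `["routes", "fare_rules", {dependencies: [], optional: true}]`, the bidirectional build yields a weak `fare_rules` to `routes` edge and an ordinary `trips` to `routes` edge, while the default build contains no reverse edges at all.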
@@ -14,8 +14,7 @@ module GtfsDf
  Zip::File.open(zip_path) do |zip_file|
  zip_file.each do |entry|
  next unless entry.file?
- out_path = File.join(tmpdir, entry.name)
- entry.extract(out_path)
+ entry.extract(destination_directory: tmpdir)
  end
  end

@@ -1,5 +1,5 @@
  # frozen_string_literal: true

  module GtfsDf
- VERSION = "0.7.0"
+ VERSION = "0.8.0"
  end
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: gtfs_df
  version: !ruby/object:Gem::Version
- version: 0.7.0
+ version: 0.8.0
  platform: ruby
  authors:
  - David Mejorado
@@ -41,16 +41,22 @@ dependencies:
  name: rubyzip
  requirement: !ruby/object:Gem::Requirement
  requirements:
- - - "~>"
+ - - ">="
  - !ruby/object:Gem::Version
  version: '2.3'
+ - - "<"
+ - !ruby/object:Gem::Version
+ version: '4.0'
  type: :runtime
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
  requirements:
- - - "~>"
+ - - ">="
  - !ruby/object:Gem::Version
  version: '2.3'
+ - - "<"
+ - !ruby/object:Gem::Version
+ version: '4.0'
  description: 'A Ruby gem to load, filter, and manipulate GTFS (General Transit Feed
  Specification) feeds using DataFrames powered by Polars. Supports cascading filters
  that maintain referential integrity across related tables. NOTE: This gem is not
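The rubyzip requirement changes here from `~> 2.3` to `>= 2.3, < 4.0`, which is what allows the lockfile in this diff to resolve rubyzip 3.2.2. A quick way to check how the two constraints treat a given version is RubyGems' own `Gem::Requirement`. The wrapper method name `satisfies?` below is hypothetical.

```ruby
require "rubygems"

# Hypothetical convenience wrapper around Gem::Requirement#satisfied_by?.
def satisfies?(constraints, version)
  Gem::Requirement.new(*constraints).satisfied_by?(Gem::Version.new(version))
end

old_constraint = ["~> 2.3"]          # pessimistic: >= 2.3 and < 3.0
new_constraint = [">= 2.3", "< 4.0"] # widened range from this release

puts satisfies?(old_constraint, "3.2.2") # false
puts satisfies?(new_constraint, "3.2.2") # true
```

The upper bound still excludes a future 4.0 release, so the widened range covers the 2.x and 3.x lines only.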
@@ -137,7 +143,7 @@ required_ruby_version: !ruby/object:Gem::Requirement
  requirements:
  - - ">="
  - !ruby/object:Gem::Version
- version: 3.1.0
+ version: 3.2.0
  required_rubygems_version: !ruby/object:Gem::Requirement
  requirements:
  - - ">="