gtfs_df 0.5.0 → 0.6.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 140458a6ce1013bef475e0a6cdcab6364cff04b8a18eedb5e5d0244e3bccf38a
- data.tar.gz: c420a34f7004eca9267f32f53038632f822224924eac5b77aa98957bb3149e20
+ metadata.gz: 9db3a2b1829e87ae837133265aa0192585dc7fb7476646e634887652c7771e02
+ data.tar.gz: c8b5a84c802897dfcd291acd45deaee4798fa34c8de71ac5436a00fe9025fe19
  SHA512:
- metadata.gz: 032d24ed1df3ed43e5e6953abebbeda70ba5450cab731e931e8548bd058e37d9472edd688a78b0c941e8ff995111b9f616c8aae23e7daba9f3a610813aade528
- data.tar.gz: b808c05aeedea83faf728feded28a38a78ac7a1d2ff139ea36cb0f474f310228792f926a97160df637f9d85dca83272921ed9dff72a632c4aab575163643610d
+ metadata.gz: 0d6f0d3af6c79ab6bb3b7e1b19aa14bf531665b19284f42b5e0b3658e2686314a8f4875d04cac1ca66c02597b8ff50455bfdf55bcd8093fd1171dd02651926c8
+ data.tar.gz: 9d105caf14ef7ad8597910efdf8786d8238ef3b5e440651dc01d00f321f1face713c959bc367a21b394f97b32dc4a86958a76ce68561824bd665321475d0b4cf
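The digests in checksums.yaml can be compared against an unpacked gem's archives. The following is a minimal sketch of that comparison, assuming the bytes have already been read from disk; `sha256_matches?` is a hypothetical helper name, and the empty-string digest is used only so the example runs without any gem file present.

```ruby
require "digest"

# Hypothetical helper: true when the SHA256 of `bytes` equals the expected hex digest.
def sha256_matches?(bytes, expected_hex)
  Digest::SHA256.hexdigest(bytes) == expected_hex.downcase
end

# SHA256 of the empty string, a well-known constant used here only to exercise the helper.
EMPTY_SHA256 = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"

puts sha256_matches?("", EMPTY_SHA256)
```

For a real check, the bytes would come from something like `File.binread("data.tar.gz")` and the expected value from the `SHA256` entry above.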
data/CHANGELOG.md CHANGED
@@ -1,55 +1,141 @@
+ ## [0.6.1] - 2025-12-12
+
+ ### 🐛 Bug Fixes
+
+ - Parse whitespace in column headers
+
+ ### 📚 Documentation
+
+ - Badges
+
+ ### ⚙️ Miscellaneous Tasks
+
+ - Update devenv
+ - Drop custom changelog parsing
+ ## [0.6.0] - 2025-12-10
+
+ ### 🐛 Bug Fixes
+
+ - Visit nodes multiple times
+
+ ### ⚙️ Miscellaneous Tasks
+
+ - Bump version to 0.6.0
  ## [0.5.0] - 2025-12-08
 
- ### Added
+ ### 🚀 Features
 
- - add Feed#filter filter_only_children param
+ - [**breaking**] Add Feed#filter filter_only_children param
 
- ### Maintenance
+ ### ⚙️ Miscellaneous Tasks
 
- - arrange edges so parent is always first
- - build directed graph
- - allow ! in commit messages
+ - Arrange edges so parent is always first
+ - Build directed graph
+ - Allow ! in commit messages
+ - Bump version to 0.5.0
  ## [0.4.1] - 2025-12-05
 
- ### Added
+ ### 🚀 Features
 
- - handle extra whitespace in csvs
+ - Handle extra whitespace in csvs
 
- ### Maintenance
+ ### ⚙️ Miscellaneous Tasks
 
- - remove unused initializer format
+ - Remove unreleased section
+ - Remove unused initializer format
+ - Bump version to 0.4.1
  ## [0.4.0] - 2025-12-04
 
- ### Added
+ ### 🚀 Features
 
- - allow setting maintain_trip_dependencies=false
+ - Allow setting maintain_trip_dependencies=false
 
- ### Fixed
+ ### 🐛 Bug Fixes
 
- - parse stop_lat as float
- - add missing agency -> fare_attributes edge
- - allow null for fare_rules
+ - Parse stop_lat as float
+ - Add missing agency -> fare_attributes edge
+ - Allow null for fare_rules
 
- ### Maintenance
-
- - provide accessor for gtfs_files (utility)
- - add yard docs
+ ### ⚙️ Miscellaneous Tasks
 
+ - Provide accessor for gtfs_files (utility)
+ - Add yard docs
+ - Bump version to 0.4.0
  ## [0.3.0] - 2025-12-04
 
- ### Added
+ ### 🚀 Features
+
+ - Keep parent stations linked to used stops
+
+ ### 🐛 Bug Fixes
+
+ - Handle null values
+ - Update lock on version bump
+
+ ### ⚙️ Miscellaneous Tasks
+
+ - Reuse load_from_dir logic in reader
+ - Clean up unused method + better comments
+ - Autopublish on release tag push
+ - Automate release script
+ - Release tag script
+ - Bump version to 0.3.0
+ ## [0.2.0] - 2025-12-01
+
+ ### 🚀 Features
+
+ - Add Reader.load_from_dir
+
+ ### 🐛 Bug Fixes
+
+ - Require correct entrypoint
+ - Cascade empty view filters
+ - Handle parsing when cols size = schema size
+ - Parse extraneous columns as strings
+ - Cascade changes reliably
+ - Filter with trips as atomic unit
+ - Remove nonexistent booking_rule association
+ - Add empty string to null vals
+
+ ### 📚 Documentation
+
+ - Include processing time
+ - Update gem name
+
+ ### ⚙️ Miscellaneous Tasks
+
+ - Add byebug gem
+ - Include byebug in spec_helper.rb
+ - Rearrange filter specs
+ - Add pending specs for expected behaviors
+ - [**breaking**] Removes duplicate load_from_dir method (use reader instead)
+ - Mutate for both filter! and prune!
+ - Tag version 0.2.0
+ ## [0.1.1] - 2025-11-12
+
+ ### 🐛 Bug Fixes
+
+ - Release workflow
 
- - keep parent stations linked to used stops
+ ### ⚙️ Miscellaneous Tasks
 
- ### Fixed
+ - Rename namespace to follow ruby conventions
+ - Bump version
+ - Remove broken release flow
+ - Clarify gem status
+ - Republish version
+ ## [0.1.0] - 2025-11-12
 
- - handle null values
- - update lock on version bump
+ ### 📚 Documentation
 
- ### Maintenance
+ - Readme and gemspec details
+ - Time parsing to-do
 
- - reuse load_from_dir logic in reader
- - clean up unused method + better comments
- ## [0.1.0] - 2025-11-10
+ ### ⚙️ Miscellaneous Tasks
 
+ - Initial commit
+ - Make the lock platform agnostic
+ - Validate commit messages
+ - Run spec and standard steps separately
+ - Release flow
  - Initial release
data/README.md CHANGED
@@ -1,5 +1,7 @@
  # ruby-gtfs-df
 
+ [![Tests](https://github.com/davidmh/ruby-gtfs-df/actions/workflows/tests.yml/badge.svg?branch=main)](https://github.com/davidmh/ruby-gtfs-df/actions/workflows/tests.yml) [![Gem Version](https://badge.fury.io/rb/gtfs_df.svg)](https://badge.fury.io/rb/gtfs_df)
+
  A ruby gem to manipulate [GTFS] feeds using DataFrames using [Polars] ([ruby-polars])
 
  This project was created to bring the power of [partridge] to ruby.
data/cliff.toml ADDED
@@ -0,0 +1,92 @@
+ # git-cliff ~ configuration file
+ # https://git-cliff.org/docs/configuration
+
+
+ [changelog]
+ # A Tera template to be rendered for each release in the changelog.
+ # See https://keats.github.io/tera/docs/#introduction
+ body = """
+ {% if version %}\
+ ## [{{ version | trim_start_matches(pat="v") }}] - {{ timestamp | date(format="%Y-%m-%d") }}
+ {% else %}\
+ ## [unreleased]
+ {% endif %}\
+ {% for group, commits in commits | group_by(attribute="group") %}
+ ### {{ group | striptags | trim | upper_first }}
+ {% for commit in commits %}
+ - {% if commit.scope %}*({{ commit.scope }})* {% endif %}\
+ {% if commit.breaking %}[**breaking**] {% endif %}\
+ {{ commit.message | upper_first }}\
+ {% endfor %}
+ {% endfor %}
+ """
+ # Remove leading and trailing whitespaces from the changelog's body.
+ trim = true
+ # Render body even when there are no releases to process.
+ render_always = true
+ # An array of regex based postprocessors to modify the changelog.
+ postprocessors = [
+ # Replace the placeholder <REPO> with a URL.
+ #{ pattern = '<REPO>', replace = "https://github.com/orhun/git-cliff" },
+ ]
+ # render body even when there are no releases to process
+ # render_always = true
+ # output file path
+ # output = "test.md"
+
+ [git]
+ # Parse commits according to the conventional commits specification.
+ # See https://www.conventionalcommits.org
+ conventional_commits = true
+ # Exclude commits that do not match the conventional commits specification.
+ filter_unconventional = true
+ # Require all commits to be conventional.
+ # Takes precedence over filter_unconventional.
+ require_conventional = false
+ # Split commits on newlines, treating each line as an individual commit.
+ split_commits = false
+ # An array of regex based parsers to modify commit messages prior to further processing.
+ commit_preprocessors = [
+ # Replace issue numbers with link templates to be updated in `changelog.postprocessors`.
+ #{ pattern = '\((\w+\s)?#([0-9]+)\)', replace = "([#${2}](<REPO>/issues/${2}))"},
+ # Check spelling of the commit message using https://github.com/crate-ci/typos.
+ # If the spelling is incorrect, it will be fixed automatically.
+ #{ pattern = '.*', replace_command = 'typos --write-changes -' },
+ ]
+ # Prevent commits that are breaking from being excluded by commit parsers.
+ protect_breaking_commits = false
+ # An array of regex based parsers for extracting data from the commit message.
+ # Assigns commits to groups.
+ # Optionally sets the commit's scope and can decide to exclude commits from further processing.
+ commit_parsers = [
+ { message = "^feat", group = "<!-- 0 -->🚀 Features" },
+ { message = "^fix", group = "<!-- 1 -->🐛 Bug Fixes" },
+ { message = "^doc", group = "<!-- 3 -->📚 Documentation" },
+ { message = "^perf", group = "<!-- 4 -->⚡ Performance" },
+ { message = "^refactor", group = "<!-- 2 -->🚜 Refactor" },
+ { message = "^style", group = "<!-- 5 -->🎨 Styling" },
+ { message = "^test", group = "<!-- 6 -->🧪 Testing" },
+ { message = "^chore\\(release\\): prepare for", skip = true },
+ { message = "^chore\\(deps.*\\)", skip = true },
+ { message = "^chore\\(pr\\)", skip = true },
+ { message = "^chore\\(pull\\)", skip = true },
+ { message = "^chore|^ci", group = "<!-- 7 -->⚙️ Miscellaneous Tasks" },
+ { body = ".*security", group = "<!-- 8 -->🛡️ Security" },
+ { message = "^revert", group = "<!-- 9 -->◀️ Revert" },
+ { message = ".*", group = "<!-- 10 -->💼 Other" },
+ ]
+ # Exclude commits that are not matched by any commit parser.
+ filter_commits = false
+ # An array of link parsers for extracting external references, and turning them into URLs, using regex.
+ link_parsers = []
+ # Include only the tags that belong to the current branch.
+ use_branch_tags = false
+ # Order releases topologically instead of chronologically.
+ topo_order = false
+ # Order releases topologically instead of chronologically.
+ topo_order_commits = true
+ # Order of commits in each group/release within the changelog.
+ # Allowed values: newest, oldest
+ sort_commits = "oldest"
+ # Process submodules commits
+ recurse_submodules = false
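The `commit_parsers` list above is order-sensitive: as far as the git-cliff documentation describes it, the first parser whose pattern matches decides the group, and a `skip = true` entry drops the commit. The sketch below imitates that rule in Ruby for a subset of the patterns. It is an illustration, not git-cliff's implementation; `PARSERS` and `group_for` are hypothetical names, and the group labels are simplified (no emoji, no ordering comments).

```ruby
# A subset of the commit_parsers above as [pattern, group] pairs.
# A nil group stands in for `skip = true`.
PARSERS = [
  [/^feat/, "Features"],
  [/^fix/, "Bug Fixes"],
  [/^doc/, "Documentation"],
  [/^chore\(deps.*\)/, nil],
  [/^chore|^ci/, "Miscellaneous Tasks"],
  [/.*/, "Other"]
].freeze

# Returns the group of the first matching parser, or nil when the commit is skipped.
def group_for(message)
  _pattern, group = PARSERS.find { |pattern, _| pattern.match?(message) }
  group
end

puts group_for("fix: parse whitespace in column headers")
puts group_for("chore: update devenv")
puts group_for("chore(deps): bump a dependency").inspect
```

Because the `chore(deps...)` entry sits before the general `^chore|^ci` entry, dependency bumps are skipped instead of landing under Miscellaneous Tasks; swapping the two entries would change that.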
data/devenv.lock CHANGED
@@ -3,10 +3,10 @@
  "devenv": {
  "locked": {
  "dir": "src/modules",
- "lastModified": 1761427990,
+ "lastModified": 1765397744,
  "owner": "cachix",
  "repo": "devenv",
- "rev": "7419c04fc798d5d5918413d4cb6c8629f9d4e8a3",
+ "rev": "fd121886248781fee8546e4b6edc6d27e1d4efd5",
  "type": "github"
  },
  "original": {
@@ -19,10 +19,10 @@
  "flake-compat": {
  "flake": false,
  "locked": {
- "lastModified": 1747046372,
+ "lastModified": 1765121682,
  "owner": "edolstra",
  "repo": "flake-compat",
- "rev": "9100a0f413b0c601e0533d1d94ffd501ce2e7885",
+ "rev": "65f23138d8d09a92e30f1e5c87611b23ef451bf3",
  "type": "github"
  },
  "original": {
@@ -34,10 +34,10 @@
  "flake-compat_2": {
  "flake": false,
  "locked": {
- "lastModified": 1747046372,
+ "lastModified": 1765121682,
  "owner": "edolstra",
  "repo": "flake-compat",
- "rev": "9100a0f413b0c601e0533d1d94ffd501ce2e7885",
+ "rev": "65f23138d8d09a92e30f1e5c87611b23ef451bf3",
  "type": "github"
  },
  "original": {
@@ -72,10 +72,10 @@
  ]
  },
  "locked": {
- "lastModified": 1760663237,
+ "lastModified": 1765404074,
  "owner": "cachix",
  "repo": "git-hooks.nix",
- "rev": "ca5b894d3e3e151ffc1db040b6ce4dcc75d31c37",
+ "rev": "2d6f58930fbcd82f6f9fd59fb6d13e37684ca529",
  "type": "github"
  },
  "original": {
@@ -92,10 +92,10 @@
  ]
  },
  "locked": {
- "lastModified": 1709087332,
+ "lastModified": 1762808025,
  "owner": "hercules-ci",
  "repo": "gitignore.nix",
- "rev": "637db329424fd7e46cf4185293b9cc8c88c95394",
+ "rev": "cb5e3fdca1de58ccbc3ef53de65bd372b48f567c",
  "type": "github"
  },
  "original": {
@@ -106,10 +106,10 @@
  },
  "nixpkgs": {
  "locked": {
- "lastModified": 1758532697,
+ "lastModified": 1764580874,
  "owner": "cachix",
  "repo": "devenv-nixpkgs",
- "rev": "207a4cb0e1253c7658c6736becc6eb9cace1f25f",
+ "rev": "dcf61356c3ab25f1362b4a4428a6d871e84f1d1d",
  "type": "github"
  },
  "original": {
@@ -128,10 +128,10 @@
  ]
  },
  "locked": {
- "lastModified": 1759902829,
+ "lastModified": 1765345660,
  "owner": "bobvanderlinden",
  "repo": "nixpkgs-ruby",
- "rev": "5fba6c022a63f1e76dee4da71edddad8959f088a",
+ "rev": "35e7b0919db2859f4022d1aa00b9c01253be1bce",
  "type": "github"
  },
  "original": {
data/devenv.nix CHANGED
@@ -7,7 +7,9 @@
  bundler.enable = false;
  };
 
- pre-commit.hooks = {
+ git-hooks.hooks = {
  conform.enable = true;
  };
+
+ packages = with pkgs; [ git-cliff ];
  }
@@ -13,8 +13,9 @@ module GtfsDf
  # TODO: use `infer_schema: false` instead of `infer_schema_length` after polars release:
  # https://github.com/ankane/ruby-polars/blob/master/CHANGELOG.md#100-unreleased
  df = Polars.read_csv(input, infer_schema_length: 0)
- dtypes = self.class::SCHEMA.slice(*df.columns)
+ .rename(->(col) { col.strip })
 
+ dtypes = self.class::SCHEMA.slice(*df.columns)
  df
  .with_columns(dtypes.keys.map do |col|
  stripped = Polars.col(col).str.strip
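The hunk above strips whitespace from column names before `SCHEMA.slice(*df.columns)` runs, so padded headers still match their schema entries. A minimal plain-Ruby sketch of why the order matters, with no Polars dependency: `SCHEMA` here is a hypothetical three-column stand-in for the gem's schema constants, and the header row is made up for illustration.

```ruby
# Hypothetical schema; the gem's real SCHEMA constants map column names to Polars dtypes.
SCHEMA = { "stop_id" => :string, "stop_lat" => :float, "stop_lon" => :float }.freeze

# A header row with stray whitespace around one column name.
raw_header = "stop_id, stop_lat ,stop_lon"
columns = raw_header.split(",")

# Looking up the raw names silently drops the padded column from the dtype map.
unstripped = SCHEMA.slice(*columns)

# Stripping first (the role of the added `.rename` step) matches every column.
stripped = SCHEMA.slice(*columns.map(&:strip))

puts unstripped.keys.inspect
puts stripped.keys.inspect
```

In the unstripped case `stop_lat` is missing from the lookup, which in the gem would mean the column never receives its typed cast.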
data/lib/gtfs_df/feed.rb CHANGED
@@ -157,59 +157,76 @@ module GtfsDf
  # Traverses the graph to prune unreferenced entities from child dataframes
  # based on parent relationships. See GtfsDf::Graph::STOP_NODES
  def prune!(root, filtered, filter_only_children: false)
+ seen_edges = Set.new
  maybe_digraph = filter_only_children ? graph : graph.to_undirected
- maybe_digraph.each_bfs_edge(root) do |parent_node_id, child_node_id|
- parent_node = Graph::NODES[parent_node_id]
- child_node = Graph::NODES[child_node_id]
- parent_df = filtered[parent_node.fetch(:file)]
- next unless parent_df
-
- child_df = filtered[child_node.fetch(:file)]
- # Certain nodes are pre-filtered because they reference only
- # a piece of the dataframe
- filter_attrs = child_node[:filter_attrs]
- if filter_attrs && child_df.columns.include?(filter_attrs.fetch(:filter_col))
- filter = filter_attrs.fetch(:filter)
- # Temporarily remove rows that do not match node filter criteria to process them
- # separately (e.g., when filtering stops, parent stations that should be preserved
- # regardless of direct references)
- saved_vals = child_df.filter(filter.is_not)
- child_df = child_df.filter(filter)
- end
- next unless child_df && child_df.height > 0
-
- attrs = maybe_digraph.get_edge_data(parent_node_id, child_node_id)
-
- attrs[:dependencies].each do |dep|
- parent_col = dep[parent_node_id]
- child_col = dep[child_node_id]
- allow_null = !!dep[:allow_null]
-
- next unless parent_col && child_col &&
- parent_df.columns.include?(parent_col) && child_df.columns.include?(child_col)
-
- # Get valid values from parent
- valid_values = parent_df[parent_col].to_a.uniq.compact
 
- # Filter child to only include rows that reference valid parent values
- before = child_df.height
- filter = Polars.col(child_col).is_in(valid_values)
- if allow_null
- filter = (filter | Polars.col(child_col).is_null)
+ queue = [root]
+
+ while queue.length > 0
+ parent_node_id = queue.shift
+ maybe_digraph.adj[parent_node_id].each do |child_node_id, attrs|
+ edge = edge_id(parent_node_id, child_node_id)
+
+ next if seen_edges.include?(edge)
+ seen_edges.add(edge)
+
+ parent_node = Graph::NODES[parent_node_id]
+ child_node = Graph::NODES[child_node_id]
+ parent_df = filtered[parent_node.fetch(:file)]
+ next unless parent_df
+
+ child_df = filtered[child_node.fetch(:file)]
+ # Certain nodes are pre-filtered because they reference only
+ # a piece of the dataframe
+ filter_attrs = child_node[:filter_attrs]
+ if filter_attrs && child_df.columns.include?(filter_attrs.fetch(:filter_col))
+ filter = filter_attrs.fetch(:filter)
+ # Temporarily remove rows that do not match node filter criteria to process them
+ # separately (e.g., when filtering stops, parent stations that should be preserved
+ # regardless of direct references)
+ saved_vals = child_df.filter(filter.is_not)
+ child_df = child_df.filter(filter)
  end
- child_df = child_df.filter(filter)
- changed = child_df.height < before
-
- # If we removed a part of the child_df earlier, concat it back on
- if saved_vals
- child_df = Polars.concat([child_df, saved_vals], how: "vertical")
- end
-
- if changed
- filtered[child_node.fetch(:file)] = child_df
+ next unless child_df && child_df.height > 0
+
+ queue << child_node_id
+
+ attrs[:dependencies].each do |dep|
+ parent_col = dep[parent_node_id]
+ child_col = dep[child_node_id]
+ allow_null = !!dep[:allow_null]
+
+ next unless parent_col && child_col &&
+ parent_df.columns.include?(parent_col) && child_df.columns.include?(child_col)
+
+ # Get valid values from parent
+ valid_values = parent_df[parent_col].to_a.uniq.compact
+
+ # Filter child to only include rows that reference valid parent values
+ before = child_df.height
+ filter = Polars.col(child_col).is_in(valid_values)
+ if allow_null
+ filter = (filter | Polars.col(child_col).is_null)
+ end
+ child_df = child_df.filter(filter)
+ changed = child_df.height < before
+
+ # If we removed a part of the child_df earlier, concat it back on
+ if saved_vals
+ child_df = Polars.concat([child_df, saved_vals], how: "vertical")
+ end
+
+ if changed
+ filtered[child_node.fetch(:file)] = child_df
+ end
  end
  end
  end
  end
+
+ def edge_id(parent, child)
+ # Alphabetize to make sure this works with undirected graph
+ [parent, child].sort.join("-")
+ end
  end
  end
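The rewritten `prune!` replaces a node-based BFS with a queue that deduplicates edges instead of nodes, so a node reachable from two parents is processed once per incoming edge. The sketch below isolates that traversal in plain Ruby. It is a simplification under stated assumptions: `ADJ` is a hypothetical adjacency map (the gem reads neighbours from `maybe_digraph.adj`), `each_edge_once` is a made-up name, and the dataframe checks that gate `queue << child_node_id` in the real method are omitted.

```ruby
require "set"

# Same idea as the diff's edge_id: sorting makes "a-b" and "b-a" the same edge,
# which matters when the graph is undirected.
def edge_id(parent, child)
  [parent, child].sort.join("-")
end

# Walks the graph breadth-first, yielding every edge exactly once.
# A node may be enqueued several times, once per edge that reaches it.
def each_edge_once(adj, root)
  seen_edges = Set.new
  queue = [root]
  visited = []
  until queue.empty?
    parent = queue.shift
    adj.fetch(parent, []).each do |child|
      edge = edge_id(parent, child)
      next if seen_edges.include?(edge)
      seen_edges.add(edge)
      queue << child
      visited << [parent, child]
    end
  end
  visited
end

# Hypothetical undirected diamond: fare_rules is reachable via two parents.
ADJ = {
  "agency" => ["routes", "fare_attributes"],
  "routes" => ["agency", "fare_rules"],
  "fare_attributes" => ["agency", "fare_rules"],
  "fare_rules" => ["routes", "fare_attributes"]
}.freeze

edges = each_edge_once(ADJ, "agency")
edges.each { |parent, child| puts "#{parent} -> #{child}" }
```

A traversal that marks nodes as visited would reach `fare_rules` only through `routes` and never apply the `fare_attributes` constraint; tracking edges keeps both, and the walk still terminates because each edge is consumed once.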
@@ -1,5 +1,5 @@
  # frozen_string_literal: true
 
  module GtfsDf
- VERSION = "0.5.0"
+ VERSION = "0.6.1"
  end
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: gtfs_df
  version: !ruby/object:Gem::Version
- version: 0.5.0
+ version: 0.6.1
  platform: ruby
  authors:
  - David Mejorado
@@ -71,6 +71,7 @@ files:
  - LICENSE.txt
  - README.md
  - Rakefile
+ - cliff.toml
  - devenv.lock
  - devenv.nix
  - devenv.yaml
  - devenv.yaml