RubyGems - gtfs_df - Versions diffs - 0.5.0 → 0.6.1 - Mend

gtfs_df 0.5.0 → 0.6.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (10)

checksums.yaml +4 -4
data/CHANGELOG.md +115 -29
data/README.md +2 -0
data/cliff.toml +92 -0
data/devenv.lock +14 -14
data/devenv.nix +3 -1
data/lib/gtfs_df/base_gtfs_table.rb +2 -1
data/lib/gtfs_df/feed.rb +64 -47
data/lib/gtfs_df/version.rb +1 -1
metadata +2 -1

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 140458a6ce1013bef475e0a6cdcab6364cff04b8a18eedb5e5d0244e3bccf38a
-  data.tar.gz: c420a34f7004eca9267f32f53038632f822224924eac5b77aa98957bb3149e20
+  metadata.gz: 9db3a2b1829e87ae837133265aa0192585dc7fb7476646e634887652c7771e02
+  data.tar.gz: c8b5a84c802897dfcd291acd45deaee4798fa34c8de71ac5436a00fe9025fe19
 SHA512:
-  metadata.gz: 032d24ed1df3ed43e5e6953abebbeda70ba5450cab731e931e8548bd058e37d9472edd688a78b0c941e8ff995111b9f616c8aae23e7daba9f3a610813aade528
-  data.tar.gz: b808c05aeedea83faf728feded28a38a78ac7a1d2ff139ea36cb0f474f310228792f926a97160df637f9d85dca83272921ed9dff72a632c4aab575163643610d
+  metadata.gz: 0d6f0d3af6c79ab6bb3b7e1b19aa14bf531665b19284f42b5e0b3658e2686314a8f4875d04cac1ca66c02597b8ff50455bfdf55bcd8093fd1171dd02651926c8
+  data.tar.gz: 9d105caf14ef7ad8597910efdf8786d8238ef3b5e440651dc01d00f321f1face713c959bc367a21b394f97b32dc4a86958a76ce68561824bd665321475d0b4cf

data/CHANGELOG.md CHANGED

@@ -1,55 +1,141 @@
+## [0.6.1] - 2025-12-12
+### 🐛 Bug Fixes
+- Parse whitespace in column headers
+### 📚 Documentation
+- Badges
+### ⚙️ Miscellaneous Tasks
+- Update devenv
+- Drop custom changelog parsing
+## [0.6.0] - 2025-12-10
+### 🐛 Bug Fixes
+- Visit nodes multiple times
+### ⚙️ Miscellaneous Tasks
+- Bump version to 0.6.0
 ## [0.5.0] - 2025-12-08
-### Added
+### 🚀 Features
-- add Feed#filter filter_only_children param
+- [**breaking**] Add Feed#filter filter_only_children param
-### Maintenance
+### ⚙️ Miscellaneous Tasks
-- arrange edges so parent is always first
-- build directed graph
-- allow ! in commit messages
+- Arrange edges so parent is always first
+- Build directed graph
+- Allow ! in commit messages
+- Bump version to 0.5.0
 ## [0.4.1] - 2025-12-05
-### Added
+### 🚀 Features
-- handle extra whitespace in csvs
+- Handle extra whitespace in csvs
-### Maintenance
+### ⚙️ Miscellaneous Tasks
-- remove unused initializer format
+- Remove unreleased section
+- Remove unused initializer format
+- Bump version to 0.4.1
 ## [0.4.0] - 2025-12-04
-### Added
+### 🚀 Features
-- allow setting maintain_trip_dependencies=false
+- Allow setting maintain_trip_dependencies=false
-### Fixed
+### 🐛 Bug Fixes
-- parse stop_lat as float
-- add missing agency -> fare_attributes edge
-- allow null for fare_rules
+- Parse stop_lat as float
+- Add missing agency -> fare_attributes edge
+- Allow null for fare_rules
-### Maintenance
-- provide accessor for gtfs_files (utility)
-- add yard docs
+### ⚙️ Miscellaneous Tasks
+- Provide accessor for gtfs_files (utility)
+- Add yard docs
+- Bump version to 0.4.0
 ## [0.3.0] - 2025-12-04
-### Added
+### 🚀 Features
+- Keep parent stations linked to used stops
+### 🐛 Bug Fixes
+- Handle null values
+- Update lock on version bump
+### ⚙️ Miscellaneous Tasks
+- Reuse load_from_dir logic in reader
+- Clean up unused method + better comments
+- Autopublish on release tag push
+- Automate release script
+- Release tag script
+- Bump version to 0.3.0
+## [0.2.0] - 2025-12-01
+### 🚀 Features
+- Add Reader.load_from_dir
+### 🐛 Bug Fixes
+- Require correct entrypoint
+- Cascade empty view filters
+- Handle parsing when cols size = schema size
+- Parse extraneous columns as strings
+- Cascade changes reliably
+- Filter with trips as atomic unit
+- Remove nonexistent booking_rule association
+- Add empty string to null vals
+### 📚 Documentation
+- Include processing time
+- Update gem name
+### ⚙️ Miscellaneous Tasks
+- Add byebug gem
+- Include byebug in spec_helper.rb
+- Rearrange filter specs
+- Add pending specs for expected behaviors
+- [**breaking**] Removes duplicate load_from_dir method (use reader instead)
+- Mutate for both filter! and prune!
+- Tag version 0.2.0
+## [0.1.1] - 2025-11-12
+### 🐛 Bug Fixes
+- Release workflow
-- keep parent stations linked to used stops
+### ⚙️ Miscellaneous Tasks
-### Fixed
+- Rename namespace to follow ruby conventions
+- Bump version
+- Remove broken release flow
+- Clarify gem status
+- Republish version
+## [0.1.0] - 2025-11-12
-- handle null values
-- update lock on version bump
+### 📚 Documentation
-### Maintenance
+- Readme and gemspec details
+- Time parsing to-do
-- reuse load_from_dir logic in reader
-- clean up unused method + better comments
-## [0.1.0] - 2025-11-10
+### ⚙️ Miscellaneous Tasks
+- Initial commit
+- Make the lock platform agnostic
+- Validate commit messages
+- Run spec and standard steps separately
+- Release flow
 - Initial release

data/README.md CHANGED

@@ -1,5 +1,7 @@
 # ruby-gtfs-df
+[![Tests](https://github.com/davidmh/ruby-gtfs-df/actions/workflows/tests.yml/badge.svg?branch=main)](https://github.com/davidmh/ruby-gtfs-df/actions/workflows/tests.yml) [![Gem Version](https://badge.fury.io/rb/gtfs_df.svg)](https://badge.fury.io/rb/gtfs_df)
 A ruby gem to manipulate [GTFS] feeds using DataFrames using [Polars] ([ruby-polars])
 This project was created to bring the power of [partridge] to ruby.

data/cliff.toml ADDED

@@ -0,0 +1,92 @@
+# git-cliff ~ configuration file
+# https://git-cliff.org/docs/configuration
+[changelog]
+# A Tera template to be rendered for each release in the changelog.
+# See https://keats.github.io/tera/docs/#introduction
+body = """
+{% if version %}\
+    ## [{{ version | trim_start_matches(pat="v") }}] - {{ timestamp | date(format="%Y-%m-%d") }}
+{% else %}\
+    ## [unreleased]
+{% endif %}\
+{% for group, commits in commits | group_by(attribute="group") %}
+    ### {{ group | striptags | trim | upper_first }}
+    {% for commit in commits %}
+        - {% if commit.scope %}*({{ commit.scope }})* {% endif %}\
+            {% if commit.breaking %}[**breaking**] {% endif %}\
+            {{ commit.message | upper_first }}\
+    {% endfor %}
+{% endfor %}
+"""
+# Remove leading and trailing whitespaces from the changelog's body.
+trim = true
+# Render body even when there are no releases to process.
+render_always = true
+# An array of regex based postprocessors to modify the changelog.
+postprocessors = [
+    # Replace the placeholder <REPO> with a URL.
+    #{ pattern = '<REPO>', replace = "https://github.com/orhun/git-cliff" },
+]
+# render body even when there are no releases to process
+# render_always = true
+# output file path
+# output = "test.md"
+[git]
+# Parse commits according to the conventional commits specification.
+# See https://www.conventionalcommits.org
+conventional_commits = true
+# Exclude commits that do not match the conventional commits specification.
+filter_unconventional = true
+# Require all commits to be conventional.
+# Takes precedence over filter_unconventional.
+require_conventional = false
+# Split commits on newlines, treating each line as an individual commit.
+split_commits = false
+# An array of regex based parsers to modify commit messages prior to further processing.
+commit_preprocessors = [
+    # Replace issue numbers with link templates to be updated in `changelog.postprocessors`.
+    #{ pattern = '\((\w+\s)?#([0-9]+)\)', replace = "([#${2}](<REPO>/issues/${2}))"},
+    # Check spelling of the commit message using https://github.com/crate-ci/typos.
+    # If the spelling is incorrect, it will be fixed automatically.
+    #{ pattern = '.*', replace_command = 'typos --write-changes -' },
+]
+# Prevent commits that are breaking from being excluded by commit parsers.
+protect_breaking_commits = false
+# An array of regex based parsers for extracting data from the commit message.
+# Assigns commits to groups.
+# Optionally sets the commit's scope and can decide to exclude commits from further processing.
+commit_parsers = [
+    { message = "^feat", group = "<!-- 0 -->🚀 Features" },
+    { message = "^fix", group = "<!-- 1 -->🐛 Bug Fixes" },
+    { message = "^doc", group = "<!-- 3 -->📚 Documentation" },
+    { message = "^perf", group = "<!-- 4 -->⚡ Performance" },
+    { message = "^refactor", group = "<!-- 2 -->🚜 Refactor" },
+    { message = "^style", group = "<!-- 5 -->🎨 Styling" },
+    { message = "^test", group = "<!-- 6 -->🧪 Testing" },
+    { message = "^chore\\(release\\): prepare for", skip = true },
+    { message = "^chore\\(deps.*\\)", skip = true },
+    { message = "^chore\\(pr\\)", skip = true },
+    { message = "^chore\\(pull\\)", skip = true },
+    { message = "^chore|^ci", group = "<!-- 7 -->⚙️ Miscellaneous Tasks" },
+    { body = ".*security", group = "<!-- 8 -->🛡️ Security" },
+    { message = "^revert", group = "<!-- 9 -->◀️ Revert" },
+    { message = ".*", group = "<!-- 10 -->💼 Other" },
+]
+# Exclude commits that are not matched by any commit parser.
+filter_commits = false
+# An array of link parsers for extracting external references, and turning them into URLs, using regex.
+link_parsers = []
+# Include only the tags that belong to the current branch.
+use_branch_tags = false
+# Order releases topologically instead of chronologically.
+topo_order = false
+# Order commits topologically instead of chronologically.
+topo_order_commits = true
+# Order of commits in each group/release within the changelog.
+# Allowed values: newest, oldest
+sort_commits = "oldest"
+# Process submodules commits
+recurse_submodules = false
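
The `commit_parsers` table above is what produces the emoji headings in the regenerated CHANGELOG. A minimal Ruby sketch of that grouping follows, assuming the first matching parser wins and that the `striptags` filter in the template drops the `<!-- N -->` ordering prefix. `PARSERS` and `heading_for` are hypothetical names for illustration and cover only a subset of the rules; this is not git-cliff's implementation.

```ruby
# Illustrative subset of the commit_parsers entries from cliff.toml.
# PARSERS and heading_for are hypothetical names, not part of git-cliff.
PARSERS = [
  {message: /^feat/, group: "<!-- 0 -->🚀 Features"},
  {message: /^fix/, group: "<!-- 1 -->🐛 Bug Fixes"},
  {message: /^chore\(deps.*\)/, skip: true},
  {message: /^chore|^ci/, group: "<!-- 7 -->⚙️ Miscellaneous Tasks"},
  {message: /.*/, group: "<!-- 10 -->💼 Other"}
]

# Returns the changelog heading for a commit message, or nil when the
# commit is skipped. Assumes the first matching parser decides the group.
def heading_for(message)
  parser = PARSERS.find { |p| p[:message].match?(message) }
  return nil if parser.nil? || parser[:skip]

  # Mimic the template's `striptags | trim`: the HTML comment is only
  # there to control sort order and is not rendered.
  parser[:group].gsub(/<!--.*?-->/, "").strip
end

puts heading_for("fix: parse whitespace in column headers")
```

Because the skip rule for `chore(deps...)` sits before the general `^chore|^ci` rule, dependency bumps never reach the Miscellaneous Tasks group under this assumption.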

data/devenv.lock CHANGED

@@ -3,10 +3,10 @@
     "devenv": {
       "locked": {
         "dir": "src/modules",
-        "lastModified": 1761427990,
+        "lastModified": 1765397744,
         "owner": "cachix",
         "repo": "devenv",
-        "rev": "7419c04fc798d5d5918413d4cb6c8629f9d4e8a3",
+        "rev": "fd121886248781fee8546e4b6edc6d27e1d4efd5",
         "type": "github"
       },
       "original": {
@@ -19,10 +19,10 @@
     "flake-compat": {
       "flake": false,
       "locked": {
-        "lastModified": 1747046372,
+        "lastModified": 1765121682,
         "owner": "edolstra",
         "repo": "flake-compat",
-        "rev": "9100a0f413b0c601e0533d1d94ffd501ce2e7885",
+        "rev": "65f23138d8d09a92e30f1e5c87611b23ef451bf3",
         "type": "github"
       },
       "original": {
@@ -34,10 +34,10 @@
     "flake-compat_2": {
       "flake": false,
       "locked": {
-        "lastModified": 1747046372,
+        "lastModified": 1765121682,
         "owner": "edolstra",
         "repo": "flake-compat",
-        "rev": "9100a0f413b0c601e0533d1d94ffd501ce2e7885",
+        "rev": "65f23138d8d09a92e30f1e5c87611b23ef451bf3",
         "type": "github"
       },
       "original": {
@@ -72,10 +72,10 @@
         ]
       },
       "locked": {
-        "lastModified": 1760663237,
+        "lastModified": 1765404074,
         "owner": "cachix",
         "repo": "git-hooks.nix",
-        "rev": "ca5b894d3e3e151ffc1db040b6ce4dcc75d31c37",
+        "rev": "2d6f58930fbcd82f6f9fd59fb6d13e37684ca529",
         "type": "github"
       },
       "original": {
@@ -92,10 +92,10 @@
         ]
       },
       "locked": {
-        "lastModified": 1709087332,
+        "lastModified": 1762808025,
         "owner": "hercules-ci",
         "repo": "gitignore.nix",
-        "rev": "637db329424fd7e46cf4185293b9cc8c88c95394",
+        "rev": "cb5e3fdca1de58ccbc3ef53de65bd372b48f567c",
         "type": "github"
       },
       "original": {
@@ -106,10 +106,10 @@
     },
     "nixpkgs": {
       "locked": {
-        "lastModified": 1758532697,
+        "lastModified": 1764580874,
         "owner": "cachix",
         "repo": "devenv-nixpkgs",
-        "rev": "207a4cb0e1253c7658c6736becc6eb9cace1f25f",
+        "rev": "dcf61356c3ab25f1362b4a4428a6d871e84f1d1d",
         "type": "github"
       },
       "original": {
@@ -128,10 +128,10 @@
         ]
       },
       "locked": {
-        "lastModified": 1759902829,
+        "lastModified": 1765345660,
         "owner": "bobvanderlinden",
         "repo": "nixpkgs-ruby",
-        "rev": "5fba6c022a63f1e76dee4da71edddad8959f088a",
+        "rev": "35e7b0919db2859f4022d1aa00b9c01253be1bce",
         "type": "github"
       },
       "original": {

data/devenv.nix CHANGED

@@ -7,7 +7,9 @@
     bundler.enable = false;
   };
-  pre-commit.hooks = {
+  git-hooks.hooks = {
     conform.enable = true;
   };
+  packages = with pkgs; [ git-cliff ];
 }

data/lib/gtfs_df/base_gtfs_table.rb CHANGED

@@ -13,8 +13,9 @@ module GtfsDf
           # TODO: use `infer_schema: false` instead of `infer_schema_length` after polars release:
           # https://github.com/ankane/ruby-polars/blob/master/CHANGELOG.md#100-unreleased
           df = Polars.read_csv(input, infer_schema_length: 0)
-          dtypes = self.class::SCHEMA.slice(*df.columns)
+            .rename(->(col) { col.strip })
+          dtypes = self.class::SCHEMA.slice(*df.columns)
           df
             .with_columns(dtypes.keys.map do |col|
               stripped = Polars.col(col).str.strip
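
The hunk above renames the columns before the `SCHEMA.slice(*df.columns)` lookup, which appears to be the "Parse whitespace in column headers" fix from 0.6.1. The sketch below shows why the order matters, using plain Ruby strings instead of a Polars dataframe. The header line and the three-column `SCHEMA` are made-up stand-ins, not the gem's real schema.

```ruby
# Hypothetical schema and CSV header, for illustration only.
SCHEMA = {"stop_id" => :string, "stop_name" => :string, "stop_lat" => :float}
header = "stop_id, stop_name ,stop_lat "

raw_columns = header.split(",")
stripped_columns = raw_columns.map(&:strip)

# Hash#slice only keeps keys that match exactly, so padded header names
# silently fall out of the schema lookup.
RAW_DTYPES = SCHEMA.slice(*raw_columns)
STRIPPED_DTYPES = SCHEMA.slice(*stripped_columns)

puts RAW_DTYPES.keys.inspect      # ["stop_id"]
puts STRIPPED_DTYPES.keys.inspect # ["stop_id", "stop_name", "stop_lat"]
```

With the unstripped names, only `stop_id` would receive its schema type in this example; the padded columns would be left as untyped strings.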

data/lib/gtfs_df/feed.rb CHANGED

@@ -157,59 +157,76 @@ module GtfsDf
     # Traverses the graph to prune unreferenced entities from child dataframes
     # based on parent relationships. See GtfsDf::Graph::STOP_NODES
     def prune!(root, filtered, filter_only_children: false)
+      seen_edges = Set.new
       maybe_digraph = filter_only_children ? graph : graph.to_undirected
-      maybe_digraph.each_bfs_edge(root) do |parent_node_id, child_node_id|
-        parent_node = Graph::NODES[parent_node_id]
-        child_node = Graph::NODES[child_node_id]
-        parent_df = filtered[parent_node.fetch(:file)]
-        next unless parent_df
-        child_df = filtered[child_node.fetch(:file)]
-        # Certain nodes are pre-filtered because they reference only
-        # a piece of the dataframe
-        filter_attrs = child_node[:filter_attrs]
-        if filter_attrs && child_df.columns.include?(filter_attrs.fetch(:filter_col))
-          filter = filter_attrs.fetch(:filter)
-          # Temporarily remove rows that do not match node filter criteria to process them
-          # separately (e.g., when filtering stops, parent stations that should be preserved
-          # regardless of direct references)
-          saved_vals = child_df.filter(filter.is_not)
-          child_df = child_df.filter(filter)
-        end
-        next unless child_df && child_df.height > 0
-        attrs = maybe_digraph.get_edge_data(parent_node_id, child_node_id)
-        attrs[:dependencies].each do |dep|
-          parent_col = dep[parent_node_id]
-          child_col = dep[child_node_id]
-          allow_null = !!dep[:allow_null]
-          next unless parent_col && child_col &&
-            parent_df.columns.include?(parent_col) && child_df.columns.include?(child_col)
-          # Get valid values from parent
-          valid_values = parent_df[parent_col].to_a.uniq.compact
-          # Filter child to only include rows that reference valid parent values
-          before = child_df.height
-          filter = Polars.col(child_col).is_in(valid_values)
-          if allow_null
-            filter = (filter | Polars.col(child_col).is_null)
+      queue = [root]
+      while queue.length > 0
+        parent_node_id = queue.shift
+        maybe_digraph.adj[parent_node_id].each do |child_node_id, attrs|
+          edge = edge_id(parent_node_id, child_node_id)
+          next if seen_edges.include?(edge)
+          seen_edges.add(edge)
+          parent_node = Graph::NODES[parent_node_id]
+          child_node = Graph::NODES[child_node_id]
+          parent_df = filtered[parent_node.fetch(:file)]
+          next unless parent_df
+          child_df = filtered[child_node.fetch(:file)]
+          # Certain nodes are pre-filtered because they reference only
+          # a piece of the dataframe
+          filter_attrs = child_node[:filter_attrs]
+          if filter_attrs && child_df.columns.include?(filter_attrs.fetch(:filter_col))
+            filter = filter_attrs.fetch(:filter)
+            # Temporarily remove rows that do not match node filter criteria to process them
+            # separately (e.g., when filtering stops, parent stations that should be preserved
+            # regardless of direct references)
+            saved_vals = child_df.filter(filter.is_not)
+            child_df = child_df.filter(filter)
           end
-          child_df = child_df.filter(filter)
-          changed = child_df.height < before
-          # If we removed a part of the child_df earlier, concat it back on
-          if saved_vals
-            child_df = Polars.concat([child_df, saved_vals], how: "vertical")
-          end
-          if changed
-            filtered[child_node.fetch(:file)] = child_df
+          next unless child_df && child_df.height > 0
+          queue << child_node_id
+          attrs[:dependencies].each do |dep|
+            parent_col = dep[parent_node_id]
+            child_col = dep[child_node_id]
+            allow_null = !!dep[:allow_null]
+            next unless parent_col && child_col &&
+              parent_df.columns.include?(parent_col) && child_df.columns.include?(child_col)
+            # Get valid values from parent
+            valid_values = parent_df[parent_col].to_a.uniq.compact
+            # Filter child to only include rows that reference valid parent values
+            before = child_df.height
+            filter = Polars.col(child_col).is_in(valid_values)
+            if allow_null
+              filter = (filter | Polars.col(child_col).is_null)
+            end
+            child_df = child_df.filter(filter)
+            changed = child_df.height < before
+            # If we removed a part of the child_df earlier, concat it back on
+            if saved_vals
+              child_df = Polars.concat([child_df, saved_vals], how: "vertical")
+            end
+            if changed
+              filtered[child_node.fetch(:file)] = child_df
+            end
           end
         end
       end
     end
+    def edge_id(parent, child)
+      # Alphabetize to make sure this works with undirected graph
+      [parent, child].sort.join("-")
+    end
   end
 end
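
The rewritten `prune!` replaces `each_bfs_edge` with a manual queue that tracks seen edges rather than seen nodes, which matches the 0.6.0 changelog entry "Visit nodes multiple times". A minimal sketch of that traversal pattern on a toy undirected graph follows. `ADJ` and `each_edge_once` are hypothetical names, and the dataframe filtering is left out entirely.

```ruby
require "set"

# Toy undirected graph (diamond shape); node names are illustrative.
ADJ = {
  "a" => ["b", "c"],
  "b" => ["a", "d"],
  "c" => ["a", "d"],
  "d" => ["b", "c"]
}

# Same idea as the diff's edge_id: sort the endpoints so that a-b and b-a
# produce one key on an undirected graph.
def edge_id(parent, child)
  [parent, child].sort.join("-")
end

# Breadth-first walk that processes every edge exactly once. A node can be
# queued more than once, so it is reached through each of its parents.
def each_edge_once(adj, root)
  seen_edges = Set.new
  visited = []
  queue = [root]
  until queue.empty?
    parent = queue.shift
    adj.fetch(parent, []).each do |child|
      edge = edge_id(parent, child)
      next if seen_edges.include?(edge)
      seen_edges.add(edge)
      visited << [parent, child]
      queue << child
    end
  end
  visited
end

EDGES = each_edge_once(ADJ, "a")
puts EDGES.inspect # [["a", "b"], ["a", "c"], ["b", "d"], ["c", "d"]]
```

Node `d` is reached through both `b` and `c` here. A traversal that marks nodes as visited would follow only one of those two edges, so in the pruning context the second parent's constraint would never be applied to the child.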

data/lib/gtfs_df/version.rb CHANGED

@@ -1,5 +1,5 @@
 # frozen_string_literal: true
 module GtfsDf
-  VERSION = "0.5.0"
+  VERSION = "0.6.1"
 end

metadata CHANGED

@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: gtfs_df
 version: !ruby/object:Gem::Version
-  version: 0.5.0
+  version: 0.6.1
 platform: ruby
 authors:
 - David Mejorado
@@ -71,6 +71,7 @@ files:
 - LICENSE.txt
 - README.md
 - Rakefile
+- cliff.toml
 - devenv.lock
 - devenv.nix
 - devenv.yaml