RubyGems - find-subscriptions - Versions diffs - 0.1.0 - Mend

find-subscriptions 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (18)

checksums.yaml +7 -0
data/LICENSE +124 -0
data/README.md +128 -0
data/bin/find-subscriptions +8 -0
data/lib/detectors/known_payees.rb +41 -0
data/lib/detectors/repeat_charges.rb +69 -0
data/lib/find_subscriptions/cli.rb +231 -0
data/lib/find_subscriptions/payee_normalizer.rb +81 -0
data/lib/find_subscriptions/schema_registry.rb +72 -0
data/lib/find_subscriptions/transaction.rb +15 -0
data/lib/find_subscriptions.rb +4 -0
data/lib/output/csv_reporter.rb +26 -0
data/lib/output/json_reporter.rb +27 -0
data/lib/output/stdout_reporter.rb +37 -0
data/lib/schemas/american_express.rb +31 -0
data/lib/schemas/generic.rb +35 -0
data/lib/schemas/navy_federal.rb +36 -0
metadata +118 -0

checksums.yaml ADDED

@@ -0,0 +1,7 @@
+---
+SHA256:
+  metadata.gz: '069716e44081191145ee5737091b013be74948f27e263e959c4fdb27b26da198'
+  data.tar.gz: b9a5581e027c13554712011b9f967aa504ec9c5ef5609fb172ca8bdc2abc97bf
+SHA512:
+  metadata.gz: 791c068bc914f3d7a5ae7ba06bf1f659f921aa9cc0a367ac2002f35ee52fe7182e47a7c84cd71ffa3f9fa0e7c0d9ed93f0ba995636becb1ad39a05484f81e7e6
+  data.tar.gz: c66594b54e5241d4660d0bd3d433581bfaf7fda6d16700b397caf39b193e73d74bf0db61bb7f16991a51fa47dfed4d49c7b56f5c0cf238087b2d58ea73634efd
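
The digests above can be compared against a downloaded copy of the gem. The following is a minimal sketch, assuming the `.gem` archive (a tar file) has already been unpacked so that `data.tar.gz` is available locally; the helper name `digest_matches?` and the local path are hypothetical and not part of the package.

```ruby
require 'digest'

# Hypothetical helper: compares a file's SHA256 hex digest with an expected value.
def digest_matches?(path, expected_hex)
  Digest::SHA256.file(path).hexdigest == expected_hex.downcase
end

# Expected value copied from the checksums.yaml hunk above.
EXPECTED_DATA_SHA256 = 'b9a5581e027c13554712011b9f967aa504ec9c5ef5609fb172ca8bdc2abc97bf'

# Assumed local path; adjust to wherever the gem was unpacked.
path = 'data.tar.gz'
if File.exist?(path)
  puts digest_matches?(path, EXPECTED_DATA_SHA256) ? 'data.tar.gz: OK' : 'data.tar.gz: MISMATCH'
else
  puts "#{path} not found; nothing to verify"
end
```

The same helper can be pointed at `metadata.gz` with the corresponding SHA256 value from the listing.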

data/LICENSE ADDED

@@ -0,0 +1,124 @@
+PolyForm Noncommercial License 1.0.0
+<https://polyformproject.org/licenses/noncommercial/1.0.0>
+Acceptance
+In order to get any license under these terms, you must agree
+to them as both strict obligations and conditions to all your
+licenses.
+Copyright License
+The licensor grants you a copyright license for the software to
+do everything you might do with the software that would otherwise
+infringe the licensor's copyright in it for any permitted
+purpose. However, you may only distribute the software according
+to Distribution License and make changes or new works based on
+the software according to Changes and New Works License.
+Distribution License
+The licensor grants you an additional copyright license to
+distribute copies of the software. Your license to distribute
+covers distributing the software with changes and new works
+permitted by Changes and New Works License.
+How to Distribute
+You must ensure that anyone who gets a copy of any part of the
+software from you also gets a copy of these terms or the URL for
+them above, as well as copies of any plain-text lines beginning
+with Required Notice: that the licensor provided with the
+software. For example:
+Required Notice: Copyright Jeffrey Baird (https://github.com/jeffreybaird/find-subscriptions)
+Changes and New Works License
+The licensor grants you an additional copyright license to make
+changes and new works based on the software for any permitted
+purpose.
+Patent License
+The licensor grants you a patent license for the software that
+covers patent claims the licensor can license, or will be able to
+license, that you would infringe by using the software.
+Noncommercial Purposes
+Any noncommercial purpose is a permitted purpose.
+Personal Uses
+Personal use for research, experiment, and testing for the
+benefit of public knowledge, personal study, private
+entertainment, hobby projects, amateur pursuits, or religious
+observance, without any anticipated commercial application, is
+use for a permitted purpose.
+Noncommercial Organizations
+Use by any charitable organization, educational institution,
+public research organization, public safety or health
+organization, environmental protection organization, or
+government institution is use for a permitted purpose regardless
+of the source of funding or obligations resulting from the
+funding.
+Fair Use
+You may have "fair use" rights for the software under the law.
+These terms do not limit them.
+No Other Rights
+These terms do not allow you to sublicense or transfer any of
+your licenses to anyone else, or prevent the licensor from
+granting licenses to anyone else. These terms do not imply any
+other licenses.
+Patent Defense
+If you make any written claim that the software infringes or
+contributes to infringement of any patent, your patent license
+for the software granted under these terms ends immediately. If
+your employer makes such a claim, your patent license ends
+immediately for work on behalf of your employer.
+Violations
+The first time you are notified in writing that you have violated
+any of these terms, or done anything with the software not
+covered by your licenses, you have 30 days to come into
+compliance. If you do not do so, your licenses end immediately.
+No Liability
+As far as the law allows, the software comes as is, without any
+warranty or condition, and the licensor will not be liable to you
+for any damages arising out of these terms or the use or nature
+of the software, under any kind of legal claim.
+Definitions
+The licensor is the individual or entity offering these terms,
+and the software is the software the licensor makes available
+under these terms.
+You refers to the individual or entity agreeing to these terms.
+Your company is any legal entity, sole proprietorship, or other
+kind of organization that you work for, plus all organizations
+that have control over, are under the control of, or are under
+common control with that organization. Control means ownership
+of substantially all the assets of an entity, or the power to
+direct its management and policies by vote, contract, or
+otherwise. Control can be direct or indirect.
+Your licenses are all the licenses granted to you for the
+software under these terms.
+Use means anything you do with the software requiring one of
+your licenses.

data/README.md ADDED

@@ -0,0 +1,128 @@
+# find-subscriptions
+Scans bank and credit card CSV exports to surface recurring charges — subscriptions, memberships, and other repeat payments you may have forgotten about.
+## Requirements
+- Ruby 3.x
+## Usage
+```
+./bin/find-subscriptions --files EXPORT.csv [options]
+```
+### Options
+| Flag | Description |
+|------|-------------|
+| `--files FILES` | Comma-separated list of CSV files to scan |
+| `--schema NAME` | Force a schema instead of auto-detecting from headers |
+| `--known-payees PATH` | Path to a known-payees YAML file; matching payees are **filtered out** of results |
+| `--sort ORDER` | Sort order for results (see below) |
+| `--inactive-for DURATION` | Hide subscriptions with no recent transactions (see below) |
+| `--min-amount AMOUNT` | Hide subscriptions with a recurring charge below AMOUNT (e.g. `5.00`) |
+| `--from DATE` | Only include transactions on or after DATE (`YYYY-MM-DD`) |
+| `--to DATE` | Only include transactions on or before DATE (`YYYY-MM-DD`) |
+| `--format FORMAT` | Output format: `text` (default), `json`, `csv` |
+### Output formats
+| Value | Output |
+|-------|--------|
+| `text` | Human-readable table *(default)* |
+| `json` | Pretty-printed JSON array — pipe into `jq` or save for further processing |
+| `csv` | CSV with header row — open in a spreadsheet or feed into other scripts |
+```
+./bin/find-subscriptions --files export.csv --format json
+./bin/find-subscriptions --files export.csv --format csv > subscriptions.csv
+```
+### Sort orders
+| Value | Meaning |
+|-------|---------|
+| `first_desc` | First charge date, newest first *(default)* |
+| `first_asc` | First charge date, oldest first |
+| `last_desc` | Most-recent charge, newest first |
+| `last_asc` | Most-recent charge, oldest first |
+| `count_desc` | Number of transactions, highest first |
+| `count_asc` | Number of transactions, lowest first |
+### `--inactive-for` duration format
+A number followed by `year`, `month`, or `week` (plurals accepted):
+```
+--inactive-for 6months
+--inactive-for 1year
+--inactive-for 3weeks
+```
+Subscriptions whose last transaction is older than the duration are hidden. Useful for trimming results to only currently-active charges.
+## Examples
+Scan a single Amex export, auto-detecting the schema:
+```
+./bin/find-subscriptions --files Amex-2025.csv
+```
+Scan multiple files and force a schema:
+```
+./bin/find-subscriptions --files jan.csv,feb.csv --schema american_express
+```
+Filter out known/expected subscriptions and show only recent ones:
+```
+./bin/find-subscriptions --files Amex-2025.csv \
+  --known-payees data/known_payees.yml \
+  --inactive-for 6months \
+  --sort last_desc
+```
+## Supported schemas
+| Name | Bank / Issuer | Required CSV headers |
+|------|---------------|----------------------|
+| `american_express` | American Express | `Date`, `Description`, `Amount` |
+| `navy_federal` | Navy Federal Credit Union | `Transaction Date`, `Description`, `Amount`, `Credit Debit Indicator` |
+| `generic` | Generic (YYYY-MM-DD dates) | `Date`, `Description`, `Amount` |
+The schema is auto-detected from the CSV headers. Pass `--schema NAME` to override.
+## Known-payees file
+The `--known-payees` flag points to a YAML file that maps canonical names to regex patterns. Any subscription whose payee matches a pattern is **removed from output** — useful for filtering charges you already know about.
+```yaml
+- name: "Netflix"
+  normalized: "netflix"
+  patterns:
+    - '/\bnetflix\b/i'
+    - '/\bnflx\b/i'
+- name: "Amazon Web Services"
+  normalized: "amazon web services"
+  patterns:
+    - '/aws\.amazon\.com/i'
+```
+Each entry requires:
+- `name` — human-readable label shown in output when the payee matches
+- `normalized` — internal deduplication key (lowercase, used for grouping)
+- `patterns` — list of Ruby regex literals in `/pattern/flags` format
+`data/known_payees.yml` is the default file and is always loaded for payee normalization (display names). Filtering only applies when `--known-payees` is explicitly passed.
+## Output format
+```
+Subscriptions:
+  - SPOTIFY                                                               : $9.99 since January 2025 (14 transactions) until February 2026
+  - NETFLIX.COM                                                           : $15.49 since March 2024 (12 transactions) until February 2026
+```
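
The known-payees format described in the README can be exercised in isolation. The sketch below assumes an inline copy of the README's Netflix entry; `literal_to_regexp` and `rule_for` are hypothetical helpers that handle only the `i` flag, and they are not the package's own loader.

```ruby
require 'yaml'

# Inline copy of one README entry (raw heredoc keeps the backslashes intact).
RULES_YAML = <<~'YAML'
  - name: "Netflix"
    normalized: "netflix"
    patterns:
      - '/\bnetflix\b/i'
      - '/\bnflx\b/i'
YAML

# Hypothetical helper: turns "/body/flags" into a Regexp (only the `i` flag is handled here).
def literal_to_regexp(literal)
  last_slash = literal.rindex('/')
  body = literal[1...last_slash]
  flags = literal[(last_slash + 1)..]
  Regexp.new(body, flags.include?('i') ? Regexp::IGNORECASE : 0)
end

RULES = YAML.safe_load(RULES_YAML).map do |entry|
  { name: entry['name'], key: entry['normalized'],
    regexes: entry['patterns'].map { |pattern| literal_to_regexp(pattern) } }
end

# Returns the first rule whose patterns match the raw bank description, or nil.
def rule_for(rules, description)
  rules.find { |rule| rule[:regexes].any? { |re| re.match?(description) } }
end

puts rule_for(RULES, 'NETFLIX.COM 866-579-7172')&.fetch(:name).inspect
puts rule_for(RULES, 'LOCAL COFFEE SHOP').inspect
```

A description that matches any pattern resolves to the entry's `name` for display and to its `normalized` value for grouping; a description that matches nothing yields `nil`.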

data/bin/find-subscriptions ADDED

@@ -0,0 +1,8 @@
+#!/usr/bin/env ruby
+# frozen_string_literal: true
+$LOAD_PATH.unshift(File.expand_path('../lib', __dir__))
+require 'find_subscriptions/cli'
+FindSubscriptions::CLI.run(ARGV)

data/lib/detectors/known_payees.rb ADDED

@@ -0,0 +1,41 @@
+# frozen_string_literal: true
+require 'yaml'
+module FindSubscriptions
+  # Subscription detection strategies (repeat charges, known payees, etc.).
+  module Detectors
+    # Matches transactions to known subscription payees via configurable name matchers.
+    class KnownPayees
+      def initialize(known_payees:)
+        @known_payees = known_payees # hash: canonical_name => [matchers]
+      end
+      def detect(transactions)
+        # returns hash: canonical_name => earliest_transaction_date
+        found = {}
+        transactions.each do |tx|
+          canonical = match_payee(tx.payee)
+          next unless canonical
+          found[canonical] = tx.date if !found.key?(canonical) || tx.date < found[canonical]
+        end
+        found
+      end
+      private
+      def match_payee(payee)
+        normalized = payee.to_s.downcase
+        @known_payees.each do |canonical, matchers|
+          matchers.each do |m|
+            return canonical if normalized.include?(m.downcase)
+          end
+        end
+        nil
+      end
+    end
+  end
+end

data/lib/detectors/repeat_charges.rb ADDED

@@ -0,0 +1,69 @@
+# frozen_string_literal: true
+module FindSubscriptions
+  module Detectors
+    # Detects recurring charges by grouping outgoing transactions by normalized payee and amount.
+    class RepeatCharges
+      # rubocop:disable Lint/StructNewOverride
+      Candidate = Struct.new(
+        :payee_key,     # normalized payee key
+        :display_payee, # best-effort human label (raw payee)
+        :amount,        # BigDecimal
+        :since,         # Date
+        :until,         # Date
+        :count,         # Integer
+        :dates,         # [Date]
+        keyword_init: true
+      )
+      # rubocop:enable Lint/StructNewOverride
+      def initialize(payee_normalizer:, min_occurrences: 2, min_month_gap_days: 0, max_month_gap_days: 1000)
+        @payee_normalizer = payee_normalizer
+        @min_occurrences = min_occurrences
+        @min_gap = min_month_gap_days
+        @max_gap = max_month_gap_days
+      end
+      def detect(transactions)
+        outgoing = transactions.select { |t| t.amount&.positive? }
+        groups = outgoing.group_by { |t| [@payee_normalizer.normalize(t.payee), t.amount] }
+        groups.filter_map { |(payee_key, amount), txs| build_candidate(payee_key, amount, txs) }
+      end
+      private
+      def build_candidate(payee_key, amount, txs)
+        dates = txs.map(&:date).compact.sort
+        return unless dates.size >= @min_occurrences && recurring_monthlyish?(dates)
+        Candidate.new(payee_key: payee_key, display_payee: display_for(txs), amount: amount,
+                      since: dates.first, until: dates.last, count: dates.size, dates: dates)
+      end
+      def display_for(txs)
+        @payee_normalizer.display_name(txs.first.payee) || best_display_payee(txs)
+      end
+      def normalize_payee(payee)
+        payee.to_s.downcase
+             .gsub(/[^a-z0-9\s]/, ' ')
+             .gsub(/\s+/, ' ')
+             .strip
+      end
+      def best_display_payee(txs)
+        counts = Hash.new(0)
+        txs.each { |t| counts[t.payee.to_s.strip] += 1 }
+        counts.max_by { |_k, v| v }&.first || txs.first.payee.to_s
+      end
+      def recurring_monthlyish?(dates)
+        return false if dates.size < @min_occurrences
+        gaps = dates.each_cons(2).map { |a, b| (b - a).to_i }
+        monthlyish = gaps.count { |d| d.between?(@min_gap, @max_gap) }
+        monthlyish >= [1, (gaps.size * 0.6).ceil].max
+      end
+    end
+  end
+end
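
To see how the gap test in `recurring_monthlyish?` behaves, the standalone sketch below mirrors its arithmetic. The function name `mostly_within_gap?` and the 25 to 35 day window are illustrative assumptions; the class defaults shown above are 0 and 1000 days, which accept almost any pair of charges.

```ruby
require 'date'

# Standalone sketch: true when at least 60% of consecutive gaps (and at least one)
# fall within [min_gap, max_gap] days.
def mostly_within_gap?(dates, min_gap:, max_gap:)
  gaps = dates.sort.each_cons(2).map { |a, b| (b - a).to_i }
  return false if gaps.empty?

  in_window = gaps.count { |days| days.between?(min_gap, max_gap) }
  in_window >= [1, (gaps.size * 0.6).ceil].max
end

MONTHLY = [Date.new(2025, 1, 15), Date.new(2025, 2, 15),
           Date.new(2025, 3, 15), Date.new(2025, 4, 16)].freeze
SPORADIC = [Date.new(2025, 1, 2), Date.new(2025, 1, 5), Date.new(2025, 6, 20)].freeze

puts mostly_within_gap?(MONTHLY, min_gap: 25, max_gap: 35)   # gaps of 31, 28, 32 days
puts mostly_within_gap?(SPORADIC, min_gap: 25, max_gap: 35)  # gaps of 3 and 166 days
puts mostly_within_gap?(SPORADIC, min_gap: 0, max_gap: 1000) # the wide default window
```

With the narrow window only the monthly series passes; with the wide default window the sporadic series passes as well, so under the defaults the filtering is done mostly by the payee-and-amount grouping.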

data/lib/find_subscriptions/cli.rb ADDED

@@ -0,0 +1,231 @@
+# frozen_string_literal: true
+require 'optparse'
+require 'csv'
+require 'set'
+require 'yaml'
+require 'bigdecimal'
+require 'date'
+require_relative 'transaction'
+require_relative 'schema_registry'
+require_relative '../schemas/generic'
+require_relative '../schemas/american_express'
+require_relative '../schemas/navy_federal'
+require_relative '../detectors/known_payees'
+require_relative '../detectors/repeat_charges'
+require_relative '../output/stdout_reporter'
+require_relative '../output/json_reporter'
+require_relative '../output/csv_reporter'
+require_relative '../find_subscriptions/payee_normalizer'
+module FindSubscriptions
+  # Command-line interface: parses options, loads CSVs, runs detectors, and reports subscriptions.
+  class CLI
+    SORT_PROCS = {
+      'count_asc' => ->(subs) { subs.sort_by { |sub| sub[:count] } },
+      'count_desc' => ->(subs) { subs.sort_by { |sub| -sub[:count] } },
+      'first_asc' => ->(subs) { subs.sort_by { |sub| sub[:since] } },
+      'first_desc' => ->(subs) { subs.sort_by { |sub| sub[:since] }.reverse },
+      'last_asc' => ->(subs) { subs.sort_by { |sub| sub[:until] } },
+      'last_desc' => ->(subs) { subs.sort_by { |sub| sub[:until] }.reverse }
+    }.freeze
+    VALID_SORT_ORDERS = SORT_PROCS.keys.freeze
+    def self.run(argv)
+      new.run(argv)
+    end
+    def run(argv)
+      options = parse_options(argv)
+      files = options.fetch(:files)
+      raise ArgumentError, 'No files provided' if files.empty?
+      registry = build_registry
+      transactions = load_transactions(files, registry, options[:schema])
+      transactions = filter_by_date_range(transactions, options[:from_date], options[:to_date])
+      payee_normalizer = PayeeNormalizer.from_yaml(options[:known_payees_path])
+      subscriptions = detect_subscriptions(transactions, payee_normalizer, options)
+      reporter_for(options[:format]).print(subscriptions)
+    end
+    private
+    def detect_subscriptions(transactions, payee_normalizer, options)
+      detector = Detectors::RepeatCharges.new(payee_normalizer: payee_normalizer, min_occurrences: 2)
+      candidates = detector.detect(transactions)
+      candidates = filter_known(candidates, payee_normalizer) if options[:filter_known_payees]
+      candidates = filter_by_min_amount(candidates, options[:min_amount]) if options[:min_amount]
+      subscriptions = candidates.map { |candidate| candidate_to_hash(candidate) }
+      subscriptions = filter_inactive(subscriptions, options[:inactive_for]) if options[:inactive_for]
+      sort_subscriptions(subscriptions, options[:sort])
+    end
+    INACTIVE_FOR_PATTERN = /\A(\d+)\s*(year|month|week)s?\z/i.freeze
+    def filter_inactive(subscriptions, inactive_for, today: Date.today)
+      count, unit = parse_inactive_for(inactive_for)
+      cutoff = inactive_cutoff(count, unit.downcase, today)
+      subscriptions.select { |sub| sub[:until] >= cutoff }
+    end
+    def parse_inactive_for(value)
+      match = value.to_s.match(INACTIVE_FOR_PATTERN)
+      unless match
+        raise ArgumentError,
+              "Invalid --inactive-for value: #{value.inspect}. Expected format: NUMBER(year|month|week)[s]"
+      end
+      [match[1].to_i, match[2]]
+    end
+    def inactive_cutoff(count, unit, today)
+      case unit
+      when 'year'  then today << (count * 12)
+      when 'month' then today << count
+      when 'week'  then today - (count * 7)
+      end
+    end
+    def reporter_for(format)
+      case format
+      when 'json' then Output::JsonReporter.new
+      when 'csv'  then Output::CsvReporter.new
+      else             Output::StdoutReporter.new
+      end
+    end
+    def filter_by_date_range(transactions, from_date, to_date)
+      transactions.select do |txn|
+        (from_date.nil? || txn.date >= from_date) &&
+          (to_date.nil? || txn.date <= to_date)
+      end
+    end
+    def filter_by_min_amount(candidates, min_amount)
+      threshold = BigDecimal(min_amount.to_s)
+      candidates.select { |candidate| candidate.amount >= threshold }
+    end
+    def filter_known(candidates, payee_normalizer)
+      candidates.reject { |candidate| payee_normalizer.known_payee_key?(candidate.payee_key) }
+    end
+    def candidate_to_hash(candidate)
+      {
+        name: candidate.display_payee,
+        amount: format_money(candidate.amount),
+        since: candidate.since,
+        until: candidate.until,
+        count: candidate.count
+      }
+    end
+    def sort_subscriptions(subscriptions, sort_order)
+      sorter = SORT_PROCS[sort_order]
+      unless sorter
+        raise ArgumentError, "Invalid sort order: #{sort_order}. Valid options: #{VALID_SORT_ORDERS.join(', ')}"
+      end
+      sorter.call(subscriptions)
+    end
+    def format_money(decimal)
+      format('%.2f', decimal.to_f)
+    end
+    def parse_options(argv)
+      options = default_options
+      define_option_parser(options).parse!(argv)
+      options
+    end
+    DEFAULT_KNOWN_PAYEES_PATH = File.expand_path('../../data/known_payees.yml', __dir__).freeze
+    def default_options
+      {
+        files: [], schema: nil,
+        sort: 'first_desc', format: 'text',
+        min_amount: nil, from_date: nil, to_date: nil,
+        filter_known_payees: false,
+        known_payees_path: DEFAULT_KNOWN_PAYEES_PATH
+      }
+    end
+    def define_option_parser(options) # rubocop:disable Metrics/MethodLength
+      OptionParser.new do |opt|
+        opt.banner = 'Usage: find-subscriptions --files a.csv,b.csv [--schema NAME]'
+        opt.on('--files FILES', 'Comma-separated list of CSV files') do |val|
+          options[:files] = val.split(',').map(&:strip).reject(&:empty?)
+        end
+        opt.on('--schema NAME', 'Force schema name (otherwise auto-detect)') do |val|
+          options[:schema] = val.strip
+        end
+        opt.on('--known-payees PATH', 'Known payees YAML; matched payees are filtered from output') do |val|
+          options[:known_payees_path] = val
+          options[:filter_known_payees] = true
+        end
+        opt.on('--inactive-for DURATION',
+               'Hide subscriptions with no transactions in DURATION (e.g. 6months, 1year, 3weeks)') do |val|
+          options[:inactive_for] = val.strip
+        end
+        opt.on('--min-amount AMOUNT', 'Hide subscriptions with a recurring charge below AMOUNT') do |val|
+          options[:min_amount] = val.strip
+        end
+        register_date_range_options(opt, options)
+        register_presentation_options(opt, options)
+      end
+    end
+    def register_date_range_options(opt, options)
+      opt.on('--from DATE', 'Only include transactions on or after DATE (YYYY-MM-DD)') do |val|
+        options[:from_date] = Date.parse(val)
+      end
+      opt.on('--to DATE', 'Only include transactions on or before DATE (YYYY-MM-DD)') do |val|
+        options[:to_date] = Date.parse(val)
+      end
+    end
+    def register_presentation_options(opt, options)
+      opt.on('--sort ORDER', "Sort order: #{VALID_SORT_ORDERS.join(', ')} (default: first_desc)") do |val|
+        options[:sort] = val.strip
+      end
+      opt.on('--format FORMAT', 'Output format: text (default), json, csv') do |val|
+        options[:format] = val.strip
+      end
+    end
+    def build_registry
+      registry = SchemaRegistry.new
+      registry.register('american_express', Schemas.american_express)
+      registry.register('navy_federal', Schemas.navy_federal)
+      registry.register('generic', Schemas.generic)
+      registry
+    end
+    def load_transactions(files, registry, forced_schema_name)
+      files.flat_map do |path|
+        raise ArgumentError, "File not found: #{path}" unless File.exist?(path)
+        csv = CSV.read(path, headers: true)
+        schema = fetch_schema(path, csv, registry, forced_schema_name)
+        csv.map { |row| schema.map_row(row.to_h) }
+      end
+    end
+    def fetch_schema(path, csv, registry, forced_schema_name)
+      if forced_schema_name
+        registry.fetch(forced_schema_name)
+      else
+        detected = registry.detect_for(csv.headers)
+        raise ArgumentError, "Could not detect schema for #{File.basename(path)}. Use --schema." unless detected
+        detected
+      end
+    end
+  end
+end
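
The `--inactive-for` handling above combines a duration pattern with `Date#<<` month arithmetic. The sketch below reproduces that calculation as a single standalone function; the name `cutoff_for` and the sample date are assumptions made for illustration.

```ruby
require 'date'

DURATION_PATTERN = /\A(\d+)\s*(year|month|week)s?\z/i

# Sketch: converts a duration such as "6months" into a cutoff date relative to `today`.
def cutoff_for(duration, today:)
  match = duration.match(DURATION_PATTERN)
  raise ArgumentError, "invalid duration: #{duration.inspect}" unless match

  count = match[1].to_i
  case match[2].downcase
  when 'year'  then today << (count * 12)
  when 'month' then today << count
  when 'week'  then today - (count * 7)
  end
end

SAMPLE_TODAY = Date.new(2026, 3, 31)
puts cutoff_for('1month', today: SAMPLE_TODAY)  # month arithmetic clamps to the end of February
puts cutoff_for('3 weeks', today: SAMPLE_TODAY)
puts cutoff_for('1year', today: SAMPLE_TODAY)
```

Because `Date#<<` clamps to the last valid day of the target month, a cutoff computed on the 31st may land on the 28th, 29th, or 30th; subscriptions whose last charge falls on or after the cutoff are kept.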

data/lib/find_subscriptions/payee_normalizer.rb ADDED

@@ -0,0 +1,81 @@
+# frozen_string_literal: true
+require 'yaml'
+module FindSubscriptions
+  # Normalizes and displays payee names using YAML-defined rules (regex patterns and canonical names).
+  class PayeeNormalizer
+    attr_reader :rules
+    Rule = Struct.new(:name, :normalized, :regexes, keyword_init: true)
+    def initialize(rules: [])
+      @rules = rules
+    end
+    def normalize(raw_payee)
+      text = raw_payee.to_s
+      rule = @rules.find { |r| r.regexes.any? { |re| re.match?(text) } }
+      return rule.normalized if rule
+      # fallback: generic normalization (same idea you used before)
+      fallback_normalize(text)
+    end
+    def display_name(raw_payee)
+      text = raw_payee.to_s
+      rule = @rules.find { |r| r.regexes.any? { |re| re.match?(text) } }
+      rule&.name
+    end
+    def known_payee_key?(normalized_key)
+      @rules.any? { |r| r.normalized == normalized_key }
+    end
+    def self.from_yaml(path)
+      return new(rules: []) unless path && File.exist?(path)
+      data = YAML.load_file(path)
+      raise ArgumentError, 'known payees YAML must be an array of rules' unless data.is_a?(Array)
+      new(rules: data.map { |h| build_rule(h) })
+    end
+    def self.build_rule(rule_hash)
+      unless rule_hash.is_a?(Hash) && rule_hash['normalized'] && rule_hash['patterns']
+        raise ArgumentError, "Each rule needs 'normalized' and 'patterns'"
+      end
+      Rule.new(
+        name: rule_hash['name'],
+        normalized: rule_hash['normalized'].to_s,
+        regexes: Array(rule_hash['patterns']).map { |p| parse_regex(p) }
+      )
+    end
+    def self.parse_regex(str)
+      s = str.to_s.strip
+      unless s.start_with?('/') && s.count('/') >= 2
+        raise ArgumentError, "Invalid regex string: #{str.inspect} (expected like \"/foo/i\")"
+      end
+      last_slash = s.rindex('/')
+      Regexp.new(s[1...last_slash], regex_flags(s[(last_slash + 1)..]))
+    end
+    def self.regex_flags(flags)
+      opts = 0
+      opts |= Regexp::IGNORECASE if flags&.include?('i')
+      opts |= Regexp::MULTILINE if flags&.include?('m')
+      opts |= Regexp::EXTENDED if flags&.include?('x')
+      opts
+    end
+    def fallback_normalize(payee)
+      payee.downcase
+           .gsub(/[^a-z0-9\s+]/, ' ') # keep + since you use P+
+           .gsub(/\s+/, ' ')
+           .strip
+    end
+  end
+end
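
When no rule matches, grouping relies on the fallback key. The sketch below applies the same three substitutions as `fallback_normalize` to a few made-up descriptions; the function name `fallback_key` and the sample strings are assumptions.

```ruby
# Sketch of the fallback key: lowercase, replace anything other than letters, digits,
# whitespace, and "+" with a space, then collapse runs of whitespace.
def fallback_key(payee)
  payee.to_s.downcase
       .gsub(/[^a-z0-9\s+]/, ' ')
       .gsub(/\s+/, ' ')
       .strip
end

puts fallback_key('SPOTIFY USA*P+12345')
puts fallback_key('  Netflix.com   #4471 ')
puts fallback_key('ACME GYM 0001') == fallback_key('ACME GYM 0002')
```

Digits are preserved, so descriptions that embed a changing reference number produce different keys and would not be grouped together unless a YAML rule maps them to one `normalized` value.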

data/lib/find_subscriptions/schema_registry.rb ADDED

@@ -0,0 +1,72 @@
+# frozen_string_literal: true
+require 'csv'
+require 'bigdecimal'
+require 'date'
+module FindSubscriptions
+  # Registry of CSV schemas; supports registration, lookup by name, and auto-detection from headers.
+  class SchemaRegistry
+    attr_reader :schemas
+    def initialize
+      @schemas = {}
+    end
+    def register(name, schema)
+      @schemas[name] = schema
+    end
+    def fetch(name)
+      @schemas.fetch(name) { raise ArgumentError, "Unknown schema: #{name}" }
+    end
+    # Tries each schema until one matches the CSV headers.
+    def detect_for(csv_headers)
+      @schemas.each_value do |schema|
+        return schema if schema.matches_headers?(csv_headers)
+      end
+      nil
+    end
+  end
+  # Defines how to parse a CSV: required headers, amount column, debit/credit direction, and row-to-Transaction mapping.
+  class CsvSchema
+    attr_reader :required_headers, :amount_key, :direction, :mapping
+    # direction: lambda(row_hash, amount_bd) => :debit or :credit
+    # mapping: lambda(row_hash, signed_amount_bd) => Transaction
+    def initialize(required_headers:, amount_key:, direction:, mapping:)
+      @required_headers = required_headers.map(&:strip).to_set
+      @amount_key = amount_key
+      @direction = direction
+      @mapping = mapping
+    end
+    def matches_headers?(headers)
+      headers_set = headers.map(&:strip).to_set
+      @required_headers.subset?(headers_set)
+    end
+    def map_row(row_hash)
+      raw_amount = row_hash.fetch(@amount_key).to_f
+      dir = @direction.call(row_hash, raw_amount)
+      signed =
+        case dir
+        when :debit  then raw_amount.abs     # outgoing positive
+        when :credit then -raw_amount.abs    # incoming/refund negative
+        else
+          raise ArgumentError, 'direction must be :debit or :credit'
+        end
+      @mapping.call(row_hash, signed)
+    end
+    def ==(other)
+      other.is_a?(CsvSchema) &&
+        @required_headers == other.required_headers &&
+        @amount_key == other.amount_key
+    end
+  end
+end
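
Auto-detection returns the first registered schema whose required headers are all present, so registration order matters. The sketch below is a simplified stand-in for the registry: the `REQUIRED_HEADERS` hash and `detect_schema` are hypothetical, with header lists copied from the schema files in this diff and ordered as in `build_registry`.

```ruby
require 'set'

# Hypothetical, simplified registry: schema name => required headers, in registration order.
REQUIRED_HEADERS = {
  'american_express' => ['Date', 'Description', 'Amount', 'Card Member'],
  'navy_federal' => ['Transaction Date', 'Description', 'Amount', 'Credit Debit Indicator'],
  'generic' => %w[Date Description Amount]
}.freeze

# Returns the name of the first schema whose required headers are all present, or nil.
def detect_schema(csv_headers)
  present = csv_headers.map(&:strip).to_set
  REQUIRED_HEADERS.each do |name, required|
    return name if required.to_set.subset?(present)
  end
  nil
end

puts detect_schema(['Date', 'Description', 'Card Member', 'Account #', 'Amount']).inspect
puts detect_schema(%w[Date Description Amount]).inspect
puts detect_schema(%w[Posted Memo]).inspect
```

A file with the American Express headers also satisfies the `generic` requirements, which is why the more specific schema has to be checked first.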

data/lib/find_subscriptions/transaction.rb ADDED

@@ -0,0 +1,15 @@
+# frozen_string_literal: true
+require 'date'
+# Top-level namespace for subscription detection from bank/credit CSV exports.
+module FindSubscriptions
+  # A single parsed transaction: date, payee, signed amount (positive = outgoing), and raw row.
+  Transaction = Struct.new(
+    :date,        # Date
+    :payee,       # String
+    :amount,      # BigDecimal (positive = outgoing)
+    :raw,         # Hash original row
+    keyword_init: true
+  )
+end

data/lib/find_subscriptions.rb ADDED

@@ -0,0 +1,4 @@
+# frozen_string_literal: true
+# Entry point for FindSubscriptions: subscription detection from bank/credit CSV exports.
+require_relative 'find_subscriptions/transaction'

data/lib/output/csv_reporter.rb ADDED

@@ -0,0 +1,26 @@
+# frozen_string_literal: true
+require 'csv'
+module FindSubscriptions
+  module Output
+    # Outputs subscriptions as CSV with a header row.
+    class CsvReporter
+      HEADERS = %w[name amount since until count].freeze
+      def initialize(io: $stdout)
+        @io = io
+      end
+      def print(subscriptions)
+        output = CSV.generate do |csv|
+          csv << HEADERS
+          subscriptions.each do |sub|
+            csv << [sub[:name], sub[:amount], sub[:since].iso8601, sub[:until].iso8601, sub[:count]]
+          end
+        end
+        @io.print output
+      end
+    end
+  end
+end

data/lib/output/json_reporter.rb ADDED

@@ -0,0 +1,27 @@
+# frozen_string_literal: true
+require 'json'
+module FindSubscriptions
+  module Output
+    # Outputs subscriptions as a pretty-printed JSON array.
+    class JsonReporter
+      def initialize(io: $stdout)
+        @io = io
+      end
+      def print(subscriptions)
+        data = subscriptions.map do |sub|
+          {
+            name: sub[:name],
+            amount: sub[:amount],
+            since: sub[:since].iso8601,
+            until: sub[:until].iso8601,
+            count: sub[:count]
+          }
+        end
+        @io.puts JSON.pretty_generate(data)
+      end
+    end
+  end
+end

data/lib/output/stdout_reporter.rb ADDED

@@ -0,0 +1,37 @@
+# frozen_string_literal: true
+module FindSubscriptions
+  # Output formatters (stdout, future: JSON, etc.).
+  module Output
+    # Prints subscription candidates to an IO (default stdout) in a human-readable format.
+    class StdoutReporter
+      def initialize(io: $stdout)
+        @io = io
+      end
+      # subscriptions: array of hashes like:
+      # { name: "Netflix", amount: "14.99", since: Date }
+      def print(subscriptions)
+        @io.puts 'Subscriptions:'
+        if subscriptions.empty?
+          @io.puts '  - (none found)'
+          return
+        end
+        subscriptions.each { |s| @io.puts format_subscription(s) }
+      end
+      private
+      def format_subscription(sub)
+        tx_label = sub[:count] == 1 ? 'transaction' : 'transactions'
+        "  - #{format_name(sub[:name])}: $#{sub[:amount]} since #{sub[:since].strftime('%B %Y')}" \
+          " (#{sub[:count]} #{tx_label}) until #{sub[:until].strftime('%B %Y')}"
+      end
+      def format_name(name)
+        name.ljust(70)
+      end
+    end
+  end
+end

data/lib/schemas/american_express.rb ADDED

@@ -0,0 +1,31 @@
+# frozen_string_literal: true
+require 'bigdecimal'
+require 'date'
+module FindSubscriptions
+  # CSV schema definitions for supported banks and card issuers.
+  module Schemas
+    AMEX_DIRECTION = lambda { |_row, amount|
+      amount.negative? ? :credit : :debit
+    }.freeze
+    AMEX_MAPPING = lambda { |row, signed_amount|
+      Transaction.new(
+        date: Date.strptime(row['Date'], '%m/%d/%Y'),
+        payee: row.fetch('Description').to_s.strip,
+        amount: signed_amount,
+        raw: row
+      )
+    }.freeze
+    def self.american_express
+      CsvSchema.new(
+        required_headers: ['Date', 'Description', 'Amount', 'Card Member'],
+        amount_key: 'Amount',
+        direction: AMEX_DIRECTION,
+        mapping: AMEX_MAPPING
+      )
+    end
+  end
+end

data/lib/schemas/generic.rb ADDED

@@ -0,0 +1,35 @@
+# frozen_string_literal: true
+require 'set'
+require 'bigdecimal'
+require 'date'
+require_relative '../../lib/find_subscriptions/transaction'
+module FindSubscriptions
+  # CSV schema definitions for supported banks and card issuers.
+  module Schemas
+    DATE_FORMAT = '%Y-%m-%d'
+    def self.generic
+      CsvSchema.new(
+        required_headers: %w[Date Description Amount],
+        amount_key: 'Amount',
+        direction: lambda { |_row, amount|
+          amount.negative? ? :debit : :credit
+        },
+        mapping: lambda do |row, signed_amount|
+          map_row(row, signed_amount)
+        end
+      )
+    end
+    def self.map_row(row, signed_amount)
+      Transaction.new(
+        date: Date.strptime(row.fetch('Date'), DATE_FORMAT),
+        payee: row.fetch('Description').to_s.strip,
+        amount: signed_amount,
+        raw: row
+      )
+    end
+  end
+end
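
The Amex and generic schemas above use opposite sign conventions for the `direction` lambda: Amex treats a positive amount as a debit, while the generic schema treats a negative amount as a debit. A minimal sketch of the difference follows, with both lambdas re-declared so it runs standalone; the amount is an invented sample value.

```ruby
require 'bigdecimal'

# Re-declared from the diff above so the sketch is self-contained.
amex_direction    = ->(_row, amount) { amount.negative? ? :credit : :debit }
generic_direction = ->(_row, amount) { amount.negative? ? :debit : :credit }

charge = BigDecimal('9.99') # hypothetical amount column value

amex_positive    = amex_direction.call({}, charge)
amex_negative    = amex_direction.call({}, -charge)
generic_positive = generic_direction.call({}, charge)
generic_negative = generic_direction.call({}, -charge)

puts "amex    +9.99 => #{amex_positive}, -9.99 => #{amex_negative}"
puts "generic +9.99 => #{generic_positive}, -9.99 => #{generic_negative}"
```

The same signed number is therefore classified differently depending on which schema handles the file, so choosing the wrong schema would invert charges and refunds.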

data/lib/schemas/navy_federal.rb ADDED

@@ -0,0 +1,36 @@
+# frozen_string_literal: true
+require 'bigdecimal'
+require 'date'
+module FindSubscriptions
+  # CSV schema definitions for supported banks and card issuers.
+  module Schemas
+    NAVY_FEDERAL_DIRECTION = lambda { |row, _amount|
+      type = row.fetch('Credit Debit Indicator').to_s.strip.downcase
+      case type
+      when 'debit' then :debit
+      when 'credit' then :credit
+      else raise ArgumentError, "Unknown Credit Debit Indicator: #{row['Credit Debit Indicator']}"
+      end
+    }.freeze
+    NAVY_FEDERAL_MAPPING = lambda { |row, signed_amount|
+      Transaction.new(
+        date: Date.strptime(row.fetch('Transaction Date'), '%m/%d/%Y'),
+        payee: row.fetch('Description').to_s.strip,
+        amount: signed_amount,
+        raw: row
+      )
+    }.freeze
+    def self.navy_federal
+      CsvSchema.new(
+        required_headers: ['Transaction Date', 'Description', 'Amount', 'Credit Debit Indicator'],
+        amount_key: 'Amount',
+        direction: NAVY_FEDERAL_DIRECTION,
+        mapping: NAVY_FEDERAL_MAPPING
+      )
+    end
+  end
+end
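
Unlike the sign-based schemas, the Navy Federal `direction` lambda above ignores the amount and reads the `Credit Debit Indicator` column. The sketch below exercises a standalone copy of it with plain hashes standing in for CSV rows (an assumption; the gem may pass a different row type), and the sample values are invented.

```ruby
# Standalone copy of the lambda from the diff above.
navy_federal_direction = lambda { |row, _amount|
  type = row.fetch('Credit Debit Indicator').to_s.strip.downcase
  case type
  when 'debit' then :debit
  when 'credit' then :credit
  else raise ArgumentError, "Unknown Credit Debit Indicator: #{row['Credit Debit Indicator']}"
  end
}

# Whitespace and letter case are normalized before matching.
debit  = navy_federal_direction.call({ 'Credit Debit Indicator' => ' Debit ' }, nil)
credit = navy_federal_direction.call({ 'Credit Debit Indicator' => 'CREDIT' }, nil)

# Any other value raises ArgumentError with the raw value in the message.
unknown_message = begin
  navy_federal_direction.call({ 'Credit Debit Indicator' => 'Pending' }, nil)
  nil
rescue ArgumentError => e
  e.message
end

# A row without the column raises KeyError from Hash#fetch.
missing_column = begin
  navy_federal_direction.call({}, nil)
  false
rescue KeyError
  true
end

puts debit, credit, unknown_message, missing_column
```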

metadata ADDED

@@ -0,0 +1,118 @@
+--- !ruby/object:Gem::Specification
+name: find-subscriptions
+version: !ruby/object:Gem::Version
+  version: 0.1.0
+platform: ruby
+authors:
+- Jeffrey Baird
+autorequire:
+bindir: bin
+cert_chain: []
+date: 2026-03-05 00:00:00.000000000 Z
+dependencies:
+- !ruby/object:Gem::Dependency
+  name: bigdecimal
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '3.0'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '3.0'
+- !ruby/object:Gem::Dependency
+  name: rake
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '13.0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '13.0'
+- !ruby/object:Gem::Dependency
+  name: rspec
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '3.12'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '3.12'
+- !ruby/object:Gem::Dependency
+  name: rubocop
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '1.50'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '1.50'
+description: |
+  A CLI tool that analyzes CSV files from banks and credit cards to detect
+  subscription charges based on known payees and recurring transaction patterns.
+email:
+- jeffreybaird@hey.com
+executables:
+- find-subscriptions
+extensions: []
+extra_rdoc_files: []
+files:
+- LICENSE
+- README.md
+- bin/find-subscriptions
+- lib/detectors/known_payees.rb
+- lib/detectors/repeat_charges.rb
+- lib/find_subscriptions.rb
+- lib/find_subscriptions/cli.rb
+- lib/find_subscriptions/payee_normalizer.rb
+- lib/find_subscriptions/schema_registry.rb
+- lib/find_subscriptions/transaction.rb
+- lib/output/csv_reporter.rb
+- lib/output/json_reporter.rb
+- lib/output/stdout_reporter.rb
+- lib/schemas/american_express.rb
+- lib/schemas/generic.rb
+- lib/schemas/navy_federal.rb
+homepage: https://github.com/jeffreybaird/find-subscriptions
+licenses:
+- PolyForm-Noncommercial-1.0.0
+metadata: {}
+post_install_message:
+rdoc_options: []
+require_paths:
+- lib
+required_ruby_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: '2.7'
+required_rubygems_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: '0'
+requirements: []
+rubygems_version: 3.4.1
+signing_key:
+specification_version: 4
+summary: Find recurring subscription charges in bank and credit card CSV exports
+test_files: []
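
The `~>` and `>=` requirements in the metadata above can be checked with RubyGems' own `Gem::Requirement` and `Gem::Version` classes. The sketch below is illustrative only; the candidate version numbers are arbitrary examples and not versions the gem was tested against.

```ruby
require 'rubygems'

# Requirement strings copied from the gemspec metadata above.
bigdecimal_req = Gem::Requirement.new('~> 3.0')  # >= 3.0 and < 4.0
rspec_req      = Gem::Requirement.new('~> 3.12') # >= 3.12 and < 4.0
ruby_req       = Gem::Requirement.new('>= 2.7')

# Small helper (hypothetical, for this sketch only).
check = ->(req, version) { req.satisfied_by?(Gem::Version.new(version)) }

results = {
  'bigdecimal 3.1.8' => check.call(bigdecimal_req, '3.1.8'),
  'bigdecimal 4.0.0' => check.call(bigdecimal_req, '4.0.0'),
  'rspec 3.13.0'     => check.call(rspec_req, '3.13.0'),
  'rspec 3.11.0'     => check.call(rspec_req, '3.11.0'),
  'ruby 2.6.10'      => check.call(ruby_req, '2.6.10'),
  'ruby 3.3.0'       => check.call(ruby_req, '3.3.0')
}

results.each { |label, ok| puts "#{label}: #{ok ? 'satisfies' : 'rejected'}" }
```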