find-subscriptions 0.1.0

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+ metadata.gz: '069716e44081191145ee5737091b013be74948f27e263e959c4fdb27b26da198'
4
+ data.tar.gz: b9a5581e027c13554712011b9f967aa504ec9c5ef5609fb172ca8bdc2abc97bf
5
+ SHA512:
6
+ metadata.gz: 791c068bc914f3d7a5ae7ba06bf1f659f921aa9cc0a367ac2002f35ee52fe7182e47a7c84cd71ffa3f9fa0e7c0d9ed93f0ba995636becb1ad39a05484f81e7e6
7
+ data.tar.gz: c66594b54e5241d4660d0bd3d433581bfaf7fda6d16700b397caf39b193e73d74bf0db61bb7f16991a51fa47dfed4d49c7b56f5c0cf238087b2d58ea73634efd
data/LICENSE ADDED
@@ -0,0 +1,124 @@
1
+ PolyForm Noncommercial License 1.0.0
2
+
3
+ <https://polyformproject.org/licenses/noncommercial/1.0.0>
4
+
5
+ Acceptance
6
+
7
+ In order to get any license under these terms, you must agree
8
+ to them as both strict obligations and conditions to all your
9
+ licenses.
10
+
11
+ Copyright License
12
+
13
+ The licensor grants you a copyright license for the software to
14
+ do everything you might do with the software that would otherwise
15
+ infringe the licensor's copyright in it for any permitted
16
+ purpose. However, you may only distribute the software according
17
+ to Distribution License and make changes or new works based on
18
+ the software according to Changes and New Works License.
19
+
20
+ Distribution License
21
+
22
+ The licensor grants you an additional copyright license to
23
+ distribute copies of the software. Your license to distribute
24
+ covers distributing the software with changes and new works
25
+ permitted by Changes and New Works License.
26
+
27
+ How to Distribute
28
+
29
+ You must ensure that anyone who gets a copy of any part of the
30
+ software from you also gets a copy of these terms or the URL for
31
+ them above, as well as copies of any plain-text lines beginning
32
+ with Required Notice: that the licensor provided with the
33
+ software. For example:
34
+
35
+ Required Notice: Copyright Jeffrey Baird (https://github.com/jeffreybaird/find-subscriptions)
36
+
37
+ Changes and New Works License
38
+
39
+ The licensor grants you an additional copyright license to make
40
+ changes and new works based on the software for any permitted
41
+ purpose.
42
+
43
+ Patent License
44
+
45
+ The licensor grants you a patent license for the software that
46
+ covers patent claims the licensor can license, or will be able to
47
+ license, that you would infringe by using the software.
48
+
49
+ Noncommercial Purposes
50
+
51
+ Any noncommercial purpose is a permitted purpose.
52
+
53
+ Personal Uses
54
+
55
+ Personal use for research, experiment, and testing for the
56
+ benefit of public knowledge, personal study, private
57
+ entertainment, hobby projects, amateur pursuits, or religious
58
+ observance, without any anticipated commercial application, is
59
+ use for a permitted purpose.
60
+
61
+ Noncommercial Organizations
62
+
63
+ Use by any charitable organization, educational institution,
64
+ public research organization, public safety or health
65
+ organization, environmental protection organization, or
66
+ government institution is use for a permitted purpose regardless
67
+ of the source of funding or obligations resulting from the
68
+ funding.
69
+
70
+ Fair Use
71
+
72
+ You may have "fair use" rights for the software under the law.
73
+ These terms do not limit them.
74
+
75
+ No Other Rights
76
+
77
+ These terms do not allow you to sublicense or transfer any of
78
+ your licenses to anyone else, or prevent the licensor from
79
+ granting licenses to anyone else. These terms do not imply any
80
+ other licenses.
81
+
82
+ Patent Defense
83
+
84
+ If you make any written claim that the software infringes or
85
+ contributes to infringement of any patent, your patent license
86
+ for the software granted under these terms ends immediately. If
87
+ your employer makes such a claim, your patent license ends
88
+ immediately for work on behalf of your employer.
89
+
90
+ Violations
91
+
92
+ The first time you are notified in writing that you have violated
93
+ any of these terms, or done anything with the software not
94
+ covered by your licenses, you have 30 days to come into
95
+ compliance. If you do not do so, your licenses end immediately.
96
+
97
+ No Liability
98
+
99
+ As far as the law allows, the software comes as is, without any
100
+ warranty or condition, and the licensor will not be liable to you
101
+ for any damages arising out of these terms or the use or nature
102
+ of the software, under any kind of legal claim.
103
+
104
+ Definitions
105
+
106
+ The licensor is the individual or entity offering these terms,
107
+ and the software is the software the licensor makes available
108
+ under these terms.
109
+
110
+ You refers to the individual or entity agreeing to these terms.
111
+
112
+ Your company is any legal entity, sole proprietorship, or other
113
+ kind of organization that you work for, plus all organizations
114
+ that have control over, are under the control of, or are under
115
+ common control with that organization. Control means ownership
116
+ of substantially all the assets of an entity, or the power to
117
+ direct its management and policies by vote, contract, or
118
+ otherwise. Control can be direct or indirect.
119
+
120
+ Your licenses are all the licenses granted to you for the
121
+ software under these terms.
122
+
123
+ Use means anything you do with the software requiring one of
124
+ your licenses.
data/README.md ADDED
@@ -0,0 +1,128 @@
1
+ # find-subscriptions
2
+
3
+ Scans bank and credit card CSV exports to surface recurring charges — subscriptions, memberships, and other repeat payments you may have forgotten about.
4
+
5
+ ## Requirements
6
+
7
+ - Ruby 3.x
8
+
9
+ ## Usage
10
+
11
+ ```
12
+ ./bin/find-subscriptions --files EXPORT.csv [options]
13
+ ```
14
+
15
+ ### Options
16
+
17
+ | Flag | Description |
18
+ |------|-------------|
19
+ | `--files FILES` | Comma-separated list of CSV files to scan |
20
+ | `--schema NAME` | Force a schema instead of auto-detecting from headers |
21
+ | `--known-payees PATH` | Path to a known-payees YAML file; matching payees are **filtered out** of results |
22
+ | `--sort ORDER` | Sort order for results (see below) |
23
+ | `--inactive-for DURATION` | Hide subscriptions with no recent transactions (see below) |
24
+ | `--min-amount AMOUNT` | Hide subscriptions with a recurring charge below AMOUNT (e.g. `5.00`) |
25
+ | `--from DATE` | Only include transactions on or after DATE (`YYYY-MM-DD`) |
26
+ | `--to DATE` | Only include transactions on or before DATE (`YYYY-MM-DD`) |
27
+ | `--format FORMAT` | Output format: `text` (default), `json`, `csv` |
28
+
29
+ ### Output formats
30
+
31
+ | Value | Output |
32
+ |-------|--------|
33
+ | `text` | Human-readable table *(default)* |
34
+ | `json` | Pretty-printed JSON array — pipe into `jq` or save for further processing |
35
+ | `csv` | CSV with header row — open in a spreadsheet or feed into other scripts |
36
+
37
+ ```
38
+ ./bin/find-subscriptions --files export.csv --format json
39
+ ./bin/find-subscriptions --files export.csv --format csv > subscriptions.csv
40
+ ```
41
+
42
+ ### Sort orders
43
+
44
+ | Value | Meaning |
45
+ |-------|---------|
46
+ | `first_desc` | First charge date, newest first *(default)* |
47
+ | `first_asc` | First charge date, oldest first |
48
+ | `last_desc` | Most-recent charge, newest first |
49
+ | `last_asc` | Most-recent charge, oldest first |
50
+ | `count_desc` | Number of transactions, highest first |
51
+ | `count_asc` | Number of transactions, lowest first |
52
+
53
+ ### `--inactive-for` duration format
54
+
55
+ A number followed by `year`, `month`, or `week` (plurals accepted):
56
+
57
+ ```
58
+ --inactive-for 6months
59
+ --inactive-for 1year
60
+ --inactive-for 3weeks
61
+ ```
62
+
63
+ Subscriptions whose last transaction is older than the duration are hidden. Useful for trimming results to only currently-active charges.
64
+
65
+ ## Examples
66
+
67
+ Scan a single Amex export, auto-detecting the schema:
68
+
69
+ ```
70
+ ./bin/find-subscriptions --files Amex-2025.csv
71
+ ```
72
+
73
+ Scan multiple files and force a schema:
74
+
75
+ ```
76
+ ./bin/find-subscriptions --files jan.csv,feb.csv --schema american_express
77
+ ```
78
+
79
+ Filter out known/expected subscriptions and show only recent ones:
80
+
81
+ ```
82
+ ./bin/find-subscriptions --files Amex-2025.csv \
83
+ --known-payees data/known_payees.yml \
84
+ --inactive-for 6months \
85
+ --sort last_desc
86
+ ```
87
+
88
+ ## Supported schemas
89
+
90
+ | Name | Bank / Issuer | Required CSV headers |
91
+ |------|---------------|----------------------|
92
+ | `american_express` | American Express | `Date`, `Description`, `Amount`, `Card Member` |
93
+ | `navy_federal` | Navy Federal Credit Union | `Transaction Date`, `Description`, `Amount`, `Credit Debit Indicator` |
94
+ | `generic` | Generic (YYYY-MM-DD dates) | `Date`, `Description`, `Amount` |
95
+
96
+ The schema is auto-detected from the CSV headers. Pass `--schema NAME` to override.
97
+
98
+ ## Known-payees file
99
+
100
+ The `--known-payees` flag points to a YAML file that maps canonical names to regex patterns. Any subscription whose payee matches a pattern is **removed from output** — useful for filtering charges you already know about.
101
+
102
+ ```yaml
103
+ - name: "Netflix"
104
+ normalized: "netflix"
105
+ patterns:
106
+ - '/\bnetflix\b/i'
107
+ - '/\bnflx\b/i'
108
+
109
+ - name: "Amazon Web Services"
110
+ normalized: "amazon web services"
111
+ patterns:
112
+ - '/aws\.amazon\.com/i'
113
+ ```
114
+
115
+ Each entry has the following keys (`normalized` and `patterns` are required; `name` is optional):
116
+ - `name` — human-readable label shown in output when the payee matches
117
+ - `normalized` — internal deduplication key (lowercase, used for grouping)
118
+ - `patterns` — list of Ruby regex literals in `/pattern/flags` format
119
+
120
+ `data/known_payees.yml` is the default file and is always loaded for payee normalization (display names). Filtering only applies when `--known-payees` is explicitly passed.
121
+
122
+ ## Output format
123
+
124
+ ```
125
+ Subscriptions:
126
+ - SPOTIFY : $9.99 since January 2025 (14 transactions) until February 2026
127
+ - NETFLIX.COM : $15.49 since March 2024 (12 transactions) until February 2026
128
+ ```
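To make the known-payees format described in the README concrete, the following is a minimal sketch of loading such a file and matching a raw payee against it. It assumes each pattern string is split on its last `/` into a body and flags, which resembles but does not reproduce the gem's own parser. `SAMPLE`, `RULES`, `to_regexp`, and `known_name` are hypothetical names used only for illustration, and only the `i` flag is handled here.

```ruby
require 'yaml'

# Sample known-payees content (single-quoted heredoc keeps the backslashes literal).
SAMPLE = <<~'YAML'
  - name: "Netflix"
    normalized: "netflix"
    patterns:
      - '/\bnetflix\b/i'
      - '/\bnflx\b/i'
YAML

# Hypothetical helper: turn a "/body/flags" string into a Regexp.
def to_regexp(literal)
  last = literal.rindex('/')
  flags = literal[(last + 1)..]
  opts = flags.include?('i') ? Regexp::IGNORECASE : 0
  Regexp.new(literal[1...last], opts)
end

RULES = YAML.safe_load(SAMPLE).map do |entry|
  { name: entry['name'], regexes: entry['patterns'].map { |p| to_regexp(p) } }
end

# Returns the display name of the first rule whose patterns match, or nil.
def known_name(rules, payee)
  rule = rules.find { |r| r[:regexes].any? { |re| re.match?(payee) } }
  rule && rule[:name]
end

p known_name(RULES, 'NFLX DIGITAL 866-579-7172') # => "Netflix"
p known_name(RULES, 'SPOTIFY USA')               # => nil
```

A payee that matches no rule returns `nil`, which is the case where a caller would fall back to the raw description.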
@@ -0,0 +1,8 @@
1
+ #!/usr/bin/env ruby
2
+ # frozen_string_literal: true
3
+
4
+ $LOAD_PATH.unshift(File.expand_path('../lib', __dir__))
5
+
6
+ require 'find_subscriptions/cli'
7
+
8
+ FindSubscriptions::CLI.run(ARGV)
@@ -0,0 +1,41 @@
1
+ # frozen_string_literal: true
2
+
3
+ require 'yaml'
4
+
5
+ module FindSubscriptions
6
+ # Subscription detection strategies (repeat charges, known payees, etc.).
7
+ module Detectors
8
+ # Matches transactions to known subscription payees via configurable name matchers.
9
+ class KnownPayees
10
+ def initialize(known_payees:)
11
+ @known_payees = known_payees # hash: canonical_name => [matchers]
12
+ end
13
+
14
+ def detect(transactions)
15
+ # returns hash: canonical_name => earliest_transaction_date
16
+ found = {}
17
+
18
+ transactions.each do |tx|
19
+ canonical = match_payee(tx.payee)
20
+ next unless canonical
21
+
22
+ found[canonical] = tx.date if !found.key?(canonical) || tx.date < found[canonical]
23
+ end
24
+
25
+ found
26
+ end
27
+
28
+ private
29
+
30
+ def match_payee(payee)
31
+ normalized = payee.to_s.downcase
32
+ @known_payees.each do |canonical, matchers|
33
+ matchers.each do |m|
34
+ return canonical if normalized.include?(m.downcase)
35
+ end
36
+ end
37
+ nil
38
+ end
39
+ end
40
+ end
41
+ end
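The detector above uses case-insensitive substring matching and keeps the earliest date seen per canonical name. A minimal standalone sketch of that behavior follows, assuming a simplified transaction struct; `Tx` and `first_seen` are hypothetical names that do not appear in the package.

```ruby
require 'date'

# Hypothetical stand-in for a parsed transaction.
Tx = Struct.new(:payee, :date, keyword_init: true)

# known_payees: hash of canonical_name => [substring matchers].
# Returns a hash of canonical_name => earliest matching transaction date.
def first_seen(transactions, known_payees)
  transactions.each_with_object({}) do |tx, found|
    payee = tx.payee.to_s.downcase
    name, = known_payees.find { |_name, matchers| matchers.any? { |m| payee.include?(m.downcase) } }
    next unless name

    found[name] = tx.date if !found.key?(name) || tx.date < found[name]
  end
end

txs = [
  Tx.new(payee: 'NETFLIX.COM', date: Date.new(2025, 3, 2)),
  Tx.new(payee: 'Netflix.com CA', date: Date.new(2025, 1, 2)),
  Tx.new(payee: 'CORNER CAFE', date: Date.new(2025, 2, 10))
]

p first_seen(txs, { 'Netflix' => %w[netflix nflx] })
```

Because matching is by substring, a short matcher can match unrelated payees that happen to contain it, so longer matchers are the safer choice.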
@@ -0,0 +1,69 @@
1
+ # frozen_string_literal: true
2
+
3
+ module FindSubscriptions
4
+ module Detectors
5
+ # Detects recurring charges by grouping outgoing transactions by normalized payee and amount.
6
+ class RepeatCharges
7
+ # rubocop:disable Lint/StructNewOverride
8
+ Candidate = Struct.new(
9
+ :payee_key, # normalized payee key
10
+ :display_payee, # best-effort human label (raw payee)
11
+ :amount, # BigDecimal
12
+ :since, # Date
13
+ :until, # Date
14
+ :count, # Integer
15
+ :dates, # [Date]
16
+ keyword_init: true
17
+ )
18
+ # rubocop:enable Lint/StructNewOverride
19
+
20
+ def initialize(payee_normalizer:, min_occurrences: 2, min_month_gap_days: 0, max_month_gap_days: 1000)
21
+ @payee_normalizer = payee_normalizer
22
+ @min_occurrences = min_occurrences
23
+ @min_gap = min_month_gap_days
24
+ @max_gap = max_month_gap_days
25
+ end
26
+
27
+ def detect(transactions)
28
+ outgoing = transactions.select { |t| t.amount&.positive? }
29
+ groups = outgoing.group_by { |t| [@payee_normalizer.normalize(t.payee), t.amount] }
30
+ groups.filter_map { |(payee_key, amount), txs| build_candidate(payee_key, amount, txs) }
31
+ end
32
+
33
+ private
34
+
35
+ def build_candidate(payee_key, amount, txs)
36
+ dates = txs.map(&:date).compact.sort
37
+ return unless dates.size >= @min_occurrences && recurring_monthlyish?(dates)
38
+
39
+ Candidate.new(payee_key: payee_key, display_payee: display_for(txs), amount: amount,
40
+ since: dates.first, until: dates.last, count: dates.size, dates: dates)
41
+ end
42
+
43
+ def display_for(txs)
44
+ @payee_normalizer.display_name(txs.first.payee) || best_display_payee(txs)
45
+ end
46
+
47
+ def normalize_payee(payee)
48
+ payee.to_s.downcase
49
+ .gsub(/[^a-z0-9\s]/, ' ')
50
+ .gsub(/\s+/, ' ')
51
+ .strip
52
+ end
53
+
54
+ def best_display_payee(txs)
55
+ counts = Hash.new(0)
56
+ txs.each { |t| counts[t.payee.to_s.strip] += 1 }
57
+ counts.max_by { |_k, v| v }&.first || txs.first.payee.to_s
58
+ end
59
+
60
+ def recurring_monthlyish?(dates)
61
+ return false if dates.size < @min_occurrences
62
+
63
+ gaps = dates.each_cons(2).map { |a, b| (b - a).to_i }
64
+ monthlyish = gaps.count { |d| d.between?(@min_gap, @max_gap) }
65
+ monthlyish >= [1, (gaps.size * 0.6).ceil].max
66
+ end
67
+ end
68
+ end
69
+ end
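The gap check in `recurring_monthlyish?` can be illustrated in isolation. The sketch below reimplements the same rule (at least 60% of consecutive gaps, and at least one, must fall within the allowed range) as a free function; `recurring?` is a hypothetical name, and the 25 to 35 day window is an assumed example of a stricter setting, not a value taken from the package. With the defaults shown in the source (0 to 1000 days), any two charges of the same amount to the same payee qualify.

```ruby
require 'date'

# True when at least 60% of consecutive gaps (and at least one) fall within min_gap..max_gap days.
def recurring?(dates, min_gap:, max_gap:)
  gaps = dates.sort.each_cons(2).map { |a, b| (b - a).to_i }
  return false if gaps.empty?

  in_range = gaps.count { |d| d.between?(min_gap, max_gap) }
  in_range >= [1, (gaps.size * 0.6).ceil].max
end

monthly   = [Date.new(2025, 1, 5), Date.new(2025, 2, 5), Date.new(2025, 3, 5)]
same_week = [Date.new(2025, 1, 5), Date.new(2025, 1, 7), Date.new(2025, 1, 9)]

p recurring?(monthly,   min_gap: 0,  max_gap: 1000) # => true
p recurring?(same_week, min_gap: 0,  max_gap: 1000) # => true (defaults accept any spacing)
p recurring?(monthly,   min_gap: 25, max_gap: 35)   # => true (gaps of 31 and 28 days)
p recurring?(same_week, min_gap: 25, max_gap: 35)   # => false (gaps of 2 days)
```

Passing a narrower window through `min_month_gap_days` and `max_month_gap_days` would be one way to exclude several same-priced purchases made within a few days of each other.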
@@ -0,0 +1,231 @@
1
+ # frozen_string_literal: true
2
+
3
+ require 'optparse'
4
+ require 'csv'
5
+ require 'set'
6
+ require 'yaml'
7
+ require 'bigdecimal'
8
+ require 'date'
9
+
10
+ require_relative 'transaction'
11
+ require_relative 'schema_registry'
12
+ require_relative '../schemas/generic'
13
+ require_relative '../schemas/american_express'
14
+ require_relative '../schemas/navy_federal'
15
+ require_relative '../detectors/known_payees'
16
+ require_relative '../detectors/repeat_charges'
17
+ require_relative '../output/stdout_reporter'
18
+ require_relative '../output/json_reporter'
19
+ require_relative '../output/csv_reporter'
20
+ require_relative '../find_subscriptions/payee_normalizer'
21
+
22
+ module FindSubscriptions
23
+ # Command-line interface: parses options, loads CSVs, runs detectors, and reports subscriptions.
24
+ class CLI
25
+ SORT_PROCS = {
26
+ 'count_asc' => ->(subs) { subs.sort_by { |sub| sub[:count] } },
27
+ 'count_desc' => ->(subs) { subs.sort_by { |sub| -sub[:count] } },
28
+ 'first_asc' => ->(subs) { subs.sort_by { |sub| sub[:since] } },
29
+ 'first_desc' => ->(subs) { subs.sort_by { |sub| sub[:since] }.reverse },
30
+ 'last_asc' => ->(subs) { subs.sort_by { |sub| sub[:until] } },
31
+ 'last_desc' => ->(subs) { subs.sort_by { |sub| sub[:until] }.reverse }
32
+ }.freeze
33
+
34
+ VALID_SORT_ORDERS = SORT_PROCS.keys.freeze
35
+
36
+ def self.run(argv)
37
+ new.run(argv)
38
+ end
39
+
40
+ def run(argv)
41
+ options = parse_options(argv)
42
+
43
+ files = options.fetch(:files)
44
+ raise ArgumentError, 'No files provided' if files.empty?
45
+
46
+ registry = build_registry
47
+ transactions = load_transactions(files, registry, options[:schema])
48
+ transactions = filter_by_date_range(transactions, options[:from_date], options[:to_date])
49
+ payee_normalizer = PayeeNormalizer.from_yaml(options[:known_payees_path])
50
+
51
+ subscriptions = detect_subscriptions(transactions, payee_normalizer, options)
52
+
53
+ reporter_for(options[:format]).print(subscriptions)
54
+ end
55
+
56
+ private
57
+
58
+ def detect_subscriptions(transactions, payee_normalizer, options)
59
+ detector = Detectors::RepeatCharges.new(payee_normalizer: payee_normalizer, min_occurrences: 2)
60
+ candidates = detector.detect(transactions)
61
+ candidates = filter_known(candidates, payee_normalizer) if options[:filter_known_payees]
62
+ candidates = filter_by_min_amount(candidates, options[:min_amount]) if options[:min_amount]
63
+ subscriptions = candidates.map { |candidate| candidate_to_hash(candidate) }
64
+ subscriptions = filter_inactive(subscriptions, options[:inactive_for]) if options[:inactive_for]
65
+ sort_subscriptions(subscriptions, options[:sort])
66
+ end
67
+
68
+ INACTIVE_FOR_PATTERN = /\A(\d+)\s*(year|month|week)s?\z/i.freeze
69
+
70
+ def filter_inactive(subscriptions, inactive_for, today: Date.today)
71
+ count, unit = parse_inactive_for(inactive_for)
72
+ cutoff = inactive_cutoff(count, unit.downcase, today)
73
+ subscriptions.select { |sub| sub[:until] >= cutoff }
74
+ end
75
+
76
+ def parse_inactive_for(value)
77
+ match = value.to_s.match(INACTIVE_FOR_PATTERN)
78
+ unless match
79
+ raise ArgumentError,
80
+ "Invalid --inactive-for value: #{value.inspect}. Expected format: NUMBER(year|month|week)[s]"
81
+ end
82
+
83
+ [match[1].to_i, match[2]]
84
+ end
85
+
86
+ def inactive_cutoff(count, unit, today)
87
+ case unit
88
+ when 'year' then today << (count * 12)
89
+ when 'month' then today << count
90
+ when 'week' then today - (count * 7)
91
+ end
92
+ end
93
+
94
+ def reporter_for(format)
95
+ case format
96
+ when 'json' then Output::JsonReporter.new
97
+ when 'csv' then Output::CsvReporter.new
98
+ else Output::StdoutReporter.new
99
+ end
100
+ end
101
+
102
+ def filter_by_date_range(transactions, from_date, to_date)
103
+ transactions.select do |txn|
104
+ (from_date.nil? || txn.date >= from_date) &&
105
+ (to_date.nil? || txn.date <= to_date)
106
+ end
107
+ end
108
+
109
+ def filter_by_min_amount(candidates, min_amount)
110
+ threshold = BigDecimal(min_amount.to_s)
111
+ candidates.select { |candidate| candidate.amount >= threshold }
112
+ end
113
+
114
+ def filter_known(candidates, payee_normalizer)
115
+ candidates.reject { |candidate| payee_normalizer.known_payee_key?(candidate.payee_key) }
116
+ end
117
+
118
+ def candidate_to_hash(candidate)
119
+ {
120
+ name: candidate.display_payee,
121
+ amount: format_money(candidate.amount),
122
+ since: candidate.since,
123
+ until: candidate.until,
124
+ count: candidate.count
125
+ }
126
+ end
127
+
128
+ def sort_subscriptions(subscriptions, sort_order)
129
+ sorter = SORT_PROCS[sort_order]
130
+ unless sorter
131
+ raise ArgumentError, "Invalid sort order: #{sort_order}. Valid options: #{VALID_SORT_ORDERS.join(', ')}"
132
+ end
133
+
134
+ sorter.call(subscriptions)
135
+ end
136
+
137
+ def format_money(decimal)
138
+ format('%.2f', decimal.to_f)
139
+ end
140
+
141
+ def parse_options(argv)
142
+ options = default_options
143
+ define_option_parser(options).parse!(argv)
144
+ options
145
+ end
146
+
147
+ DEFAULT_KNOWN_PAYEES_PATH = File.expand_path('../../data/known_payees.yml', __dir__).freeze
148
+
149
+ def default_options
150
+ {
151
+ files: [], schema: nil,
152
+ sort: 'first_desc', format: 'text',
153
+ min_amount: nil, from_date: nil, to_date: nil,
154
+ filter_known_payees: false,
155
+ known_payees_path: DEFAULT_KNOWN_PAYEES_PATH
156
+ }
157
+ end
158
+
159
+ def define_option_parser(options) # rubocop:disable Metrics/MethodLength
160
+ OptionParser.new do |opt|
161
+ opt.banner = 'Usage: find-subscriptions --files a.csv,b.csv [--schema NAME]'
162
+ opt.on('--files FILES', 'Comma-separated list of CSV files') do |val|
163
+ options[:files] = val.split(',').map(&:strip).reject(&:empty?)
164
+ end
165
+ opt.on('--schema NAME', 'Force schema name (otherwise auto-detect)') do |val|
166
+ options[:schema] = val.strip
167
+ end
168
+ opt.on('--known-payees PATH', 'Known payees YAML; matched payees are filtered from output') do |val|
169
+ options[:known_payees_path] = val
170
+ options[:filter_known_payees] = true
171
+ end
172
+ opt.on('--inactive-for DURATION',
173
+ 'Hide subscriptions with no transactions in DURATION (e.g. 6months, 1year, 3weeks)') do |val|
174
+ options[:inactive_for] = val.strip
175
+ end
176
+ opt.on('--min-amount AMOUNT', 'Hide subscriptions with a recurring charge below AMOUNT') do |val|
177
+ options[:min_amount] = val.strip
178
+ end
179
+ register_date_range_options(opt, options)
180
+ register_presentation_options(opt, options)
181
+ end
182
+ end
183
+
184
+ def register_date_range_options(opt, options)
185
+ opt.on('--from DATE', 'Only include transactions on or after DATE (YYYY-MM-DD)') do |val|
186
+ options[:from_date] = Date.parse(val)
187
+ end
188
+ opt.on('--to DATE', 'Only include transactions on or before DATE (YYYY-MM-DD)') do |val|
189
+ options[:to_date] = Date.parse(val)
190
+ end
191
+ end
192
+
193
+ def register_presentation_options(opt, options)
194
+ opt.on('--sort ORDER', "Sort order: #{VALID_SORT_ORDERS.join(', ')} (default: first_desc)") do |val|
195
+ options[:sort] = val.strip
196
+ end
197
+ opt.on('--format FORMAT', 'Output format: text (default), json, csv') do |val|
198
+ options[:format] = val.strip
199
+ end
200
+ end
201
+
202
+ def build_registry
203
+ registry = SchemaRegistry.new
204
+ registry.register('american_express', Schemas.american_express)
205
+ registry.register('navy_federal', Schemas.navy_federal)
206
+ registry.register('generic', Schemas.generic)
207
+ registry
208
+ end
209
+
210
+ def load_transactions(files, registry, forced_schema_name)
211
+ files.flat_map do |path|
212
+ raise ArgumentError, "File not found: #{path}" unless File.exist?(path)
213
+
214
+ csv = CSV.read(path, headers: true)
215
+ schema = fetch_schema(path, csv, registry, forced_schema_name)
216
+ csv.map { |row| schema.map_row(row.to_h) }
217
+ end
218
+ end
219
+
220
+ def fetch_schema(path, csv, registry, forced_schema_name)
221
+ if forced_schema_name
222
+ registry.fetch(forced_schema_name)
223
+ else
224
+ detected = registry.detect_for(csv.headers)
225
+ raise ArgumentError, "Could not detect schema for #{File.basename(path)}. Use --schema." unless detected
226
+
227
+ detected
228
+ end
229
+ end
230
+ end
231
+ end
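The `--inactive-for` cutoff relies on `Date#<<`, which shifts a date back by whole months and clamps to the last valid day of the target month. The sketch below shows the same parse-then-shift steps as a single function; `DURATION_PATTERN` and `cutoff_for` are hypothetical names, and the fixed `today` value is only there to make the results reproducible.

```ruby
require 'date'

DURATION_PATTERN = /\A(\d+)\s*(year|month|week)s?\z/i

# Returns the earliest "last charge" date that still counts as active.
def cutoff_for(duration, today)
  match = duration.to_s.match(DURATION_PATTERN)
  raise ArgumentError, "bad duration: #{duration.inspect}" unless match

  count = match[1].to_i
  case match[2].downcase
  when 'year'  then today << (count * 12)
  when 'month' then today << count
  when 'week'  then today - (count * 7)
  end
end

today = Date.new(2026, 3, 31)
puts cutoff_for('6months', today) # => 2025-09-30 (September has no 31st, so the day is clamped)
puts cutoff_for('1year', today)   # => 2025-03-31
puts cutoff_for('3 weeks', today) # => 2026-03-10
```

A subscription is kept when its last charge falls on or after the cutoff, so a charge dated exactly on the cutoff day still counts as active.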
@@ -0,0 +1,81 @@
1
+ # frozen_string_literal: true
2
+
3
+ require 'yaml'
4
+
5
+ module FindSubscriptions
6
+ # Normalizes and displays payee names using YAML-defined rules (regex patterns and canonical names).
7
+ class PayeeNormalizer
8
+ attr_reader :rules
9
+
10
+ Rule = Struct.new(:name, :normalized, :regexes, keyword_init: true)
11
+
12
+ def initialize(rules: [])
13
+ @rules = rules
14
+ end
15
+
16
+ def normalize(raw_payee)
17
+ text = raw_payee.to_s
18
+ rule = @rules.find { |r| r.regexes.any? { |re| re.match?(text) } }
19
+ return rule.normalized if rule
20
+
21
+ # Fallback: generic normalization for payees that match no rule.
22
+ fallback_normalize(text)
23
+ end
24
+
25
+ def display_name(raw_payee)
26
+ text = raw_payee.to_s
27
+ rule = @rules.find { |r| r.regexes.any? { |re| re.match?(text) } }
28
+ rule&.name
29
+ end
30
+
31
+ def known_payee_key?(normalized_key)
32
+ @rules.any? { |r| r.normalized == normalized_key }
33
+ end
34
+
35
+ def self.from_yaml(path)
36
+ return new(rules: []) unless path && File.exist?(path)
37
+
38
+ data = YAML.load_file(path)
39
+ raise ArgumentError, 'known payees YAML must be an array of rules' unless data.is_a?(Array)
40
+
41
+ new(rules: data.map { |h| build_rule(h) })
42
+ end
43
+
44
+ def self.build_rule(rule_hash)
45
+ unless rule_hash.is_a?(Hash) && rule_hash['normalized'] && rule_hash['patterns']
46
+ raise ArgumentError, "Each rule needs 'normalized' and 'patterns'"
47
+ end
48
+
49
+ Rule.new(
50
+ name: rule_hash['name'],
51
+ normalized: rule_hash['normalized'].to_s,
52
+ regexes: Array(rule_hash['patterns']).map { |p| parse_regex(p) }
53
+ )
54
+ end
55
+
56
+ def self.parse_regex(str)
57
+ s = str.to_s.strip
58
+ unless s.start_with?('/') && s.count('/') >= 2
59
+ raise ArgumentError, "Invalid regex string: #{str.inspect} (expected like \"/foo/i\")"
60
+ end
61
+
62
+ last_slash = s.rindex('/')
63
+ Regexp.new(s[1...last_slash], regex_flags(s[(last_slash + 1)..]))
64
+ end
65
+
66
+ def self.regex_flags(flags)
67
+ opts = 0
68
+ opts |= Regexp::IGNORECASE if flags&.include?('i')
69
+ opts |= Regexp::MULTILINE if flags&.include?('m')
70
+ opts |= Regexp::EXTENDED if flags&.include?('x')
71
+ opts
72
+ end
73
+
74
+ def fallback_normalize(payee)
75
+ payee.downcase
76
+ .gsub(/[^a-z0-9\s+]/, ' ') # keep '+' so payees such as "P+" stay distinct
77
+ .gsub(/\s+/, ' ')
78
+ .strip
79
+ end
80
+ end
81
+ end
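When no rule matches, grouping depends entirely on the fallback key, so it helps to see what that key looks like for typical card descriptors. This sketch applies the same three steps (downcase, replace everything except letters, digits, whitespace, and `+` with a space, then collapse whitespace) in a standalone function; `fallback_key` is a hypothetical name and the payee strings are made-up examples.

```ruby
# Same normalization steps as the fallback path, as a free function.
def fallback_key(payee)
  payee.to_s.downcase
       .gsub(/[^a-z0-9\s+]/, ' ')
       .gsub(/\s+/, ' ')
       .strip
end

p fallback_key('NETFLIX.COM  #123') # => "netflix com 123"
p fallback_key('P+ Streaming')      # => "p+ streaming"
p fallback_key('AMZN Mktp US*AB12') # => "amzn mktp us ab12"
p fallback_key('AMZN Mktp US*CD34') # => "amzn mktp us cd34"
```

The last two keys differ, so descriptors that embed a per-charge reference code will not be grouped together unless a known-payees rule maps them to one `normalized` value.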
@@ -0,0 +1,72 @@
1
+ # frozen_string_literal: true
2
+
3
+ require 'csv'
4
+ require 'bigdecimal'
5
+ require 'date'
6
+
7
+ module FindSubscriptions
8
+ # Registry of CSV schemas; supports registration, lookup by name, and auto-detection from headers.
9
+ class SchemaRegistry
10
+ attr_reader :schemas
11
+
12
+ def initialize
13
+ @schemas = {}
14
+ end
15
+
16
+ def register(name, schema)
17
+ @schemas[name] = schema
18
+ end
19
+
20
+ def fetch(name)
21
+ @schemas.fetch(name) { raise ArgumentError, "Unknown schema: #{name}" }
22
+ end
23
+
24
+ # Tries each schema until one matches the CSV headers.
25
+ def detect_for(csv_headers)
26
+ @schemas.each_value do |schema|
27
+ return schema if schema.matches_headers?(csv_headers)
28
+ end
29
+ nil
30
+ end
31
+ end
32
+
33
+ # Defines how to parse a CSV: required headers, amount column, debit/credit direction, and row-to-Transaction mapping.
34
+ class CsvSchema
35
+ attr_reader :required_headers, :amount_key, :direction, :mapping
36
+
37
+ # direction: lambda(row_hash, amount_bd) => :debit or :credit
38
+ # mapping: lambda(row_hash, signed_amount_bd) => Transaction
39
+ def initialize(required_headers:, amount_key:, direction:, mapping:)
40
+ @required_headers = required_headers.map(&:strip).to_set
41
+ @amount_key = amount_key
42
+ @direction = direction
43
+ @mapping = mapping
44
+ end
45
+
46
+ def matches_headers?(headers)
47
+ headers_set = headers.map(&:strip).to_set
48
+ @required_headers.subset?(headers_set)
49
+ end
50
+
51
+ def map_row(row_hash)
52
+ raw_amount = row_hash.fetch(@amount_key).to_f
53
+ dir = @direction.call(row_hash, raw_amount)
54
+
55
+ signed =
56
+ case dir
57
+ when :debit then raw_amount.abs # outgoing positive
58
+ when :credit then -raw_amount.abs # incoming/refund negative
59
+ else
60
+ raise ArgumentError, 'direction must be :debit or :credit'
61
+ end
62
+
63
+ @mapping.call(row_hash, signed)
64
+ end
65
+
66
+ def ==(other)
67
+ other.is_a?(CsvSchema) &&
68
+ @required_headers == other.required_headers &&
69
+ @amount_key == other.amount_key
70
+ end
71
+ end
72
+ end
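Auto-detection returns the first registered schema whose required headers are a subset of the file's headers, so registration order matters: an American Express export also satisfies the `generic` header set. The sketch below models only that subset check with a plain hash; `SCHEMAS` and `detect` are hypothetical names, and the header lists are copied from the schema definitions in this diff.

```ruby
require 'set'

# Required headers per schema, in the order the CLI registers them.
SCHEMAS = {
  'american_express' => ['Date', 'Description', 'Amount', 'Card Member'],
  'navy_federal' => ['Transaction Date', 'Description', 'Amount', 'Credit Debit Indicator'],
  'generic' => %w[Date Description Amount]
}.freeze

# Returns the name of the first schema whose required headers are all present, or nil.
def detect(headers)
  present = headers.map(&:strip).to_set
  SCHEMAS.find { |_name, required| required.to_set.subset?(present) }&.first
end

p detect(['Date', 'Description', 'Card Member', 'Amount', 'Extended Details']) # => "american_express"
p detect(%w[Date Description Amount])                                          # => "generic"
p detect(%w[Posted Memo])                                                      # => nil
```

Ruby hashes preserve insertion order, which is why the more specific schemas need to be registered before `generic`.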
@@ -0,0 +1,15 @@
1
+ # frozen_string_literal: true
2
+
3
+ require 'date'
4
+
5
+ # Top-level namespace for subscription detection from bank/credit CSV exports.
6
+ module FindSubscriptions
7
+ # A single parsed transaction: date, payee, signed amount (positive = outgoing), and raw row.
8
+ Transaction = Struct.new(
9
+ :date, # Date
10
+ :payee, # String
11
+ :amount, # BigDecimal (positive = outgoing)
12
+ :raw, # Hash original row
13
+ keyword_init: true
14
+ )
15
+ end
@@ -0,0 +1,4 @@
1
+ # frozen_string_literal: true
2
+
3
+ # Entry point for FindSubscriptions: subscription detection from bank/credit CSV exports.
4
+ require_relative 'find_subscriptions/transaction'
@@ -0,0 +1,26 @@
1
+ # frozen_string_literal: true
2
+
3
+ require 'csv'
4
+
5
+ module FindSubscriptions
6
+ module Output
7
+ # Outputs subscriptions as CSV with a header row.
8
+ class CsvReporter
9
+ HEADERS = %w[name amount since until count].freeze
10
+
11
+ def initialize(io: $stdout)
12
+ @io = io
13
+ end
14
+
15
+ def print(subscriptions)
16
+ output = CSV.generate do |csv|
17
+ csv << HEADERS
18
+ subscriptions.each do |sub|
19
+ csv << [sub[:name], sub[:amount], sub[:since].iso8601, sub[:until].iso8601, sub[:count]]
20
+ end
21
+ end
22
+ @io.print output
23
+ end
24
+ end
25
+ end
26
+ end
@@ -0,0 +1,27 @@
1
+ # frozen_string_literal: true
2
+
3
+ require 'json'
4
+
5
+ module FindSubscriptions
6
+ module Output
7
+ # Outputs subscriptions as a pretty-printed JSON array.
8
+ class JsonReporter
9
+ def initialize(io: $stdout)
10
+ @io = io
11
+ end
12
+
13
+ def print(subscriptions)
14
+ data = subscriptions.map do |sub|
15
+ {
16
+ name: sub[:name],
17
+ amount: sub[:amount],
18
+ since: sub[:since].iso8601,
19
+ until: sub[:until].iso8601,
20
+ count: sub[:count]
21
+ }
22
+ end
23
+ @io.puts JSON.pretty_generate(data)
24
+ end
25
+ end
26
+ end
27
+ end
@@ -0,0 +1,37 @@
1
+ # frozen_string_literal: true
2
+
3
+ module FindSubscriptions
4
+ # Output formatters (stdout, JSON, CSV).
5
+ module Output
6
+ # Prints subscription candidates to an IO (default stdout) in a human-readable format.
7
+ class StdoutReporter
8
+ def initialize(io: $stdout)
9
+ @io = io
10
+ end
11
+
12
+ # subscriptions: array of hashes like:
13
+ # { name: "Netflix", amount: "14.99", since: Date, until: Date, count: Integer }
14
+ def print(subscriptions)
15
+ @io.puts 'Subscriptions:'
16
+ if subscriptions.empty?
17
+ @io.puts ' - (none found)'
18
+ return
19
+ end
20
+
21
+ subscriptions.each { |s| @io.puts format_subscription(s) }
22
+ end
23
+
24
+ private
25
+
26
+ def format_subscription(sub)
27
+ tx_label = sub[:count] == 1 ? 'transaction' : 'transactions'
28
+ " - #{format_name(sub[:name])}: $#{sub[:amount]} since #{sub[:since].strftime('%B %Y')}" \
29
+ " (#{sub[:count]} #{tx_label}) until #{sub[:until].strftime('%B %Y')}"
30
+ end
31
+
32
+ def format_name(name)
33
+ name.ljust(70)
34
+ end
35
+ end
36
+ end
37
+ end
@@ -0,0 +1,31 @@
1
+ # frozen_string_literal: true
2
+
3
+ require 'bigdecimal'
4
+ require 'date'
5
+
6
+ module FindSubscriptions
7
+ # CSV schema definitions for supported banks and card issuers.
8
+ module Schemas
9
+ AMEX_DIRECTION = lambda { |_row, amount|
10
+ amount.negative? ? :credit : :debit
11
+ }.freeze
12
+
13
+ AMEX_MAPPING = lambda { |row, signed_amount|
14
+ Transaction.new(
15
+ date: Date.strptime(row['Date'], '%m/%d/%Y'),
16
+ payee: row.fetch('Description').to_s.strip,
17
+ amount: signed_amount,
18
+ raw: row
19
+ )
20
+ }.freeze
21
+
22
+ def self.american_express
23
+ CsvSchema.new(
24
+ required_headers: ['Date', 'Description', 'Amount', 'Card Member'],
25
+ amount_key: 'Amount',
26
+ direction: AMEX_DIRECTION,
27
+ mapping: AMEX_MAPPING
28
+ )
29
+ end
30
+ end
31
+ end
@@ -0,0 +1,35 @@
1
+ # frozen_string_literal: true
2
+
3
+ require 'set'
4
+ require 'bigdecimal'
5
+ require 'date'
6
+ require_relative '../../lib/find_subscriptions/transaction'
7
+
8
+ module FindSubscriptions
9
+ # CSV schema definitions for supported banks and card issuers.
10
+ module Schemas
11
+ DATE_FORMAT = '%Y-%m-%d'
12
+
13
+ def self.generic
14
+ CsvSchema.new(
15
+ required_headers: %w[Date Description Amount],
16
+ amount_key: 'Amount',
17
+ direction: lambda { |_row, amount|
18
+ amount.negative? ? :debit : :credit
19
+ },
20
+ mapping: lambda do |row, signed_amount|
21
+ map_row(row, signed_amount)
22
+ end
23
+ )
24
+ end
25
+
26
+ def self.map_row(row, signed_amount)
27
+ Transaction.new(
28
+ date: Date.strptime(row.fetch('Date'), DATE_FORMAT),
29
+ payee: row.fetch('Description').to_s.strip,
30
+ amount: signed_amount,
31
+ raw: row
32
+ )
33
+ end
34
+ end
35
+ end
@@ -0,0 +1,36 @@
+ # frozen_string_literal: true
+
+ require 'bigdecimal'
+ require 'date'
+
+ module FindSubscriptions
+   # CSV schema definitions for supported banks and card issuers.
+   module Schemas
+     NAVY_FEDERAL_DIRECTION = lambda { |row, _amount|
+       type = row.fetch('Credit Debit Indicator').to_s.strip.downcase
+       case type
+       when 'debit' then :debit
+       when 'credit' then :credit
+       else raise ArgumentError, "Unknown Credit Debit Indicator: #{row['Credit Debit Indicator']}"
+       end
+     }.freeze
+
+     NAVY_FEDERAL_MAPPING = lambda { |row, signed_amount|
+       Transaction.new(
+         date: Date.strptime(row.fetch('Transaction Date'), '%m/%d/%Y'),
+         payee: row.fetch('Description').to_s.strip,
+         amount: signed_amount,
+         raw: row
+       )
+     }.freeze
+
+     def self.navy_federal
+       CsvSchema.new(
+         required_headers: ['Transaction Date', 'Description', 'Amount', 'Credit Debit Indicator'],
+         amount_key: 'Amount',
+         direction: NAVY_FEDERAL_DIRECTION,
+         mapping: NAVY_FEDERAL_MAPPING
+       )
+     end
+   end
+ end
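Unlike the sign-based schemas, the Navy Federal schema above ignores the amount and reads the direction from the `Credit Debit Indicator` column, normalizing whitespace and case first and raising on anything unrecognized. A small sketch of that behavior follows. The input values are made up for illustration, and passing `nil` as the unused amount is an assumption about how the lambda may be called.

```ruby
# Same logic as NAVY_FEDERAL_DIRECTION in the hunk above.
navy_federal_direction = lambda { |row, _amount|
  type = row.fetch('Credit Debit Indicator').to_s.strip.downcase
  case type
  when 'debit' then :debit
  when 'credit' then :credit
  else raise ArgumentError, "Unknown Credit Debit Indicator: #{row['Credit Debit Indicator']}"
  end
}

# Made-up indicator values; the amount argument is unused, so nil is passed.
debit = navy_federal_direction.call({ 'Credit Debit Indicator' => ' Debit ' }, nil)
credit = navy_federal_direction.call({ 'Credit Debit Indicator' => 'CREDIT' }, nil)

# An unrecognized indicator raises ArgumentError rather than guessing.
unknown_error =
  begin
    navy_federal_direction.call({ 'Credit Debit Indicator' => 'Pending' }, nil)
    nil
  rescue ArgumentError => e
    e.message
  end

puts debit, credit, unknown_error
```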
metadata ADDED
@@ -0,0 +1,118 @@
+ --- !ruby/object:Gem::Specification
+ name: find-subscriptions
+ version: !ruby/object:Gem::Version
+   version: 0.1.0
+ platform: ruby
+ authors:
+ - Jeffrey Baird
+ autorequire:
+ bindir: bin
+ cert_chain: []
+ date: 2026-03-05 00:00:00.000000000 Z
+ dependencies:
+ - !ruby/object:Gem::Dependency
+   name: bigdecimal
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '3.0'
+   type: :runtime
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '3.0'
+ - !ruby/object:Gem::Dependency
+   name: rake
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '13.0'
+   type: :development
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '13.0'
+ - !ruby/object:Gem::Dependency
+   name: rspec
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '3.12'
+   type: :development
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '3.12'
+ - !ruby/object:Gem::Dependency
+   name: rubocop
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '1.50'
+   type: :development
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '1.50'
+ description: |
+   A CLI tool that analyzes CSV files from banks and credit cards to detect
+   subscription charges based on known payees and recurring transaction patterns.
+ email:
+ - jeffreybaird@hey.com
+ executables:
+ - find-subscriptions
+ extensions: []
+ extra_rdoc_files: []
+ files:
+ - LICENSE
+ - README.md
+ - bin/find-subscriptions
+ - lib/detectors/known_payees.rb
+ - lib/detectors/repeat_charges.rb
+ - lib/find_subscriptions.rb
+ - lib/find_subscriptions/cli.rb
+ - lib/find_subscriptions/payee_normalizer.rb
+ - lib/find_subscriptions/schema_registry.rb
+ - lib/find_subscriptions/transaction.rb
+ - lib/output/csv_reporter.rb
+ - lib/output/json_reporter.rb
+ - lib/output/stdout_reporter.rb
+ - lib/schemas/american_express.rb
+ - lib/schemas/generic.rb
+ - lib/schemas/navy_federal.rb
+ homepage: https://github.com/jeffreybaird/find-subscriptions
+ licenses:
+ - PolyForm-Noncommercial-1.0.0
+ metadata: {}
+ post_install_message:
+ rdoc_options: []
+ require_paths:
+ - lib
+ required_ruby_version: !ruby/object:Gem::Requirement
+   requirements:
+   - - ">="
+     - !ruby/object:Gem::Version
+       version: '2.7'
+ required_rubygems_version: !ruby/object:Gem::Requirement
+   requirements:
+   - - ">="
+     - !ruby/object:Gem::Version
+       version: '0'
+ requirements: []
+ rubygems_version: 3.4.1
+ signing_key:
+ specification_version: 4
+ summary: Find recurring subscription charges in bank and credit card CSV exports
+ test_files: []
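The `~>` entries in the dependency list above are RubyGems pessimistic version constraints: `~> 3.0` accepts any 3.x release but not 4.0, while `>= 2.7` has no upper bound. The sketch below checks a few versions against the same requirement strings using the `Gem::Requirement` and `Gem::Version` classes that ship with RubyGems. The probe version numbers are arbitrary examples, not versions this gem was tested against.

```ruby
require 'rubygems'

# Requirement strings copied from the gemspec metadata above.
bigdecimal_req = Gem::Requirement.new('~> 3.0')
rspec_req = Gem::Requirement.new('~> 3.12')
ruby_req = Gem::Requirement.new('>= 2.7')

# Arbitrary probe versions chosen for illustration.
bigdecimal_minor_ok = bigdecimal_req.satisfied_by?(Gem::Version.new('3.1.4'))
bigdecimal_major_ok = bigdecimal_req.satisfied_by?(Gem::Version.new('4.0.0'))
rspec_older_ok = rspec_req.satisfied_by?(Gem::Version.new('3.11.0'))
ruby_newer_ok = ruby_req.satisfied_by?(Gem::Version.new('3.2.0'))

puts "bigdecimal 3.1.4: #{bigdecimal_minor_ok}"
puts "bigdecimal 4.0.0: #{bigdecimal_major_ok}"
puts "rspec 3.11.0: #{rspec_older_ok}"
puts "ruby 3.2.0: #{ruby_newer_ok}"
```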