sec_id 5.0.0 → 5.2.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 48989505724d7895a9a0bd0308602f4e8a1f5870c91df5802e32f2a26af2c7c4
4
- data.tar.gz: 2e810ceadd4d02b47dc3a7cc6eab754e305a44ce83f883a0afdac7cef2bd4740
3
+ metadata.gz: bc458a32d8a5cb8fc4db2b1ab4c635a6c88e4160d38948e4ce779e90b039bd3c
4
+ data.tar.gz: 298298a51ca424aef20c814ed620e35d32bbfe9bf285fe1f6ee6f44bac163980
5
5
  SHA512:
6
- metadata.gz: 8f4897e3de16457e2206fcae937316119a1c824b224bbf84d41f61c1d66f6cd5b48e8662bccfca007e68da6e0e059a130dd04ea82778ae2208a9d2aacfc34a75
7
- data.tar.gz: b807ce12ec66636016262686b71666d9c84a9e2649c5816dcd97772a02b648a64a26a6f86691969d97f76583652347bb2c627407b674773630764cffe4e3e216
6
+ metadata.gz: 78650675337ce8a03970e4b1227713aaa0adafb8c18da35466099add8fc39389c8b02583602c18f9aeb59a340ee6b9db9f42ef7662d96284dd0ed93336a50a6a
7
+ data.tar.gz: 27d54f707d2ec471218a15e6f9dadc2d78d3f46985084f1e18be921b83691f4fe5cd4d82d793ae9cbb34c3e55df30e2966aa834b5942606dfa81d5e3d38e4799
data/CHANGELOG.md CHANGED
@@ -8,6 +8,30 @@ and [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/).
8
8
 
9
9
  ## [Unreleased]
10
10
 
11
+ ## [5.2.0] - 2026-02-24
12
+
13
+ ### Added
14
+
15
+ - `SecID.scan` and `SecID.extract` methods for finding identifiers in freeform text — returns `Scanner::Match` objects (`Data.define(:type, :raw, :range, :identifier)`) with the validated identifier instance; supports `types:` filtering, hyphenated identifiers, and compound patterns (OCC with spaces, FISN with slashes)
16
+ - `SecID.explain` method for debugging identifier detection — returns per-type validation results showing exactly why each type matched or rejected the input
17
+ - `on_ambiguous:` option for `SecID.parse` and `SecID.parse!` — `:first` (default, existing behavior), `:raise` (raises `AmbiguousMatchError`), `:all` (returns array of all matching instances)
18
+ - `SecID::AmbiguousMatchError` exception class for ambiguous identifier detection
19
+ - `#as_json` method on all identifier types (delegates to `#to_h`) and on `Errors` (delegates to `#details`) for JSON serialization compatibility
20
+ - `SecID::IBAN.supported_countries` class method returning sorted array of all supported country codes
21
+ - `SecID::CFI.categories` class method returning the categories hash
22
+ - `SecID::CFI.groups_for(category_code)` class method returning groups hash for a given category
23
+
24
+
25
+ ## [5.1.0] - 2026-02-19
26
+
27
+ ### Added
28
+
29
+ - `#==`, `#eql?`, and `#hash` methods on all identifier types — two instances of the same type with the same normalized form are equal and usable as Hash keys / in Sets
30
+ - `#to_h` method on all identifier types for consistent hash serialization — returns `{ type:, full_id:, normalized:, valid:, components: }` with type-specific component hashes (e.g. ISIN: `country_code`, `nsin`, `check_digit`)
31
+ - `#to_pretty_s` and `.to_pretty_s` display formatting methods on all identifier types, returning a human-readable string or `nil` for invalid input — with type-specific formats for IBAN (4-char groups), LEI (4-char groups), ISIN (CC + NSIN + CD), CUSIP (cusip6 + issue + CD), FIGI (prefix+G + random + CD), OCC (space-separated components), and Valoren (thousands grouping)
32
+ - Lookup service integration guides and runnable examples for OpenFIGI, SEC EDGAR, GLEIF, and Eurex APIs (`docs/guides/`, `examples/`)
33
+ - GitHub community standards files: Code of Conduct, Contributing guide, Security policy, issue templates, and PR template
34
+
11
35
  ## [5.0.0] - 2026-02-17
12
36
 
13
37
  ### Added
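The three `on_ambiguous:` modes listed in the 5.2.0 entry above can be pictured with a small stand-alone sketch. `DemoParser`, its `VALIDATORS` table, and the toy format rules are hypothetical and are not the gem's code; only the mode names and the `:first` default are taken from the changelog.

```ruby
# Hypothetical stand-in for the gem's dispatcher; the validators are toy rules.
module DemoParser
  class AmbiguousMatchError < StandardError; end

  # Ordered by priority: earlier entries win in :first mode.
  VALIDATORS = {
    wkn: ->(s) { s.match?(/\A[A-Z0-9]{6}\z/) },
    valoren: ->(s) { s.match?(/\A\d{5,9}\z/) },
    isin: ->(s) { s.match?(/\A[A-Z]{2}[A-Z0-9]{9}\d\z/) }
  }.freeze

  def self.parse(str, on_ambiguous: :first)
    matches = VALIDATORS.select { |_type, rule| rule.call(str) }.keys
    case on_ambiguous
    when :first then matches.first
    when :all then matches
    when :raise
      raise AmbiguousMatchError, "#{str.inspect} matches #{matches.join(', ')}" if matches.size > 1

      matches.first
    else raise ArgumentError, "Unknown on_ambiguous mode: #{on_ambiguous.inspect}"
    end
  end
end

DemoParser.parse('514000')                             # => :wkn
DemoParser.parse('514000', on_ambiguous: :all)         # => [:wkn, :valoren]
DemoParser.parse('US5949181045', on_ambiguous: :raise) # => :isin
```

In this sketch `:raise` only fails when more than one rule accepts the input, so an unambiguous string is returned as usual.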
data/README.md CHANGED
@@ -1,6 +1,6 @@
1
1
  # SecID [![Gem Version](https://img.shields.io/gem/v/sec_id)](https://rubygems.org/gems/sec_id) [![Codecov](https://img.shields.io/codecov/c/github/svyatov/sec_id)](https://app.codecov.io/gh/svyatov/sec_id) [![CI](https://github.com/svyatov/sec_id/actions/workflows/main.yml/badge.svg?branch=main)](https://github.com/svyatov/sec_id/actions?query=workflow%3ACI)
2
2
 
3
- > Validate securities identification numbers with ease!
3
+ > A Ruby toolkit for securities identifiers: validate, parse, normalize, detect, and convert.
4
4
 
5
5
  ## Table of Contents
6
6
 
@@ -8,6 +8,8 @@
8
8
  - [Installation](#installation)
9
9
  - [Supported Standards and Usage](#supported-standards-and-usage)
10
10
  - [Metadata Registry](#metadata-registry) - enumerate, filter, look up, and detect identifier types
11
+ - [Text Scanning](#text-scanning) - find identifiers in freeform text
12
+ - [Debugging Detection](#debugging-detection) - understand why strings match or don't
11
13
  - [Structured Validation](#structured-validation) - detailed error codes and messages
12
14
  - [ISIN](#isin) - International Securities Identification Number
13
15
  - [CUSIP](#cusip) - Committee on Uniform Securities Identification Procedures
@@ -22,6 +24,7 @@
22
24
  - [Valoren](#valoren) - Swiss Security Number
23
25
  - [CFI](#cfi) - Classification of Financial Instruments
24
26
  - [FISN](#fisn) - Financial Instrument Short Name
27
+ - [Lookup Service Integration](#lookup-service-integration)
25
28
  - [Development](#development)
26
29
  - [Contributing](#contributing)
27
30
  - [Changelog](#changelog)
@@ -37,7 +40,7 @@ Ruby 3.2+ is required.
37
40
  Add this line to your application's Gemfile:
38
41
 
39
42
  ```ruby
40
- gem 'sec_id', '~> 5.0'
43
+ gem 'sec_id', '~> 5.2'
41
44
  ```
42
45
 
43
46
  And then execute:
@@ -58,10 +61,39 @@ gem install sec_id
58
61
 
59
62
  All identifier classes provide `valid?`, `errors`, `validate`, `validate!` methods at both class and instance levels.
60
63
 
61
- **All identifiers** support normalization:
64
+ **All identifiers** support normalization and display formatting:
62
65
  - `.normalize(id)` - strips separators, upcases, validates, and returns the canonical string
63
66
  - `#normalized` / `#normalize` - returns the canonical string for a valid instance
64
67
  - `#normalize!` - mutates `full_id` to canonical form, returns `self`
68
+ - `#to_pretty_s` / `.to_pretty_s(id)` - returns a human-readable formatted string, or `nil` for invalid input
69
+
70
+ **All identifiers** support hash serialization:
71
+ - `#to_h` - returns a hash with `:type`, `:full_id`, `:normalized`, `:valid`, and `:components` keys
72
+ - `#as_json` - same as `#to_h`, for JSON serialization compatibility (Rails, `JSON.generate`, etc.)
73
+
74
+ ```ruby
75
+ SecID::ISIN.new('US5949181045').to_h
76
+ # => { type: :isin, full_id: 'US5949181045', normalized: 'US5949181045',
77
+ # valid: true, components: { country_code: 'US', nsin: '594918104', check_digit: 5 } }
78
+
79
+ SecID::ISIN.new('INVALID').to_h
80
+ # => { type: :isin, full_id: 'INVALID', normalized: nil,
81
+ # valid: false, components: { country_code: nil, nsin: nil, check_digit: nil } }
82
+ ```
83
+
84
+ **All identifiers** support value equality — two instances of the same type with the same normalized form are equal:
85
+
86
+ ```ruby
87
+ a = SecID::ISIN.new('US5949181045')
88
+ b = SecID::ISIN.new('us 5949 1810 45')
89
+
90
+ a == b # => true
91
+ a.eql?(b) # => true
92
+
93
+ # Works as Hash keys and in Sets
94
+ { a => 'MSFT' }[b] # => 'MSFT'
95
+ Set.new([a, b]).size # => 1
96
+ ```
65
97
 
66
98
  **Check-digit based identifiers** (ISIN, CUSIP, CEI, SEDOL, FIGI, LEI, IBAN) also provide:
67
99
  - `restore` / `.restore` - returns the full identifier string with correct check-digit (no mutation)
@@ -115,6 +147,56 @@ SecID.parse('594918104', types: [:cusip]) # => #<SecID::CUSIP>
115
147
  # Bang version raises on failure
116
148
  SecID.parse!('US5949181045') # => #<SecID::ISIN>
117
149
  SecID.parse!('unknown') # raises SecID::InvalidFormatError
150
+
151
+ # Handle ambiguous matches
152
+ SecID.parse('514000', on_ambiguous: :first) # => #<SecID::WKN> (default)
153
+ SecID.parse('514000', on_ambiguous: :raise) # raises SecID::AmbiguousMatchError
154
+ SecID.parse('514000', on_ambiguous: :all) # => [#<SecID::WKN>, #<SecID::Valoren>, #<SecID::CIK>]
155
+ SecID.parse('US5949181045', on_ambiguous: :raise) # => #<SecID::ISIN> (unambiguous, no error)
156
+ ```
157
+
158
+ ### Text Scanning
159
+
160
+ Find identifiers embedded in freeform text:
161
+
162
+ ```ruby
163
+ # Extract all identifiers from text
164
+ matches = SecID.extract('Portfolio: US5949181045, 594918104, B0YBKJ7')
165
+ matches.map(&:type) # => [:isin, :cusip, :sedol]
166
+ matches.first.raw # => "US5949181045"
167
+ matches.first.range # => 11...23
168
+ matches.first.identifier.country_code # => "US"
169
+
170
+ # Lazy scanning with Enumerator
171
+ SecID.scan('Buy US5949181045 now').each { |m| puts m.type }
172
+
173
+ # Filter by types
174
+ SecID.extract('514000', types: [:valoren]) # => only Valoren matches
175
+
176
+ # Handles hyphenated identifiers
177
+ match = SecID.extract('ID: US-5949-1810-45').first
178
+ match.raw # => "US-5949-1810-45"
179
+ match.identifier.normalized # => "US5949181045"
180
+ ```
181
+
182
+ > **Known limitations:** Format-only types (CIK, Valoren, WKN, CFI) can false-positive on
183
+ > common numbers and short words in prose — use the `types:` filter to restrict scanning when
184
+ > this is a concern. Identifiers prefixed with special characters (e.g. `#US5949181045`) may be
185
+ > consumed as a single token by CUSIP's `*@#` character class and fail validation, preventing
186
+ > the embedded identifier from being found.
187
+
188
+ ### Debugging Detection
189
+
190
+ Understand why a string matches or doesn't match specific identifier types:
191
+
192
+ ```ruby
193
+ result = SecID.explain('US5949181040')
194
+ isin = result[:candidates].find { |c| c[:type] == :isin }
195
+ isin[:valid] # => false
196
+ isin[:errors].first[:error] # => :invalid_check_digit
197
+
198
+ # Filter to specific types
199
+ SecID.explain('US5949181045', types: %i[isin cusip])
118
200
  ```
119
201
 
120
202
  ### Structured Validation
@@ -192,6 +274,7 @@ isin.valid? # => true
192
274
  isin.restore # => 'US5949181045'
193
275
  isin.restore! # => #<SecID::ISIN> (mutates instance)
194
276
  isin.calculate_check_digit # => 5
277
+ isin.to_pretty_s # => 'US 594918104 5'
195
278
  isin.to_cusip # => #<SecID::CUSIP>
196
279
  isin.nsin_type # => :cusip
197
280
  isin.to_nsin # => #<SecID::CUSIP>
@@ -236,6 +319,7 @@ cusip.valid? # => true
236
319
  cusip.restore # => '594918104'
237
320
  cusip.restore! # => #<SecID::CUSIP> (mutates instance)
238
321
  cusip.calculate_check_digit # => 4
322
+ cusip.to_pretty_s # => '594918 10 4'
239
323
  cusip.to_isin('US') # => #<SecID::ISIN>
240
324
  cusip.cins? # => false
241
325
  ```
@@ -308,6 +392,7 @@ figi.valid? # => true
308
392
  figi.restore # => 'BBG000DMBXR2'
309
393
  figi.restore! # => #<SecID::FIGI> (mutates instance)
310
394
  figi.calculate_check_digit # => 2
395
+ figi.to_pretty_s # => 'BBG 000DMBXR 2'
311
396
  ```
312
397
 
313
398
  ### LEI
@@ -332,6 +417,7 @@ lei.valid? # => true
332
417
  lei.restore # => '5493006MHB84DD0ZWV18'
333
418
  lei.restore! # => #<SecID::LEI> (mutates instance)
334
419
  lei.calculate_check_digit # => 18
420
+ lei.to_pretty_s # => '5493 006M HB84 DD0Z WV18'
335
421
  ```
336
422
 
337
423
  ### IBAN
@@ -358,10 +444,16 @@ iban.restore # => 'DE89370400440532013000'
358
444
  iban.restore! # => #<SecID::IBAN> (mutates instance)
359
445
  iban.calculate_check_digit # => 89
360
446
  iban.known_country? # => true
447
+ iban.to_pretty_s # => 'DE89 3704 0044 0532 0130 00'
361
448
  ```
362
449
 
363
450
  Full BBAN structural validation is supported for EU/EEA countries. Other countries have length-only validation.
364
451
 
452
+ ```ruby
453
+ # List all supported countries
454
+ SecID::IBAN.supported_countries # => ["AD", "AE", "AT", "BE", "BG", "CH", ...]
455
+ ```
456
+
365
457
  ### CIK
366
458
 
367
459
  > [Central Index Key](https://en.wikipedia.org/wiki/Central_Index_Key) - a 10-digit number used by the SEC to identify corporations and individuals who have filed disclosures.
@@ -412,6 +504,7 @@ occ.full_id # => 'X 250620C00050000'
412
504
  occ.valid? # => true
413
505
  occ.normalize! # => #<SecID::OCC> (mutates full_id, returns self)
414
506
  occ.full_id # => 'X 250620C00050000'
507
+ occ.to_pretty_s # => 'X 250620 C 00050000'
415
508
  ```
416
509
 
417
510
  ### WKN
@@ -454,6 +547,7 @@ valoren.identifier # => '3886335'
454
547
  valoren.valid? # => true
455
548
  valoren.normalized # => '003886335'
456
549
  valoren.normalize! # => #<SecID::Valoren> (mutates full_id, returns self)
550
+ valoren.to_pretty_s # => '3 886 335'
457
551
  valoren.to_isin # => #<SecID::ISIN> (CH ISIN by default)
458
552
  valoren.to_isin('LI') # => #<SecID::ISIN> (LI ISIN)
459
553
  ```
@@ -486,6 +580,12 @@ cfi.registered? # => true
486
580
 
487
581
  CFI validates the category code (position 1) against 14 valid values and the group code (position 2) against valid values for that category. Attribute positions 3-6 accept any letter A-Z, with X meaning "not applicable".
488
582
 
583
+ ```ruby
584
+ # Introspect valid codes
585
+ SecID::CFI.categories # => { "E" => :equity, "C" => :collective_investment_vehicles, ... }
586
+ SecID::CFI.groups_for('E') # => { "S" => :common_shares, "P" => :preferred_shares, ... }
587
+ ```
588
+
489
589
  ### FISN
490
590
 
491
591
  > [Financial Instrument Short Name](https://en.wikipedia.org/wiki/ISO_18774) - a human-readable short name for financial instruments per ISO 18774.
@@ -506,6 +606,19 @@ fisn.to_s # => 'APPLE INC/SH'
506
606
 
507
607
  FISN format: `Issuer Name/Abbreviated Instrument Description` with issuer (1-15 chars) and description (1-19 chars) separated by a forward slash. Character set: uppercase A-Z, digits 0-9, and space.
508
608
 
609
+ ## Lookup Service Integration
610
+
611
+ SecID validates identifiers but does not include HTTP clients. The [`docs/guides/`](docs/guides/) directory provides integration patterns for external lookup services using only stdlib (`net/http`, `json`):
612
+
613
+ | Guide | Service | Identifier |
614
+ |-------|---------|------------|
615
+ | [OpenFIGI](docs/guides/openfigi.md) | [OpenFIGI API](https://www.openfigi.com/api) | FIGI |
616
+ | [SEC EDGAR](docs/guides/sec-edgar.md) | [SEC EDGAR](https://www.sec.gov/edgar/sec-api-documentation) | CIK |
617
+ | [GLEIF](docs/guides/gleif.md) | [GLEIF API](https://www.gleif.org/en/lei-data/gleif-api) | LEI |
618
+ | [Eurex](docs/guides/eurex.md) | [Eurex Reference Data](https://www.eurex.com/ex-en/data/free-reference-data-api) | ISIN |
619
+
620
+ Each guide includes a complete adapter class and a [runnable example](examples/).
621
+
509
622
  ## Development
510
623
 
511
624
  After checking out the repo, run `bin/setup` to install dependencies.
@@ -516,12 +629,9 @@ To install this gem onto your local machine, run `bundle exec rake install`.
516
629
 
517
630
  ## Contributing
518
631
 
519
- 1. Fork it
520
- 2. Create your feature branch (`git checkout -b my-new-feature`)
521
- 3. Make your changes and run tests (`bundle exec rake`)
522
- 4. Commit using [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/) format (`git commit -m 'feat: add some feature'`)
523
- 5. Push to the branch (`git push origin my-new-feature`)
524
- 6. Create a new Pull Request
632
+ Bug reports and pull requests are welcome on [GitHub](https://github.com/svyatov/sec_id). See [CONTRIBUTING.md](CONTRIBUTING.md) for development setup, code style, and PR guidelines.
633
+
634
+ This project follows the [Contributor Covenant Code of Conduct](CODE_OF_CONDUCT.md).
525
635
 
526
636
  ## Changelog
527
637
 
data/lib/sec_id/base.rb CHANGED
@@ -52,6 +52,7 @@ module SecID
52
52
  # @api private
53
53
  def self.inherited(subclass)
54
54
  super
55
+ # Skip anonymous classes and classes outside the SecID namespace (e.g. in tests)
55
56
  SecID.__send__(:register_identifier, subclass) if subclass.name&.start_with?('SecID::')
56
57
  end
57
58
 
@@ -63,8 +64,53 @@ module SecID
63
64
  raise NotImplementedError
64
65
  end
65
66
 
67
+ # @param other [Object]
68
+ # @return [Boolean]
69
+ def ==(other)
70
+ other.class == self.class && comparison_id == other.comparison_id
71
+ end
72
+
73
+ alias eql? ==
74
+
75
+ # @return [Integer]
76
+ def hash
77
+ [self.class, comparison_id].hash
78
+ end
79
+
80
+ # Returns a hash representation of this identifier for serialization.
81
+ #
82
+ # @return [Hash] hash with :type, :full_id, :normalized, :valid, and :components keys
83
+ def to_h
84
+ {
85
+ type: self.class.short_name.downcase.to_sym,
86
+ full_id: full_id,
87
+ normalized: valid? ? normalized : nil,
88
+ valid: valid?,
89
+ components: components
90
+ }
91
+ end
92
+
93
+ # Returns a JSON-compatible hash representation.
94
+ #
95
+ # @return [Hash]
96
+ def as_json(*)
97
+ to_h
98
+ end
99
+
100
+ protected
101
+
102
+ # @return [String]
103
+ def comparison_id
104
+ valid? ? normalized : full_id
105
+ end
106
+
66
107
  private
67
108
 
109
+ # @return [Hash]
110
+ def components
111
+ {}
112
+ end
113
+
68
114
  # @param sec_id_number [String, #to_s] the identifier to parse
69
115
  # @return [MatchData, Hash] the regex match data or empty hash if no match
70
116
  def parse(sec_id_number)
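The equality contract added in this hunk (`#==`, `#eql?`, `#hash` all driven by a protected `comparison_id`) can be tried in isolation. The sketch below is a hypothetical miniature: `DemoId` and its 12-character validity rule are assumptions made for illustration, not the gem's `Base` class.

```ruby
require 'set'

# Hypothetical miniature of the equality contract; not the gem's class.
class DemoId
  attr_reader :full_id

  def initialize(full_id)
    @full_id = full_id.to_s
  end

  # Toy rule: valid when 12 alphanumerics remain after stripping separators.
  def valid?
    normalized.match?(/\A[A-Z0-9]{12}\z/)
  end

  def normalized
    full_id.gsub(/[\s-]/, '').upcase
  end

  def ==(other)
    other.class == self.class && comparison_id == other.comparison_id
  end
  alias eql? ==

  # Must agree with #eql? so Hash and Set treat equal instances as one key.
  def hash
    [self.class, comparison_id].hash
  end

  protected

  # Valid identifiers compare by canonical form, invalid ones by raw input.
  def comparison_id
    valid? ? normalized : full_id
  end
end

a = DemoId.new('US5949181045')
b = DemoId.new('us 5949 1810 45')
a == b               # => true
{ a => 'MSFT' }[b]   # => "MSFT"
Set.new([a, b]).size # => 1
```

Keeping `comparison_id` protected rather than private is what lets `==` call it on `other` while still hiding it from outside callers.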
data/lib/sec_id/cei.rb CHANGED
@@ -46,6 +46,13 @@ module SecID
46
46
  @check_digit = cei_parts[:check_digit]&.to_i
47
47
  end
48
48
 
49
+ private
50
+
51
+ # @return [Hash]
52
+ def components = { prefix:, numeric:, entity_id:, check_digit: }
53
+
54
+ public
55
+
49
56
  # @return [Integer] the calculated check digit (0-9)
50
57
  # @raise [InvalidFormatError] if the CEI format is invalid
51
58
  def calculate_check_digit
data/lib/sec_id/cfi.rb CHANGED
@@ -164,7 +164,22 @@ module SecID
164
164
  'C' => :combined_instruments,
165
165
  'M' => :miscellaneous
166
166
  }
167
- }.freeze
167
+ }.each_value(&:freeze).freeze
168
+
169
+ # Returns the category codes hash.
170
+ #
171
+ # @return [Hash{String => Symbol}]
172
+ def self.categories
173
+ CATEGORIES
174
+ end
175
+
176
+ # Returns the groups hash for a given category code.
177
+ #
178
+ # @param category_code [String] single-letter category code
179
+ # @return [Hash{String => Symbol}, nil]
180
+ def self.groups_for(category_code)
181
+ GROUPS[category_code.to_s.upcase]
182
+ end
168
183
 
169
184
  # @return [String, nil] the category code (position 1)
170
185
  attr_reader :category_code
@@ -299,6 +314,9 @@ module SecID
299
314
 
300
315
  private
301
316
 
317
+ # @return [Hash]
318
+ def components = { category_code:, group_code:, attr1:, attr2:, attr3:, attr4: }
319
+
302
320
  # @return [Boolean]
303
321
  def valid_format?
304
322
  super && valid_category? && valid_group?
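The change from `.freeze` to `.each_value(&:freeze).freeze` matters because the new class methods hand the constants to callers. A minimal sketch of the difference, using a hypothetical one-category `DEMO_GROUPS` table rather than the gem's full `GROUPS` constant:

```ruby
# Shallow freeze: the outer hash is frozen, the nested hash is still mutable.
shallow = { 'E' => { 'S' => :common_shares } }.freeze
shallow['E'].frozen? # => false

# Freezing each value first, then the outer hash, protects both levels.
# Hash#each_value returns the receiver, so the calls can be chained.
DEMO_GROUPS = {
  'E' => { 'S' => :common_shares, 'P' => :preferred_shares }
}.each_value(&:freeze).freeze

# Hypothetical helper mirroring the lookup style of groups_for.
def demo_groups_for(category_code)
  DEMO_GROUPS[category_code.to_s.upcase]
end

demo_groups_for(:e)          # => {"S"=>:common_shares, "P"=>:preferred_shares}
demo_groups_for('e').frozen? # => true
demo_groups_for('Z')         # => nil
```

With both levels frozen, a caller who tries to modify the returned hash gets a `FrozenError` instead of silently altering shared state.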
@@ -23,6 +23,15 @@ module SecID
23
23
  cleaned = id.to_s.strip.gsub(self::SEPARATORS, '')
24
24
  new(cleaned.upcase).normalized
25
25
  end
26
+
27
+ # Returns a human-readable formatted string, or nil if invalid.
28
+ #
29
+ # @param id [String, #to_s] the identifier to format
30
+ # @return [String, nil]
31
+ def to_pretty_s(id)
32
+ cleaned = id.to_s.strip.gsub(self::SEPARATORS, '')
33
+ new(cleaned.upcase).to_pretty_s
34
+ end
26
35
  end
27
36
 
28
37
  # Returns the canonical normalized form of this identifier.
@@ -48,6 +57,15 @@ module SecID
48
57
  self
49
58
  end
50
59
 
60
+ # Returns a human-readable formatted string, or nil if invalid.
61
+ #
62
+ # @return [String, nil]
63
+ def to_pretty_s
64
+ return nil unless valid?
65
+
66
+ to_s
67
+ end
68
+
51
69
  # @return [String]
52
70
  def to_s
53
71
  identifier.to_s
data/lib/sec_id/cusip.rb CHANGED
@@ -45,6 +45,13 @@ module SecID
45
45
  @check_digit = cusip_parts[:check_digit]&.to_i
46
46
  end
47
47
 
48
+ # @return [String, nil]
49
+ def to_pretty_s
50
+ return nil unless valid?
51
+
52
+ "#{cusip6} #{issue} #{check_digit}"
53
+ end
54
+
48
55
  # @return [Integer] the calculated check digit (0-9)
49
56
  # @raise [InvalidFormatError] if the CUSIP format is invalid
50
57
  def calculate_check_digit
@@ -63,6 +70,13 @@ module SecID
63
70
  ISIN.new(country_code + restore).restore!
64
71
  end
65
72
 
73
+ private
74
+
75
+ # @return [Hash]
76
+ def components = { cusip6:, issue:, check_digit: }
77
+
78
+ public
79
+
66
80
  # @return [Boolean] true if first character is a letter (CINS identifier)
67
81
  def cins?
68
82
  cusip6[0] < '0' || cusip6[0] > '9'
data/lib/sec_id/errors.rb CHANGED
@@ -63,5 +63,12 @@ module SecID
63
63
  def to_a
64
64
  messages
65
65
  end
66
+
67
+ # Returns a JSON-compatible array of error detail hashes.
68
+ #
69
+ # @return [Array<Hash>]
70
+ def as_json(*)
71
+ details
72
+ end
66
73
  end
67
74
  end
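The `as_json` methods in this diff return plain hashes and arrays, so their output can be passed straight to the stdlib generator. The sketch below assumes only the `json` standard library; `DemoErrors`, `DemoIdentifier`, and the error message text are hypothetical stand-ins. Passing the `as_json` result explicitly is a conservative choice that does not depend on any framework calling it.

```ruby
require 'json'

# Hypothetical stand-ins mirroring the delegation shown in the diff.
class DemoErrors
  attr_reader :details

  def initialize(details)
    @details = details
  end

  def as_json(*)
    details
  end
end

class DemoIdentifier
  def to_h
    { type: :isin, full_id: 'US5949181045', normalized: 'US5949181045', valid: true }
  end

  def as_json(*)
    to_h
  end
end

# The message text is made up for the example.
errors = DemoErrors.new([{ error: :invalid_check_digit, message: 'Check digit mismatch' }])
payload = { identifier: DemoIdentifier.new.as_json, errors: errors.as_json }
json = JSON.generate(payload)

JSON.parse(json).dig('identifier', 'type') # => "isin"
```

Symbols such as `:isin` are written as JSON strings, so a round trip through `JSON.parse` yields string keys and values.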
data/lib/sec_id/figi.rb CHANGED
@@ -50,6 +50,13 @@ module SecID
50
50
  @check_digit = figi_parts[:check_digit]&.to_i
51
51
  end
52
52
 
53
+ # @return [String, nil]
54
+ def to_pretty_s
55
+ return nil unless valid?
56
+
57
+ "#{prefix}G #{random_part} #{check_digit}"
58
+ end
59
+
53
60
  # @return [Integer] the calculated check digit (0-9)
54
61
  # @raise [InvalidFormatError] if the FIGI format is invalid
55
62
  def calculate_check_digit
@@ -59,6 +66,9 @@ module SecID
59
66
 
60
67
  private
61
68
 
69
+ # @return [Hash]
70
+ def components = { prefix:, random_part:, check_digit: }
71
+
62
72
  # @return [Boolean]
63
73
  def valid_format?
64
74
  !identifier.nil? && !RESTRICTED_PREFIXES.include?(prefix)
data/lib/sec_id/fisn.rb CHANGED
@@ -54,5 +54,10 @@ module SecID
54
54
  def to_s
55
55
  identifier.to_s
56
56
  end
57
+
58
+ private
59
+
60
+ # @return [Hash]
61
+ def components = { issuer:, description: }
57
62
  end
58
63
  end
data/lib/sec_id/iban.rb CHANGED
@@ -33,6 +33,13 @@ module SecID
33
33
  (?<rest>[A-Z0-9]{13,32})
34
34
  \z/x
35
35
 
36
+ # Returns sorted array of all supported country codes.
37
+ #
38
+ # @return [Array<String>]
39
+ def self.supported_countries
40
+ @supported_countries ||= (COUNTRY_RULES.keys + LENGTH_ONLY_COUNTRIES.keys).sort.freeze
41
+ end
42
+
36
43
  # @return [String, nil] the ISO 3166-1 alpha-2 country code
37
44
  attr_reader :country_code
38
45
 
@@ -106,8 +113,23 @@ module SecID
106
113
  "#{country_code}#{check_digit.to_s.rjust(2, '0')}#{bban}"
107
114
  end
108
115
 
116
+ # @return [String, nil]
117
+ def to_pretty_s
118
+ to_s.scan(/.{1,4}/).join(' ') if valid?
119
+ end
120
+
109
121
  private
110
122
 
123
+ # @return [Hash]
124
+ def components
125
+ hash = { country_code:, bban:, check_digit: }
126
+ hash[:bank_code] = bank_code if bank_code
127
+ hash[:branch_code] = branch_code if branch_code
128
+ hash[:account_number] = account_number if account_number
129
+ hash[:national_check] = national_check if national_check
130
+ hash
131
+ end
132
+
111
133
  # @return [Integer]
112
134
  def check_digit_width
113
135
  2
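The four-character grouping used by the IBAN `to_pretty_s` above relies on `String#scan` with a bounded quantifier. A stand-alone sketch (the helper name `group_by_four` is hypothetical):

```ruby
# Splits a string into chunks of up to four characters, joined by spaces.
# The final chunk may be shorter when the length is not a multiple of four.
def group_by_four(str)
  str.scan(/.{1,4}/).join(' ')
end

group_by_four('DE89370400440532013000') # => "DE89 3704 0044 0532 0130 00"
group_by_four('5493006MHB84DD0ZWV18')   # => "5493 006M HB84 DD0Z WV18"
```

Both outputs match the README examples for IBAN and LEI shown earlier in this diff.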
data/lib/sec_id/isin.rb CHANGED
@@ -76,6 +76,13 @@ module SecID
76
76
  @check_digit = isin_parts[:check_digit]&.to_i
77
77
  end
78
78
 
79
+ # @return [String, nil]
80
+ def to_pretty_s
81
+ return nil unless valid?
82
+
83
+ "#{country_code} #{nsin} #{check_digit}"
84
+ end
85
+
79
86
  # @return [Integer] the calculated check digit (0-9)
80
87
  # @raise [InvalidFormatError] if the ISIN format is invalid
81
88
  def calculate_check_digit
@@ -135,6 +142,13 @@ module SecID
135
142
  Valoren.new(nsin)
136
143
  end
137
144
 
145
+ private
146
+
147
+ # @return [Hash]
148
+ def components = { country_code:, nsin:, check_digit: }
149
+
150
+ public
151
+
138
152
  # Returns the type of NSIN embedded in this ISIN.
139
153
  #
140
154
  # @return [Symbol] :cusip, :sedol, :wkn, :valoren, or :generic
data/lib/sec_id/lei.rb CHANGED
@@ -50,6 +50,13 @@ module SecID
50
50
  @check_digit = lei_parts[:check_digit]&.to_i
51
51
  end
52
52
 
53
+ # @return [String, nil]
54
+ def to_pretty_s
55
+ return nil unless valid?
56
+
57
+ to_s.scan(/.{1,4}/).join(' ')
58
+ end
59
+
53
60
  # @return [Integer] the calculated 2-digit check digit (1-98)
54
61
  # @raise [InvalidFormatError] if the LEI format is invalid
55
62
  def calculate_check_digit
@@ -59,6 +66,9 @@ module SecID
59
66
 
60
67
  private
61
68
 
69
+ # @return [Hash]
70
+ def components = { lou_id:, reserved:, entity_id:, check_digit: }
71
+
62
72
  # @return [Integer]
63
73
  def check_digit_width
64
74
  2
data/lib/sec_id/occ.rb CHANGED
@@ -138,8 +138,18 @@ module SecID
138
138
  full_id
139
139
  end
140
140
 
141
+ # @return [String, nil]
142
+ def to_pretty_s
143
+ return nil unless valid?
144
+
145
+ "#{underlying} #{date_str} #{type} #{strike_mills}"
146
+ end
147
+
141
148
  private
142
149
 
150
+ # @return [Hash]
151
+ def components = { underlying:, date_str:, type:, strike_mills: }
152
+
143
153
  # @return [Array<Symbol>]
144
154
  def error_codes
145
155
  return detect_errors unless valid_format?
@@ -0,0 +1,144 @@
1
+ # frozen_string_literal: true
2
+
3
+ module SecID
4
+ # Immutable value object representing a matched identifier found in text.
5
+ Match = Data.define(:type, :raw, :range, :identifier)
6
+
7
+ # Finds securities identifiers in freeform text using regex candidate extraction,
8
+ # length/charset pre-filtering, and cursor-based overlap prevention.
9
+ #
10
+ # @api private
11
+ class Scanner
12
+ # Composite regex for candidate extraction.
13
+ #
14
+ # Three named groups tried left-to-right via alternation:
15
+ # - fisn: contains `/` (unique FISN delimiter)
16
+ # - occ: contains structural spaces + date/type pattern
17
+ # - simple: common alphanumeric tokens (covers all other types)
18
+ CANDIDATE_RE = %r{
19
+ (?<![A-Za-z0-9*@\#/.$])
20
+ (?:
21
+ (?<fisn>[A-Za-z0-9](?:[A-Za-z0-9 ]{0,33}[A-Za-z0-9])?/[A-Za-z0-9](?:[A-Za-z0-9 ]{0,33}[A-Za-z0-9])?)
22
+ |
23
+ (?<occ>[A-Za-z]{1,6}\ {1,5}\d{6}[CcPp]\d{8})
24
+ |
25
+ (?<simple>[A-Za-z0-9*@\#](?:[A-Za-z0-9*@\#-]{0,40}[A-Za-z0-9*@\#])?)
26
+ )
27
+ (?![A-Za-z0-9*@\#.])
28
+ }x
29
+
30
+ # @param identifier_list [Array<Class>] registered identifier classes
31
+ def initialize(identifier_list)
32
+ @classes = identifier_list.dup.freeze
33
+ precompute
34
+ end
35
+
36
+ # Scans text for identifiers, yielding or returning matches.
37
+ #
38
+ # @param text [String, nil] the text to scan
39
+ # @param classes [Array<Class>, nil] restrict to specific classes
40
+ # @return [Enumerator<Match>] if no block given
41
+ # @yieldparam match [Match]
42
+ def call(text, classes: nil, &block)
43
+ return enum_for(:call, text, classes: classes) unless block
44
+
45
+ input = text.to_s
46
+ return if input.empty?
47
+
48
+ scan_text(input, classes || @classes, &block)
49
+ end
50
+
51
+ private
52
+
53
+ # @return [void]
54
+ def precompute # rubocop:disable Metrics/AbcSize
55
+ build_key_table
56
+ build_priority_table
57
+ @fisn_classes = @classes.select { |k| k.short_name == 'FISN' }
58
+ @occ_classes = @classes.select { |k| k.short_name == 'OCC' }
59
+ @simple_classes = @classes - @fisn_classes - @occ_classes
60
+ @candidates_by_length = Hash.new { |h, k| h[k] = [] }
61
+ @classes.each do |klass|
62
+ id_length = klass::ID_LENGTH
63
+ lengths = id_length.is_a?(Range) ? id_length : [id_length]
64
+ lengths.each { |len| @candidates_by_length[len] << klass }
65
+ end
66
+ @candidates_by_length.each_value(&:freeze)
67
+ end
68
+
69
+ # @return [void]
70
+ def build_key_table
71
+ @key_for = {}
72
+ @classes.each { |klass| @key_for[klass] = klass.short_name.downcase.to_sym }
73
+ @key_for.freeze
74
+ end
75
+
76
+ # @return [void]
77
+ def build_priority_table
78
+ @priority_for = {}
79
+ @classes.each_with_index do |klass, index|
80
+ check_digit_rank = klass.has_check_digit? ? 0 : 1
81
+ id_length = klass::ID_LENGTH
82
+ range_size = id_length.is_a?(Range) ? id_length.size : 1
83
+ @priority_for[klass] = [check_digit_rank, range_size, index].freeze
84
+ end
85
+ @priority_for.freeze
86
+ end
87
+
88
+ # @param input [String]
89
+ # @param target_classes [Array<Class>]
90
+ # @return [void]
91
+ def scan_text(input, target_classes)
92
+ pos = 0
93
+ while pos < input.length
94
+ match_data = CANDIDATE_RE.match(input, pos)
95
+ break unless match_data
96
+
97
+ result = identify_candidate(match_data, target_classes)
98
+ if result
99
+ yield result
100
+ pos = match_data.end(0)
101
+ else
102
+ pos = match_data.begin(0) + 1
103
+ end
104
+ end
105
+ end
106
+
107
+ # @param match_data [MatchData]
108
+ # @param target_classes [Array<Class>]
109
+ # @return [Match, nil]
110
+ def identify_candidate(match_data, target_classes)
111
+ raw = match_data[0]
112
+ start_pos = match_data.begin(0)
113
+
114
+ if match_data[:fisn]
115
+ try_classes(raw, raw.upcase, start_pos, target_classes & @fisn_classes)
116
+ elsif match_data[:occ]
117
+ try_classes(raw, raw.upcase, start_pos, target_classes & @occ_classes)
118
+ else
119
+ cleaned = raw.gsub('-', '').upcase
120
+ try_classes(raw, cleaned, start_pos, target_classes & @simple_classes)
121
+ end
122
+ end
123
+
124
+ # @return [Match, nil]
125
+ def try_classes(raw, cleaned, start_pos, classes)
126
+ best = best_match(cleaned, classes)
127
+ return unless best
128
+
129
+ end_pos = start_pos + raw.length
130
+ Match.new(type: @key_for[best], raw: raw, range: start_pos...end_pos, identifier: best.new(cleaned))
131
+ end
132
+
133
+ # @return [Class, nil]
134
+ def best_match(cleaned, classes)
135
+ return if classes.empty?
136
+
137
+ candidates = (@candidates_by_length[cleaned.length] || []) & classes
138
+ return if candidates.empty?
139
+
140
+ validated = candidates.select { |k| cleaned.match?(k::VALID_CHARS_REGEX) && k.valid?(cleaned) }
141
+ validated.min_by { |k| @priority_for[k] }
142
+ end
143
+ end
144
+ end
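The scan loop and priority selection in the `Scanner` class above can be reduced to a small stand-alone sketch. Everything here is an assumption made for illustration: `DemoScanner`, the simplified `TOKEN_RE`, and the toy `RULES` table are not the gem's code, and `Struct` is used instead of `Data.define` so the sketch also runs on Rubies older than 3.2.

```ruby
# Hypothetical miniature of the cursor-based scan; not the gem's Scanner.
module DemoScanner
  Match = Struct.new(:type, :raw, :range, keyword_init: true)

  # Simplified token pattern: alphanumerics with optional inner hyphens.
  TOKEN_RE = /(?<![A-Za-z0-9])[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?(?![A-Za-z0-9])/

  # Each value is [priority tuple, validator]. Lower tuples win, mirroring
  # the [check_digit_rank, range_size, index] ordering in the diff.
  RULES = {
    isin: [[0, 1, 0], ->(s) { s.match?(/\A[A-Z]{2}[A-Z0-9]{9}\d\z/) }],
    wkn: [[1, 1, 1], ->(s) { s.match?(/\A[A-Z0-9]{6}\z/) }],
    valoren: [[1, 5, 2], ->(s) { s.match?(/\A\d{5,9}\z/) }]
  }.freeze

  def self.scan(text)
    return enum_for(:scan, text) unless block_given?

    pos = 0
    while pos < text.length
      md = TOKEN_RE.match(text, pos)
      break unless md

      cleaned = md[0].delete('-').upcase
      hits = RULES.select { |_type, (_prio, rule)| rule.call(cleaned) }
      if hits.empty?
        pos = md.begin(0) + 1 # no type accepted the token: retry one character on
      else
        type = hits.min_by { |_type, (prio, _rule)| prio }.first
        yield Match.new(type: type, raw: md[0], range: md.begin(0)...md.end(0))
        pos = md.end(0) # jump past the match so results never overlap
      end
    end
  end
end

matches = DemoScanner.scan('Buy US-5949-1810-45 or 514000 now').to_a
matches.map(&:type) # => [:isin, :wkn]
matches.first.raw   # => "US-5949-1810-45"
```

Arrays compare element by element, so `min_by` on the priority tuple picks the check-digit type first, then the narrower length range, then registration order.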
data/lib/sec_id/sedol.rb CHANGED
@@ -62,6 +62,9 @@ module SecID
62
62
 
63
63
  private
64
64
 
65
+ # @return [Hash]
66
+ def components = { check_digit: }
67
+
65
68
  # NOTE: Not idiomatic Ruby, but optimized for performance.
66
69
  #
67
70
  # @return [Integer] the weighted sum
@@ -40,6 +40,13 @@ module SecID
40
40
  @identifier = valoren_parts[:identifier]
41
41
  end
42
42
 
43
+ # @return [String, nil]
44
+ def to_pretty_s
45
+ return nil unless valid?
46
+
47
+ identifier.reverse.scan(/.{1,3}/).join(' ').reverse
48
+ end
49
+
43
50
  # @param country_code [String] the ISO 3166-1 alpha-2 country code (default: 'CH')
44
51
  # @return [ISIN] a new ISIN instance with calculated check digit
45
52
  # @raise [InvalidFormatError] if the country code is not CH or LI
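The Valoren `to_pretty_s` above groups digits in threes from the right by reversing, chunking, and reversing again. A stand-alone sketch of that technique (the helper name `group_thousands` is hypothetical):

```ruby
# Groups digits in threes starting from the right-hand side.
# Reversing first lets a left-to-right scan produce right-aligned groups.
# Assumes a single-character separator, which is unaffected by the final reverse.
def group_thousands(digits, separator = ' ')
  digits.reverse.scan(/.{1,3}/).join(separator).reverse
end

group_thousands('3886335') # => "3 886 335"
group_thousands('514000')  # => "514 000"
group_thousands('42')      # => "42"
```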
@@ -1,5 +1,5 @@
1
1
  # frozen_string_literal: true
2
2
 
3
3
  module SecID
4
- VERSION = '5.0.0'
4
+ VERSION = '5.2.0'
5
5
  end
data/lib/sec_id.rb CHANGED
@@ -15,6 +15,9 @@ module SecID
15
15
  # Raised for type-specific structural errors (invalid prefix, category, group, BBAN, or date).
16
16
  class InvalidStructureError < Error; end
17
17
 
18
+ # Raised when multiple identifier types match and on_ambiguous: :raise is used.
19
+ class AmbiguousMatchError < Error; end
20
+
18
21
  class << self
19
22
  # Looks up an identifier class by its symbol key.
20
23
  #
@@ -54,45 +57,85 @@ module SecID
54
57
  types.any? { |key| self[key].valid?(str) }
55
58
  end
56
59
 
57
- # Parses a string into the most specific matching identifier instance.
58
- #
60
+ # @param text [String, nil] the text to scan
61
+ # @param types [Array<Symbol>, nil] restrict to specific types
62
+ # @return [Array<Match>]
63
+ # @raise [ArgumentError] if any key in types is unknown
64
+ def extract(text, types: nil)
65
+ scan(text, types: types).to_a
66
+ end
67
+
68
+ # @param text [String, nil] the text to scan
69
+ # @param types [Array<Symbol>, nil] restrict to specific types
70
+ # @return [Enumerator<Match>] if no block given
71
+ # @yieldparam match [Match]
72
+ # @raise [ArgumentError] if any key in types is unknown
73
+ def scan(text, types: nil, &)
74
+ classes = types&.map { |key| self[key] }
75
+ scanner.call(text, classes: classes, &)
76
+ end
77
+
78
+ # @param str [String, nil] the identifier string to explain
79
+ # @param types [Array<Symbol>, nil] restrict to specific types
80
+ # @return [Hash] hash with :input and :candidates keys
81
+ def explain(str, types: nil)
82
+ input = str.to_s.strip
83
+ target_keys = types || identifier_list.map { |k| k.short_name.downcase.to_sym }
84
+ candidates = target_keys.map do |key|
85
+ instance = self[key].new(input)
86
+ { type: key, valid: instance.valid?, errors: instance.errors.details }
87
+ end
88
+ { input: input, candidates: candidates }
89
+ end
90
+
59
91
  # @param str [String, nil] the identifier string to parse
60
92
  # @param types [Array<Symbol>, nil] restrict to specific types (e.g. [:isin, :cusip])
61
- # @return [SecID::Base, nil] a valid identifier instance, or nil if no match
62
- # @raise [ArgumentError] if any key in types is unknown
63
- def parse(str, types: nil)
64
- types.nil? ? parse_any(str) : parse_from(str, types)
93
+ # @param on_ambiguous [:first, :raise, :all] how to handle multiple matches
94
+ # @return [SecID::Base, nil, Array<SecID::Base>] depends on on_ambiguous mode
95
+ # @raise [AmbiguousMatchError] when on_ambiguous: :raise and multiple types match
96
+ def parse(str, types: nil, on_ambiguous: :first)
97
+ case on_ambiguous
98
+ when :first then types.nil? ? parse_any(str) : parse_from(str, types)
99
+ when :raise then parse_strict(str, types)
100
+ when :all then parse_all(str, types)
101
+ else raise ArgumentError, "Unknown on_ambiguous mode: #{on_ambiguous.inspect}"
102
+ end
65
103
  end
66
104
 
67
- # Parses a string into the most specific matching identifier instance, raising on failure.
68
- #
69
105
  # @param str [String, nil] the identifier string to parse
70
106
  # @param types [Array<Symbol>, nil] restrict to specific types (e.g. [:isin, :cusip])
71
- # @return [SecID::Base] a valid identifier instance
107
+ # @param on_ambiguous [:first, :raise, :all] how to handle multiple matches
108
+ # @return [SecID::Base, Array<SecID::Base>] depends on on_ambiguous mode
72
109
  # @raise [InvalidFormatError] if no matching identifier type is found
73
- # @raise [ArgumentError] if any key in types is unknown
74
- def parse!(str, types: nil)
75
- parse(str, types: types) || raise(InvalidFormatError, parse_error_message(str, types))
110
+ # @raise [AmbiguousMatchError] when on_ambiguous: :raise and multiple types match
111
+ def parse!(str, types: nil, on_ambiguous: :first)
112
+ result = parse(str, types: types, on_ambiguous: on_ambiguous)
113
+
114
+ if on_ambiguous == :all
115
+ raise(InvalidFormatError, parse_error_message(str, types)) if result.empty?
116
+
117
+ return result
118
+ end
119
+
120
+ result || raise(InvalidFormatError, parse_error_message(str, types))
76
121
  end
77
122
 
78
123
  private
79
124
 
80
- # @param klass [Class] the identifier class to register
81
125
  # @return [void]
82
126
  def register_identifier(klass)
83
127
  key = klass.name.split('::').last.downcase.to_sym
84
128
  identifier_map[key] = klass
85
129
  identifier_list << klass
86
130
  @detector = nil
131
+ @scanner = nil
87
132
  end
88
133
 
89
- # @return [SecID::Base, nil]
90
134
  def parse_any(str)
91
135
  key = detect(str).first
92
136
  key && self[key].new(str)
93
137
  end
94
138
 
95
- # @return [SecID::Base, nil]
96
139
  def parse_from(str, types)
97
140
  types.each do |key|
98
141
  instance = self[key].new(str)
@@ -101,26 +144,37 @@ module SecID
101
144
  nil
102
145
  end
103
146
 
104
- # @return [String]
105
- def parse_error_message(str, types)
106
- base = "No matching identifier type found for #{str.to_s.strip.inspect}"
107
- types ? "#{base} among #{types.inspect}" : base
147
+ def parse_strict(str, types)
148
+ candidates = resolve_candidates(str, types)
149
+ raise AmbiguousMatchError, ambiguous_message(str, candidates) if candidates.size > 1
150
+
151
+ candidates.first && self[candidates.first].new(str)
108
152
  end
109
153
 
110
- # @return [Detector]
111
- def detector
112
- @detector ||= Detector.new(identifier_list)
154
+ def parse_all(str, types)
155
+ resolve_candidates(str, types).map { |key| self[key].new(str) }
113
156
  end
114
157
 
115
- # @return [Hash{Symbol => Class}]
116
- def identifier_map
117
- @identifier_map ||= {}
158
+ # @return [Array<Symbol>]
159
+ def resolve_candidates(str, types)
160
+ types ? types.select { |key| self[key].valid?(str) } : detect(str)
118
161
  end
119
162
 
120
- # @return [Array<Class>]
121
- def identifier_list
122
- @identifier_list ||= []
163
+ # @return [String]
164
+ def ambiguous_message(str, candidates)
165
+ "Ambiguous identifier #{str.to_s.strip.inspect}: matches #{candidates.inspect}"
166
+ end
167
+
168
+ # @return [String]
169
+ def parse_error_message(str, types)
170
+ base = "No matching identifier type found for #{str.to_s.strip.inspect}"
171
+ types ? "#{base} among #{types.inspect}" : base
123
172
  end
173
+
174
+ def detector = @detector ||= Detector.new(identifier_list)
175
+ def scanner = @scanner ||= Scanner.new(identifier_list)
176
+ def identifier_map = @identifier_map ||= {}
177
+ def identifier_list = @identifier_list ||= []
124
178
  end
125
179
  end
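
Note on the `on_ambiguous` modes above: the three modes differ only in what they do with the list of matching types. The sketch below reproduces that dispatch without the gem. The validators are hypothetical stand-ins for real format checks, `parse_sketch` is an illustrative name, and the sketch returns type keys rather than identifier instances. It also collapses the `:first` branch, which in the diff goes through `parse_any` or `parse_from`, into "take the first candidate".

```ruby
# Standalone sketch of the :first / :raise / :all dispatch.
# VALIDATORS and parse_sketch are hypothetical, not sec_id API.
class AmbiguousMatchError < StandardError; end

VALIDATORS = {
  six_chars: ->(s) { s.match?(/\A[A-Z0-9]{6}\z/) },
  digits: ->(s) { s.match?(/\A\d{5,9}\z/) },
  letters: ->(s) { s.match?(/\A[A-Z]+\z/) }
}.freeze

def parse_sketch(str, on_ambiguous: :first)
  input = str.to_s.strip
  # Keys come back in registration order, so :first is deterministic.
  candidates = VALIDATORS.select { |_type, check| check.call(input) }.keys

  case on_ambiguous
  when :first then candidates.first
  when :all then candidates
  when :raise
    if candidates.size > 1
      raise AmbiguousMatchError, "Ambiguous identifier #{input.inspect}: matches #{candidates.inspect}"
    end

    candidates.first
  else raise ArgumentError, "Unknown on_ambiguous mode: #{on_ambiguous.inspect}"
  end
end

p parse_sketch('12345')                       # only one validator matches
p parse_sketch('123456', on_ambiguous: :all)  # two validators match
begin
  parse_sketch('123456', on_ambiguous: :raise)
rescue AmbiguousMatchError => e
  puts e.message
end
```

An empty array is truthy in Ruby, so a plain `result || raise(...)` would never fire in `:all` mode. That is why `parse!` in the diff checks `result.empty?` separately for that mode.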
126
180
 
@@ -131,6 +185,7 @@ require 'sec_id/concerns/validatable'
131
185
  require 'sec_id/concerns/checkable'
132
186
  require 'sec_id/base'
133
187
  require 'sec_id/detector'
188
+ require 'sec_id/scanner'
134
189
  require 'sec_id/isin'
135
190
  require 'sec_id/cusip'
136
191
  require 'sec_id/sedol'
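
Note on `scan` and `extract`: `lib/sec_id/scanner.rb` is required here but its body is not part of this section, so the following is only a sketch of the block-or-Enumerator contract documented on `SecID.scan`. `TinyScanner`, its `Match` struct, and the ISIN-shaped pattern are all assumptions made for illustration. The sketch does no check-digit validation and uses a `Struct` rather than the `Data.define` value object mentioned in the changelog.

```ruby
# Standalone sketch: yield matches to a block, or return an Enumerator
# when no block is given. TinyScanner and Match are hypothetical.
Match = Struct.new(:raw, :range, keyword_init: true)

class TinyScanner
  # ISIN-shaped token: two letters, nine alphanumerics, one digit.
  PATTERN = /\b[A-Z]{2}[A-Z0-9]{9}\d\b/

  def call(text)
    return enum_for(:call, text) unless block_given?

    text.to_s.scan(PATTERN) do
      m = Regexp.last_match
      yield Match.new(raw: m[0], range: m.begin(0)...m.end(0))
    end
    nil
  end
end

text = 'Buy US0378331005 and sell DE0007164600 today.'
scanner = TinyScanner.new

# Block form, analogous to SecID.scan with a block.
scanner.call(text) { |match| puts "#{match.raw} at #{match.range}" }

# Enumerator form; calling to_a on it mirrors what extract does in the diff.
all_matches = scanner.call(text).to_a
puts all_matches.size
```

Returning an Enumerator keeps the scan lazy, so a caller that only needs the first hit can use `scanner.call(text).first` without collecting every match.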
data/sec_id.gemspec CHANGED
@@ -10,10 +10,10 @@ Gem::Specification.new do |spec|
10
10
  spec.authors = ['Leonid Svyatov']
11
11
  spec.email = ['leonid@svyatov.ru']
12
12
 
13
- spec.summary = 'Validate securities identification numbers with ease!'
14
- spec.description = 'Validate, calculate check digits, and parse components of securities identifiers. ' \
15
- 'Supports ISIN, CUSIP, CEI, SEDOL, FIGI, LEI, IBAN, CIK, OCC, WKN, Valoren, CFI, ' \
16
- 'and FISN standards.'
13
+ spec.summary = 'A Ruby toolkit for securities identifiers validate, parse, normalize, detect, and convert.'
14
+ spec.description = 'Validate, normalize, parse, and convert securities identifiers. Auto-detect identifier ' \
15
+ 'type from any string. Calculate and restore check digits. Supports ISIN, CUSIP, CEI, ' \
16
+ 'SEDOL, FIGI, LEI, IBAN, CIK, OCC, WKN, Valoren, CFI, and FISN.'
17
17
  spec.homepage = 'https://github.com/svyatov/sec_id'
18
18
  spec.license = 'MIT'
19
19
 
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: sec_id
3
3
  version: !ruby/object:Gem::Version
4
- version: 5.0.0
4
+ version: 5.2.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - Leonid Svyatov
@@ -9,9 +9,9 @@ bindir: bin
9
9
  cert_chain: []
10
10
  date: 1980-01-02 00:00:00.000000000 Z
11
11
  dependencies: []
12
- description: Validate, calculate check digits, and parse components of securities
13
- identifiers. Supports ISIN, CUSIP, CEI, SEDOL, FIGI, LEI, IBAN, CIK, OCC, WKN, Valoren,
14
- CFI, and FISN standards.
12
+ description: Validate, normalize, parse, and convert securities identifiers. Auto-detect
13
+ identifier type from any string. Calculate and restore check digits. Supports ISIN,
14
+ CUSIP, CEI, SEDOL, FIGI, LEI, IBAN, CIK, OCC, WKN, Valoren, CFI, and FISN.
15
15
  email:
16
16
  - leonid@svyatov.ru
17
17
  executables: []
@@ -41,6 +41,7 @@ files:
41
41
  - lib/sec_id/isin.rb
42
42
  - lib/sec_id/lei.rb
43
43
  - lib/sec_id/occ.rb
44
+ - lib/sec_id/scanner.rb
44
45
  - lib/sec_id/sedol.rb
45
46
  - lib/sec_id/valoren.rb
46
47
  - lib/sec_id/version.rb
@@ -70,5 +71,6 @@ required_rubygems_version: !ruby/object:Gem::Requirement
70
71
  requirements: []
71
72
  rubygems_version: 4.0.6
72
73
  specification_version: 4
73
- summary: Validate securities identification numbers with ease!
74
+ summary: A Ruby toolkit for securities identifiers validate, parse, normalize, detect,
75
+ and convert.
74
76
  test_files: []