sec_id 5.1.0 → 5.2.0
- checksums.yaml +4 -4
- data/CHANGELOG.md +14 -0
- data/README.md +66 -2
- data/lib/sec_id/base.rb +8 -0
- data/lib/sec_id/cfi.rb +16 -1
- data/lib/sec_id/errors.rb +7 -0
- data/lib/sec_id/iban.rb +7 -0
- data/lib/sec_id/scanner.rb +144 -0
- data/lib/sec_id/version.rb +1 -1
- data/lib/sec_id.rb +83 -28
- data/sec_id.gemspec +4 -4
- metadata +7 -5
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: bc458a32d8a5cb8fc4db2b1ab4c635a6c88e4160d38948e4ce779e90b039bd3c
+  data.tar.gz: 298298a51ca424aef20c814ed620e35d32bbfe9bf285fe1f6ee6f44bac163980
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 78650675337ce8a03970e4b1227713aaa0adafb8c18da35466099add8fc39389c8b02583602c18f9aeb59a340ee6b9db9f42ef7662d96284dd0ed93336a50a6a
+  data.tar.gz: 27d54f707d2ec471218a15e6f9dadc2d78d3f46985084f1e18be921b83691f4fe5cd4d82d793ae9cbb34c3e55df30e2966aa834b5942606dfa81d5e3d38e4799
data/CHANGELOG.md
CHANGED
@@ -8,6 +8,20 @@ and [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/).
 
 ## [Unreleased]
 
+## [5.2.0] - 2026-02-24
+
+### Added
+
+- `SecID.scan` and `SecID.extract` methods for finding identifiers in freeform text — returns `Scanner::Match` objects (`Data.define(:type, :raw, :range, :identifier)`) with the validated identifier instance; supports `types:` filtering, hyphenated identifiers, and compound patterns (OCC with spaces, FISN with slashes)
+- `SecID.explain` method for debugging identifier detection — returns per-type validation results showing exactly why each type matched or rejected the input
+- `on_ambiguous:` option for `SecID.parse` and `SecID.parse!` — `:first` (default, existing behavior), `:raise` (raises `AmbiguousMatchError`), `:all` (returns array of all matching instances)
+- `SecID::AmbiguousMatchError` exception class for ambiguous identifier detection
+- `#as_json` method on all identifier types (delegates to `#to_h`) and on `Errors` (delegates to `#details`) for JSON serialization compatibility
+- `SecID::IBAN.supported_countries` class method returning sorted array of all supported country codes
+- `SecID::CFI.categories` class method returning the categories hash
+- `SecID::CFI.groups_for(category_code)` class method returning groups hash for a given category
+
+
 ## [5.1.0] - 2026-02-19
 
 ### Added
data/README.md
CHANGED
@@ -1,6 +1,6 @@
 # SecID [](https://rubygems.org/gems/sec_id) [](https://app.codecov.io/gh/svyatov/sec_id) [](https://github.com/svyatov/sec_id/actions?query=workflow%3ACI)
 
->
+> A Ruby toolkit for securities identifiers — validate, parse, normalize, detect, and convert.
 
 ## Table of Contents
 
@@ -8,6 +8,8 @@
 - [Installation](#installation)
 - [Supported Standards and Usage](#supported-standards-and-usage)
 - [Metadata Registry](#metadata-registry) - enumerate, filter, look up, and detect identifier types
+- [Text Scanning](#text-scanning) - find identifiers in freeform text
+- [Debugging Detection](#debugging-detection) - understand why strings match or don't
 - [Structured Validation](#structured-validation) - detailed error codes and messages
 - [ISIN](#isin) - International Securities Identification Number
 - [CUSIP](#cusip) - Committee on Uniform Securities Identification Procedures
@@ -38,7 +40,7 @@ Ruby 3.2+ is required.
 Add this line to your application's Gemfile:
 
 ```ruby
-gem 'sec_id', '~> 5.
+gem 'sec_id', '~> 5.2'
 ```
 
 And then execute:
@@ -67,6 +69,7 @@ All identifier classes provide `valid?`, `errors`, `validate`, `validate!` metho
 
 **All identifiers** support hash serialization:
 - `#to_h` - returns a hash with `:type`, `:full_id`, `:normalized`, `:valid`, and `:components` keys
+- `#as_json` - same as `#to_h`, for JSON serialization compatibility (Rails, `JSON.generate`, etc.)
 
 ```ruby
 SecID::ISIN.new('US5949181045').to_h
@@ -144,6 +147,56 @@ SecID.parse('594918104', types: [:cusip]) # => #<SecID::CUSIP>
 # Bang version raises on failure
 SecID.parse!('US5949181045') # => #<SecID::ISIN>
 SecID.parse!('unknown') # raises SecID::InvalidFormatError
+
+# Handle ambiguous matches
+SecID.parse('514000', on_ambiguous: :first) # => #<SecID::WKN> (default)
+SecID.parse('514000', on_ambiguous: :raise) # raises SecID::AmbiguousMatchError
+SecID.parse('514000', on_ambiguous: :all) # => [#<SecID::WKN>, #<SecID::Valoren>, #<SecID::CIK>]
+SecID.parse('US5949181045', on_ambiguous: :raise) # => #<SecID::ISIN> (unambiguous, no error)
+```
+
+### Text Scanning
+
+Find identifiers embedded in freeform text:
+
+```ruby
+# Extract all identifiers from text
+matches = SecID.extract('Portfolio: US5949181045, 594918104, B0YBKJ7')
+matches.map(&:type) # => [:isin, :cusip, :sedol]
+matches.first.raw # => "US5949181045"
+matches.first.range # => 11...23
+matches.first.identifier.country_code # => "US"
+
+# Lazy scanning with Enumerator
+SecID.scan('Buy US5949181045 now').each { |m| puts m.type }
+
+# Filter by types
+SecID.extract('514000', types: [:valoren]) # => only Valoren matches
+
+# Handles hyphenated identifiers
+match = SecID.extract('ID: US-5949-1810-45').first
+match.raw # => "US-5949-1810-45"
+match.identifier.normalized # => "US5949181045"
+```
+
+> **Known limitations:** Format-only types (CIK, Valoren, WKN, CFI) can false-positive on
+> common numbers and short words in prose — use the `types:` filter to restrict scanning when
+> this is a concern. Identifiers prefixed with special characters (e.g. `#US5949181045`) may be
+> consumed as a single token by CUSIP's `*@#` character class and fail validation, preventing
+> the embedded identifier from being found.
+
+### Debugging Detection
+
+Understand why a string matches or doesn't match specific identifier types:
+
+```ruby
+result = SecID.explain('US5949181040')
+isin = result[:candidates].find { |c| c[:type] == :isin }
+isin[:valid] # => false
+isin[:errors].first[:error] # => :invalid_check_digit
+
+# Filter to specific types
+SecID.explain('US5949181045', types: %i[isin cusip])
 ```
 
 ### Structured Validation
@@ -396,6 +449,11 @@ iban.to_pretty_s # => 'DE89 3704 0044 0532 0130 00'
 
 Full BBAN structural validation is supported for EU/EEA countries. Other countries have length-only validation.
 
+```ruby
+# List all supported countries
+SecID::IBAN.supported_countries # => ["AD", "AE", "AT", "BE", "BG", "CH", ...]
+```
+
 ### CIK
 
 > [Central Index Key](https://en.wikipedia.org/wiki/Central_Index_Key) - a 10-digit number used by the SEC to identify corporations and individuals who have filed disclosures.
@@ -522,6 +580,12 @@ cfi.registered? # => true
 
 CFI validates the category code (position 1) against 14 valid values and the group code (position 2) against valid values for that category. Attribute positions 3-6 accept any letter A-Z, with X meaning "not applicable".
 
+```ruby
+# Introspect valid codes
+SecID::CFI.categories # => { "E" => :equity, "C" => :collective_investment_vehicles, ... }
+SecID::CFI.groups_for('E') # => { "S" => :common_shares, "P" => :preferred_shares, ... }
+```
+
 ### FISN
 
 > [Financial Instrument Short Name](https://en.wikipedia.org/wiki/ISO_18774) - a human-readable short name for financial instruments per ISO 18774.
data/lib/sec_id/base.rb
CHANGED
@@ -52,6 +52,7 @@ module SecID
     # @api private
     def self.inherited(subclass)
       super
+      # Skip anonymous classes and classes outside the SecID namespace (e.g. in tests)
       SecID.__send__(:register_identifier, subclass) if subclass.name&.start_with?('SecID::')
     end
 
@@ -89,6 +90,13 @@ module SecID
       }
     end
 
+    # Returns a JSON-compatible hash representation.
+    #
+    # @return [Hash]
+    def as_json(*)
+      to_h
+    end
+
     protected
 
     # @return [String]
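The `as_json` addition above follows a common delegation pattern: accept and ignore whatever options a serializer passes, then return the plain hash. A minimal sketch of that pattern follows, assuming a stand-in class (`DemoIdentifier` is hypothetical and is not the gem's `SecID::Base`) and using only the standard `json` library:

```ruby
require 'json'

# Hypothetical stand-in for an identifier class; not the gem's SecID::Base.
class DemoIdentifier
  def initialize(full_id)
    @full_id = full_id
  end

  # Plain-hash representation, analogous in spirit to #to_h in the diff.
  def to_h
    { type: :demo, full_id: @full_id }
  end

  # Accepts and ignores any options a caller passes, then delegates to #to_h.
  def as_json(*)
    to_h
  end
end

id = DemoIdentifier.new('US5949181045')
json = JSON.generate(id.as_json)
puts json
```

The anonymous splat keeps the method compatible with callers that pass an options hash, without the class having to interpret those options.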
data/lib/sec_id/cfi.rb
CHANGED
@@ -164,7 +164,22 @@ module SecID
         'C' => :combined_instruments,
         'M' => :miscellaneous
       }
-    }.freeze
+    }.each_value(&:freeze).freeze
+
+    # Returns the category codes hash.
+    #
+    # @return [Hash{String => Symbol}]
+    def self.categories
+      CATEGORIES
+    end
+
+    # Returns the groups hash for a given category code.
+    #
+    # @param category_code [String] single-letter category code
+    # @return [Hash{String => Symbol}, nil]
+    def self.groups_for(category_code)
+      GROUPS[category_code.to_s.upcase]
+    end
 
     # @return [String, nil] the category code (position 1)
     attr_reader :category_code
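The change from `.freeze` to `.each_value(&:freeze).freeze` matters once the nested hashes are handed out through a public accessor: freezing only the outer hash leaves the inner ones mutable. A small sketch of the difference, assuming a toy two-level table (`SHALLOW`, `DEEP`, and the top-level `groups_for` here are illustrative names, not the gem's constants):

```ruby
# Hypothetical miniature of a two-level lookup table; the codes are illustrative.
SHALLOW = {
  'E' => { 'S' => :common_shares }
}.freeze

# each_value returns the receiver, so the outer hash can be frozen in the same chain.
DEEP = {
  'E' => { 'S' => :common_shares }
}.each_value(&:freeze).freeze

# Mirrors the lookup style in the diff: coerce to String, then upcase.
def groups_for(category_code)
  DEEP[category_code.to_s.upcase]
end

puts SHALLOW.frozen?      # => true  (outer hash is frozen)
puts SHALLOW['E'].frozen? # => false (inner hash is still mutable)
puts DEEP['E'].frozen?    # => true  (inner hash is frozen too)
```

Because the lookup coerces with `to_s.upcase`, callers in this sketch can pass `'e'` or `:e` and still reach the `'E'` entry; an unknown code simply returns `nil`.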
data/lib/sec_id/errors.rb
CHANGED
data/lib/sec_id/iban.rb
CHANGED
@@ -33,6 +33,13 @@ module SecID
       (?<rest>[A-Z0-9]{13,32})
     \z/x
 
+    # Returns sorted array of all supported country codes.
+    #
+    # @return [Array<String>]
+    def self.supported_countries
+      @supported_countries ||= (COUNTRY_RULES.keys + LENGTH_ONLY_COUNTRIES.keys).sort.freeze
+    end
+
     # @return [String, nil] the ISO 3166-1 alpha-2 country code
     attr_reader :country_code
 
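`supported_countries` above memoizes a sorted, frozen array on the class itself. A minimal sketch of that pattern, assuming two small made-up tables (the contents of the gem's `COUNTRY_RULES` and `LENGTH_ONLY_COUNTRIES` are not shown in this diff, so `DemoIban` and its entries are placeholders):

```ruby
# Hypothetical class and tables; only the memoization pattern mirrors the diff.
class DemoIban
  COUNTRY_RULES = { 'DE' => 22, 'AT' => 20 }.freeze
  LENGTH_ONLY_COUNTRIES = { 'BR' => 29 }.freeze

  # Computes the sorted list once, freezes it, and caches it in a
  # class-level instance variable so later calls return the same object.
  def self.supported_countries
    @supported_countries ||= (COUNTRY_RULES.keys + LENGTH_ONLY_COUNTRIES.keys).sort.freeze
  end
end

puts DemoIban.supported_countries.inspect
```

Freezing the cached array means a caller cannot accidentally mutate the list that every later call shares.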
data/lib/sec_id/scanner.rb
ADDED
@@ -0,0 +1,144 @@
+# frozen_string_literal: true
+
+module SecID
+  # Immutable value object representing a matched identifier found in text.
+  Match = Data.define(:type, :raw, :range, :identifier)
+
+  # Finds securities identifiers in freeform text using regex candidate extraction,
+  # length/charset pre-filtering, and cursor-based overlap prevention.
+  #
+  # @api private
+  class Scanner
+    # Composite regex for candidate extraction.
+    #
+    # Three named groups tried left-to-right via alternation:
+    # - fisn: contains `/` (unique FISN delimiter)
+    # - occ: contains structural spaces + date/type pattern
+    # - simple: common alphanumeric tokens (covers all other types)
+    CANDIDATE_RE = %r{
+      (?<![A-Za-z0-9*@\#/.$])
+      (?:
+        (?<fisn>[A-Za-z0-9](?:[A-Za-z0-9 ]{0,33}[A-Za-z0-9])?/[A-Za-z0-9](?:[A-Za-z0-9 ]{0,33}[A-Za-z0-9])?)
+        |
+        (?<occ>[A-Za-z]{1,6}\ {1,5}\d{6}[CcPp]\d{8})
+        |
+        (?<simple>[A-Za-z0-9*@\#](?:[A-Za-z0-9*@\#-]{0,40}[A-Za-z0-9*@\#])?)
+      )
+      (?![A-Za-z0-9*@\#.])
+    }x
+
+    # @param identifier_list [Array<Class>] registered identifier classes
+    def initialize(identifier_list)
+      @classes = identifier_list.dup.freeze
+      precompute
+    end
+
+    # Scans text for identifiers, yielding or returning matches.
+    #
+    # @param text [String, nil] the text to scan
+    # @param classes [Array<Class>, nil] restrict to specific classes
+    # @return [Enumerator<Match>] if no block given
+    # @yieldparam match [Match]
+    def call(text, classes: nil, &block)
+      return enum_for(:call, text, classes: classes) unless block
+
+      input = text.to_s
+      return if input.empty?
+
+      scan_text(input, classes || @classes, &block)
+    end
+
+    private
+
+    # @return [void]
+    def precompute # rubocop:disable Metrics/AbcSize
+      build_key_table
+      build_priority_table
+      @fisn_classes = @classes.select { |k| k.short_name == 'FISN' }
+      @occ_classes = @classes.select { |k| k.short_name == 'OCC' }
+      @simple_classes = @classes - @fisn_classes - @occ_classes
+      @candidates_by_length = Hash.new { |h, k| h[k] = [] }
+      @classes.each do |klass|
+        id_length = klass::ID_LENGTH
+        lengths = id_length.is_a?(Range) ? id_length : [id_length]
+        lengths.each { |len| @candidates_by_length[len] << klass }
+      end
+      @candidates_by_length.each_value(&:freeze)
+    end
+
+    # @return [void]
+    def build_key_table
+      @key_for = {}
+      @classes.each { |klass| @key_for[klass] = klass.short_name.downcase.to_sym }
+      @key_for.freeze
+    end
+
+    # @return [void]
+    def build_priority_table
+      @priority_for = {}
+      @classes.each_with_index do |klass, index|
+        check_digit_rank = klass.has_check_digit? ? 0 : 1
+        id_length = klass::ID_LENGTH
+        range_size = id_length.is_a?(Range) ? id_length.size : 1
+        @priority_for[klass] = [check_digit_rank, range_size, index].freeze
+      end
+      @priority_for.freeze
+    end
+
+    # @param input [String]
+    # @param target_classes [Array<Class>]
+    # @return [void]
+    def scan_text(input, target_classes)
+      pos = 0
+      while pos < input.length
+        match_data = CANDIDATE_RE.match(input, pos)
+        break unless match_data
+
+        result = identify_candidate(match_data, target_classes)
+        if result
+          yield result
+          pos = match_data.end(0)
+        else
+          pos = match_data.begin(0) + 1
+        end
+      end
+    end
+
+    # @param match_data [MatchData]
+    # @param target_classes [Array<Class>]
+    # @return [Match, nil]
+    def identify_candidate(match_data, target_classes)
+      raw = match_data[0]
+      start_pos = match_data.begin(0)
+
+      if match_data[:fisn]
+        try_classes(raw, raw.upcase, start_pos, target_classes & @fisn_classes)
+      elsif match_data[:occ]
+        try_classes(raw, raw.upcase, start_pos, target_classes & @occ_classes)
+      else
+        cleaned = raw.gsub('-', '').upcase
+        try_classes(raw, cleaned, start_pos, target_classes & @simple_classes)
+      end
+    end
+
+    # @return [Match, nil]
+    def try_classes(raw, cleaned, start_pos, classes)
+      best = best_match(cleaned, classes)
+      return unless best
+
+      end_pos = start_pos + raw.length
+      Match.new(type: @key_for[best], raw: raw, range: start_pos...end_pos, identifier: best.new(cleaned))
+    end
+
+    # @return [Class, nil]
+    def best_match(cleaned, classes)
+      return if classes.empty?
+
+      candidates = (@candidates_by_length[cleaned.length] || []) & classes
+      return if candidates.empty?
+
+      validated = candidates.select { |k| cleaned.match?(k::VALID_CHARS_REGEX) && k.valid?(cleaned) }
+      validated.min_by { |k| @priority_for[k] }
+    end
+  end
+end
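The scanner above combines two ideas: a cursor that jumps past accepted matches (so results never overlap) but advances only one character after a rejection, and a priority tuple compared with `min_by` to choose among several valid types. The following is a simplified sketch of both, under stated assumptions: `TOKEN_RE`, `RULES`, `best_rule`, `scan_tokens`, and `DemoMatch` are all hypothetical, the rules are format-only toys rather than the gem's validators, and a `Struct` stands in for `Data.define` so the sketch also runs on Rubies older than 3.2.

```ruby
DemoMatch = Struct.new(:type, :raw, :range, keyword_init: true)

# Word-bounded alphanumeric runs; far simpler than the CANDIDATE_RE in the diff.
TOKEN_RE = /(?<![A-Za-z0-9])[A-Za-z0-9]+(?![A-Za-z0-9])/

# Hypothetical rules, NOT the gem's validators: a length, a format, a check-digit flag.
RULES = [
  { type: :six_digits, length: 6,  format: /\A\d{6}\z/,                  check_digit: false },
  { type: :six_chars,  length: 6,  format: /\A[A-Z0-9]{6}\z/,            check_digit: false },
  { type: :twelve,     length: 12, format: /\A[A-Z]{2}[A-Z0-9]{9}\d\z/,  check_digit: true }
].freeze

# Picks the best rule for a cleaned token, or nil when none applies.
def best_rule(cleaned)
  matching = RULES.each_with_index.select do |rule, _index|
    rule[:length] == cleaned.length && cleaned.match?(rule[:format])
  end
  # Arrays compare element by element, so the lowest tuple wins:
  # check-digit rules first, then declaration order.
  best = matching.min_by { |rule, index| [rule[:check_digit] ? 0 : 1, index] }
  best && best.first
end

# Walks the text with a cursor and collects non-overlapping matches.
def scan_tokens(text)
  matches = []
  pos = 0
  while pos < text.length
    md = TOKEN_RE.match(text, pos)
    break unless md

    raw = md[0]
    rule = best_rule(raw.upcase)
    if rule
      matches << DemoMatch.new(type: rule[:type], raw: raw, range: md.begin(0)...md.end(0))
      pos = md.end(0)       # accepted: jump past the whole token
    else
      pos = md.begin(0) + 1 # rejected: advance one character and search again
    end
  end
  matches
end

found = scan_tokens('Buy US5949181045 and 514000 today')
found.each { |m| puts "#{m.type} #{m.raw} #{m.range}" }
```

With these toy rules the sample string yields a `:twelve` match at `4...16` and a `:six_digits` match for `514000`; the second case shows the tie-break, since both six-character rules accept the token and declaration order decides. The look-behind still sees characters before `pos`, which is why stepping forward by one character does not produce matches that start mid-token.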
data/lib/sec_id/version.rb
CHANGED
data/lib/sec_id.rb
CHANGED
@@ -15,6 +15,9 @@ module SecID
   # Raised for type-specific structural errors (invalid prefix, category, group, BBAN, or date).
   class InvalidStructureError < Error; end
 
+  # Raised when multiple identifier types match and on_ambiguous: :raise is used.
+  class AmbiguousMatchError < Error; end
+
   class << self
     # Looks up an identifier class by its symbol key.
     #
@@ -54,45 +57,85 @@ module SecID
       types.any? { |key| self[key].valid?(str) }
     end
 
-    #
-    #
+    # @param text [String, nil] the text to scan
+    # @param types [Array<Symbol>, nil] restrict to specific types
+    # @return [Array<Match>]
+    # @raise [ArgumentError] if any key in types is unknown
+    def extract(text, types: nil)
+      scan(text, types: types).to_a
+    end
+
+    # @param text [String, nil] the text to scan
+    # @param types [Array<Symbol>, nil] restrict to specific types
+    # @return [Enumerator<Match>] if no block given
+    # @yieldparam match [Match]
+    # @raise [ArgumentError] if any key in types is unknown
+    def scan(text, types: nil, &)
+      classes = types&.map { |key| self[key] }
+      scanner.call(text, classes: classes, &)
+    end
+
+    # @param str [String, nil] the identifier string to explain
+    # @param types [Array<Symbol>, nil] restrict to specific types
+    # @return [Hash] hash with :input and :candidates keys
+    def explain(str, types: nil)
+      input = str.to_s.strip
+      target_keys = types || identifier_list.map { |k| k.short_name.downcase.to_sym }
+      candidates = target_keys.map do |key|
+        instance = self[key].new(input)
+        { type: key, valid: instance.valid?, errors: instance.errors.details }
+      end
+      { input: input, candidates: candidates }
+    end
+
     # @param str [String, nil] the identifier string to parse
     # @param types [Array<Symbol>, nil] restrict to specific types (e.g. [:isin, :cusip])
-    # @
-    # @
-
-
+    # @param on_ambiguous [:first, :raise, :all] how to handle multiple matches
+    # @return [SecID::Base, nil, Array<SecID::Base>] depends on on_ambiguous mode
+    # @raise [AmbiguousMatchError] when on_ambiguous: :raise and multiple types match
+    def parse(str, types: nil, on_ambiguous: :first)
+      case on_ambiguous
+      when :first then types.nil? ? parse_any(str) : parse_from(str, types)
+      when :raise then parse_strict(str, types)
+      when :all then parse_all(str, types)
+      else raise ArgumentError, "Unknown on_ambiguous mode: #{on_ambiguous.inspect}"
+      end
     end
 
-    # Parses a string into the most specific matching identifier instance, raising on failure.
-    #
     # @param str [String, nil] the identifier string to parse
     # @param types [Array<Symbol>, nil] restrict to specific types (e.g. [:isin, :cusip])
-    # @
+    # @param on_ambiguous [:first, :raise, :all] how to handle multiple matches
+    # @return [SecID::Base, Array<SecID::Base>] depends on on_ambiguous mode
     # @raise [InvalidFormatError] if no matching identifier type is found
-    # @raise [
-    def parse!(str, types: nil)
-      parse(str, types: types
+    # @raise [AmbiguousMatchError] when on_ambiguous: :raise and multiple types match
+    def parse!(str, types: nil, on_ambiguous: :first)
+      result = parse(str, types: types, on_ambiguous: on_ambiguous)
+
+      if on_ambiguous == :all
+        raise(InvalidFormatError, parse_error_message(str, types)) if result.empty?
+
+        return result
+      end
+
+      result || raise(InvalidFormatError, parse_error_message(str, types))
     end
 
     private
 
-    # @param klass [Class] the identifier class to register
     # @return [void]
     def register_identifier(klass)
       key = klass.name.split('::').last.downcase.to_sym
       identifier_map[key] = klass
       identifier_list << klass
       @detector = nil
+      @scanner = nil
     end
 
-    # @return [SecID::Base, nil]
     def parse_any(str)
       key = detect(str).first
       key && self[key].new(str)
     end
 
-    # @return [SecID::Base, nil]
     def parse_from(str, types)
       types.each do |key|
         instance = self[key].new(str)
@@ -101,26 +144,37 @@ module SecID
       nil
     end
 
-
-
-
-
+    def parse_strict(str, types)
+      candidates = resolve_candidates(str, types)
+      raise AmbiguousMatchError, ambiguous_message(str, candidates) if candidates.size > 1
+
+      candidates.first && self[candidates.first].new(str)
     end
 
-
-
-      @detector ||= Detector.new(identifier_list)
+    def parse_all(str, types)
+      resolve_candidates(str, types).map { |key| self[key].new(str) }
     end
 
-    # @return [
-    def
-
+    # @return [Array<Symbol>]
+    def resolve_candidates(str, types)
+      types ? types.select { |key| self[key].valid?(str) } : detect(str)
     end
 
-    # @return [
-    def
-
+    # @return [String]
+    def ambiguous_message(str, candidates)
+      "Ambiguous identifier #{str.to_s.strip.inspect}: matches #{candidates.inspect}"
+    end
+
+    # @return [String]
+    def parse_error_message(str, types)
+      base = "No matching identifier type found for #{str.to_s.strip.inspect}"
+      types ? "#{base} among #{types.inspect}" : base
     end
+
+    def detector = @detector ||= Detector.new(identifier_list)
+    def scanner = @scanner ||= Scanner.new(identifier_list)
+    def identifier_map = @identifier_map ||= {}
+    def identifier_list = @identifier_list ||= []
   end
 end
 
@@ -131,6 +185,7 @@ require 'sec_id/concerns/validatable'
 require 'sec_id/concerns/checkable'
 require 'sec_id/base'
 require 'sec_id/detector'
+require 'sec_id/scanner'
 require 'sec_id/isin'
 require 'sec_id/cusip'
 require 'sec_id/sedol'
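The `on_ambiguous:` dispatch in `parse` can be illustrated in isolation. The sketch below is a hedged approximation, not the gem's code: `DEMO_TYPES`, `demo_parse`, and `DemoAmbiguousError` are hypothetical, the checks are rough format-only lambdas, and the function returns type keys rather than identifier instances to stay self-contained.

```ruby
class DemoAmbiguousError < StandardError; end

# Hypothetical format-only checks standing in for real identifier classes.
DEMO_TYPES = {
  wkn:       ->(s) { s.match?(/\A[A-Z0-9]{6}\z/) },
  cik:       ->(s) { s.match?(/\A\d{1,10}\z/) },
  isin_like: ->(s) { s.match?(/\A[A-Z]{2}[A-Z0-9]{9}\d\z/) }
}.freeze

# Returns a type key, nil, or an array of keys depending on the mode.
def demo_parse(str, on_ambiguous: :first)
  input = str.to_s.strip.upcase
  # Every type whose check accepts the input, in declaration order.
  candidates = DEMO_TYPES.select { |_key, check| check.call(input) }.keys

  case on_ambiguous
  when :first then candidates.first
  when :all then candidates
  when :raise
    if candidates.size > 1
      raise DemoAmbiguousError, "Ambiguous identifier #{input.inspect}: matches #{candidates.inspect}"
    end

    candidates.first
  else raise ArgumentError, "Unknown on_ambiguous mode: #{on_ambiguous.inspect}"
  end
end

puts demo_parse('514000').inspect
puts demo_parse('514000', on_ambiguous: :all).inspect
```

In this sketch `:raise` only raises when more than one check accepts the input, so an unambiguous string passes through unchanged, and an unknown mode fails fast with `ArgumentError`, matching the shape of the `case` in the diff.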
data/sec_id.gemspec
CHANGED
@@ -10,10 +10,10 @@ Gem::Specification.new do |spec|
   spec.authors = ['Leonid Svyatov']
   spec.email = ['leonid@svyatov.ru']
 
-  spec.summary = '
-  spec.description = 'Validate,
-    '
-    'and FISN
+  spec.summary = 'A Ruby toolkit for securities identifiers — validate, parse, normalize, detect, and convert.'
+  spec.description = 'Validate, normalize, parse, and convert securities identifiers. Auto-detect identifier ' \
+    'type from any string. Calculate and restore check digits. Supports ISIN, CUSIP, CEI, ' \
+    'SEDOL, FIGI, LEI, IBAN, CIK, OCC, WKN, Valoren, CFI, and FISN.'
   spec.homepage = 'https://github.com/svyatov/sec_id'
   spec.license = 'MIT'
 
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: sec_id
 version: !ruby/object:Gem::Version
-  version: 5.
+  version: 5.2.0
 platform: ruby
 authors:
 - Leonid Svyatov
@@ -9,9 +9,9 @@ bindir: bin
 cert_chain: []
 date: 1980-01-02 00:00:00.000000000 Z
 dependencies: []
-description: Validate,
-
-  CFI, and FISN
+description: Validate, normalize, parse, and convert securities identifiers. Auto-detect
+  identifier type from any string. Calculate and restore check digits. Supports ISIN,
+  CUSIP, CEI, SEDOL, FIGI, LEI, IBAN, CIK, OCC, WKN, Valoren, CFI, and FISN.
 email:
 - leonid@svyatov.ru
 executables: []
@@ -41,6 +41,7 @@ files:
 - lib/sec_id/isin.rb
 - lib/sec_id/lei.rb
 - lib/sec_id/occ.rb
+- lib/sec_id/scanner.rb
 - lib/sec_id/sedol.rb
 - lib/sec_id/valoren.rb
 - lib/sec_id/version.rb
@@ -70,5 +71,6 @@ required_rubygems_version: !ruby/object:Gem::Requirement
 requirements: []
 rubygems_version: 4.0.6
 specification_version: 4
-summary:
+summary: A Ruby toolkit for securities identifiers — validate, parse, normalize, detect,
+  and convert.
 test_files: []