sec_api 1.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (75)
  1. checksums.yaml +7 -0
  2. data/.devcontainer/Dockerfile +54 -0
  3. data/.devcontainer/README.md +178 -0
  4. data/.devcontainer/devcontainer.json +46 -0
  5. data/.devcontainer/docker-compose.yml +28 -0
  6. data/.devcontainer/post-create.sh +51 -0
  7. data/.devcontainer/post-start.sh +44 -0
  8. data/.rspec +3 -0
  9. data/.standard.yml +3 -0
  10. data/CHANGELOG.md +5 -0
  11. data/CLAUDE.md +0 -0
  12. data/LICENSE.txt +21 -0
  13. data/MIGRATION.md +274 -0
  14. data/README.md +370 -0
  15. data/Rakefile +10 -0
  16. data/config/secapi.yml.example +57 -0
  17. data/docs/development-guide.md +291 -0
  18. data/docs/enumerator_pattern_design.md +483 -0
  19. data/docs/examples/README.md +58 -0
  20. data/docs/examples/backfill_filings.rb +419 -0
  21. data/docs/examples/instrumentation.rb +583 -0
  22. data/docs/examples/query_builder.rb +308 -0
  23. data/docs/examples/streaming_notifications.rb +491 -0
  24. data/docs/index.md +244 -0
  25. data/docs/migration-guide-v1.md +1091 -0
  26. data/docs/pre-review-checklist.md +145 -0
  27. data/docs/project-overview.md +90 -0
  28. data/docs/project-scan-report.json +60 -0
  29. data/docs/source-tree-analysis.md +190 -0
  30. data/lib/sec_api/callback_helper.rb +49 -0
  31. data/lib/sec_api/client.rb +606 -0
  32. data/lib/sec_api/collections/filings.rb +267 -0
  33. data/lib/sec_api/collections/fulltext_results.rb +86 -0
  34. data/lib/sec_api/config.rb +590 -0
  35. data/lib/sec_api/deep_freezable.rb +42 -0
  36. data/lib/sec_api/errors/authentication_error.rb +24 -0
  37. data/lib/sec_api/errors/configuration_error.rb +5 -0
  38. data/lib/sec_api/errors/error.rb +75 -0
  39. data/lib/sec_api/errors/network_error.rb +26 -0
  40. data/lib/sec_api/errors/not_found_error.rb +23 -0
  41. data/lib/sec_api/errors/pagination_error.rb +28 -0
  42. data/lib/sec_api/errors/permanent_error.rb +29 -0
  43. data/lib/sec_api/errors/rate_limit_error.rb +57 -0
  44. data/lib/sec_api/errors/reconnection_error.rb +34 -0
  45. data/lib/sec_api/errors/server_error.rb +25 -0
  46. data/lib/sec_api/errors/transient_error.rb +28 -0
  47. data/lib/sec_api/errors/validation_error.rb +23 -0
  48. data/lib/sec_api/extractor.rb +122 -0
  49. data/lib/sec_api/filing_journey.rb +477 -0
  50. data/lib/sec_api/mapping.rb +125 -0
  51. data/lib/sec_api/metrics_collector.rb +411 -0
  52. data/lib/sec_api/middleware/error_handler.rb +250 -0
  53. data/lib/sec_api/middleware/instrumentation.rb +186 -0
  54. data/lib/sec_api/middleware/rate_limiter.rb +541 -0
  55. data/lib/sec_api/objects/data_file.rb +34 -0
  56. data/lib/sec_api/objects/document_format_file.rb +45 -0
  57. data/lib/sec_api/objects/entity.rb +92 -0
  58. data/lib/sec_api/objects/extracted_data.rb +118 -0
  59. data/lib/sec_api/objects/fact.rb +147 -0
  60. data/lib/sec_api/objects/filing.rb +197 -0
  61. data/lib/sec_api/objects/fulltext_result.rb +66 -0
  62. data/lib/sec_api/objects/period.rb +96 -0
  63. data/lib/sec_api/objects/stream_filing.rb +194 -0
  64. data/lib/sec_api/objects/xbrl_data.rb +356 -0
  65. data/lib/sec_api/query.rb +423 -0
  66. data/lib/sec_api/rate_limit_state.rb +130 -0
  67. data/lib/sec_api/rate_limit_tracker.rb +154 -0
  68. data/lib/sec_api/stream.rb +841 -0
  69. data/lib/sec_api/structured_logger.rb +199 -0
  70. data/lib/sec_api/types.rb +32 -0
  71. data/lib/sec_api/version.rb +42 -0
  72. data/lib/sec_api/xbrl.rb +220 -0
  73. data/lib/sec_api.rb +137 -0
  74. data/sig/sec_api.rbs +4 -0
  75. metadata +217 -0
@@ -0,0 +1,483 @@
# Enumerator Pattern Design for Auto-Pagination (Story 2.6)

## Problem Statement

Epic 2, Story 2.6 requires lazy auto-pagination for backfill operations spanning thousands of filings. The API has:
- Offset-based pagination (`from` parameter)
- Max 50 results per page
- Max 10,000 results per query
- Response format: `{ total: {...}, filings: [...] }`

**Goal:** Provide a Ruby Enumerator that fetches pages on demand without loading all results into memory.

## Design Constraints

1. **Memory Efficiency:** Don't load 10,000 filings into memory at once
2. **Lazy Evaluation:** Only fetch the next page when the iterator needs it
3. **Ruby Idioms:** Use the standard Enumerator pattern (supports `.each`, `.map`, `.select`, etc.)
4. **API Efficiency:** Minimize redundant requests
5. **Thread Safety:** Iterator must be thread-safe (Epic 1 requirement)
6. **Error Handling:** Retry transient errors (Epic 1 middleware handles this)

## API Contract Review

From sec-api.io documentation:
- Request: `POST /` with `{ query: "...", from: "0", size: "50" }`
- Response: `{ total: { value: 1250, relation: "eq" }, filings: [...] }`
- Pagination: Increment `from` by `size` (e.g., 0 → 50 → 100 → 150)
- Limit: Max 10,000 results per query

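The offset arithmetic above puts a hard bound on how many requests one query can make. The following plain-Ruby sketch enumerates every `from` value a fully paginated query could request. The constant and method names are illustrative and are not part of the gem.

```ruby
# Illustrative constants mirroring the API limits listed above.
PAGE_SIZE = 50
MAX_RESULTS = 10_000

# Every `from` value a fully paginated query could request:
# 0, 50, 100, ..., 9_950 (the last page ends exactly at the 10,000 cap).
def page_offsets(page_size: PAGE_SIZE, max_results: MAX_RESULTS)
  0.step(max_results - 1, page_size).to_a
end

offsets = page_offsets
puts offsets.first(4).inspect # [0, 50, 100, 150]
puts offsets.last             # 9950
puts offsets.size             # 200 requests at most
```

A query that reaches the cap therefore costs at most 200 requests. This matches the `offset + page_size >= 10_000` stop condition used in both implementation options.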
## Proposed Pattern: Lazy Enumerator with Page Fetching

### Architecture

```
QueryBuilder
├── .search() → Filings collection (first page only, Story 2.5)
└── .auto_paginate() → lazy enumerator over all pages (Story 2.6)
    └── Uses Ruby Enumerator::Lazy internally
```

### Implementation Strategy

**Option 1: Custom Enumerable class with `fetch_page`**

```ruby
class QueryBuilder
  def auto_paginate
    AutoPaginatedFilings.new(self)
  end
end

class AutoPaginatedFilings
  include Enumerable

  PAGE_SIZE = 50
  MAX_RESULTS = 10_000 # API cap per query

  attr_reader :total_filings # nil until the first page is fetched

  def initialize(query_builder)
    @query_builder = query_builder
    @total_filings = nil
  end

  def each
    return enum_for(:each) unless block_given?

    # Keep the offset local so every #each call restarts from the first page
    # and concurrent iterations never share a cursor.
    offset = 0

    loop do
      page = fetch_page(offset)
      @total_filings ||= page.total_count

      # Yield each filing in the page
      page.each { |filing| yield filing }

      # Stop on the last page or at the API's 10,000-result cap
      break unless page.has_more?
      break if offset + PAGE_SIZE >= MAX_RESULTS

      offset += PAGE_SIZE
    end
  end

  private

  def fetch_page(offset)
    # Assumes QueryBuilder exposes #build_payload and #client publicly
    # (in the QueryBuilder sketch below, build_payload is private).
    payload = @query_builder.build_payload(from: offset, size: PAGE_SIZE)

    # Execute request (retry middleware handles transient errors)
    response = @query_builder.client.connection.post("/", payload)

    # Return Filings collection for this page
    Collections::Filings.new(response.body)
  end
end
```

**Option 2: Enumerator::Lazy (More Ruby-idiomatic)**

```ruby
class QueryBuilder
  def auto_paginate
    Enumerator.new do |yielder|
      # These locals live in the block, which runs afresh on every iteration
      offset = 0
      page_size = 50

      loop do
        # Fetch page
        payload = build_payload(from: offset, size: page_size)
        response = @client.connection.post("/", payload)
        page = Collections::Filings.new(response.body)

        # Yield each filing
        page.each { |filing| yielder << filing }

        # Check if done
        break unless page.has_more?
        break if offset + page_size >= 10_000

        offset += page_size
      end
    end.lazy # chained map/select/take stay lazy; pages are fetched on demand
  end
end
```

**Comparison:**

| Aspect | Option 1 (Class) | Option 2 (Enumerator.new) |
|--------|------------------|---------------------------|
| Code complexity | Medium | Low |
| Testability | High (can mock AutoPaginatedFilings) | Medium (Enumerator harder to mock) |
| Memory efficiency | Excellent | Excellent |
| Ruby idioms | Good (include Enumerable) | Excellent (native Enumerator) |
| Thread safety | Safe only if pagination state stays local to `#each` | State is local to each run of the block |
| Debugging | Easier (class with state) | Harder (closure state) |

**Recommendation:** Option 2 (Enumerator.new) for Story 2.6 because it is:
- More Ruby-idiomatic
- Less code to maintain
- Thread-safe by construction (each `.auto_paginate` call returns a new Enumerator, and each iteration re-runs the block with fresh locals)
- Lazy by default via `.lazy`

We can add Option 1 later if we need more control or testability.

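The claim that Option 2 keeps its state local can be checked without the gem. The sketch below uses a hypothetical `FakeSource` in place of the HTTP connection, and treats a short page as the equivalent of `has_more?` returning false. Iterating the same lazy enumerator twice shows that each pass restarts at offset 0 and makes its own requests.

```ruby
# Hypothetical page source standing in for the HTTP connection; it is not
# part of the sec_api gem. It records the `from` offset of every request.
class FakeSource
  attr_reader :requests

  def initialize(total)
    @total = total
    @requests = []
  end

  # Returns the "filings" for one page, like a POST with from/size would.
  def fetch(from, size)
    @requests << from
    (from...[from + size, @total].min).map { |i| "filing-#{i}" }
  end
end

# Same shape as Option 2; a short page stands in for `has_more? == false`.
def auto_paginate(source, page_size: 50, max_results: 10_000)
  Enumerator.new do |yielder|
    offset = 0 # a fresh local every time the block runs
    loop do
      filings = source.fetch(offset, page_size)
      filings.each { |filing| yielder << filing }
      break if filings.size < page_size          # short page: it was the last one
      break if offset + page_size >= max_results # API cap
      offset += page_size
    end
  end.lazy
end

source = FakeSource.new(125)
filings = auto_paginate(source)

first_pass = filings.to_a
second_pass = filings.to_a

puts first_pass.size           # 125
puts second_pass == first_pass # true: the second pass restarts at offset 0
puts source.requests.inspect   # [0, 50, 100, 0, 50, 100]
```

Because the cursor lives inside the block, no Mutex is needed. The trade-off is that each new iteration re-fetches from the first page rather than resuming.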
## Usage Examples

### Basic Auto-Pagination

```ruby
client = SecApi::Client.new
filings = client.query
  .ticker("AAPL")
  .form_type("10-K")
  .date_range(from: "2020-01-01", to: "2023-12-31")
  .auto_paginate

# Lazy iteration: the next page is fetched only when the current one is exhausted
filings.each do |filing|
  puts "#{filing.ticker}: #{filing.filed_at}"
end
```

### With Lazy Enumerator Methods

```ruby
# Take only the first 100 filings (fetches at most 2 pages)
filings.take(100).each { |f| process(f) }

# Find the first match (stops fetching once found)
filing = filings.find { |f| f.form_type == "10-K/A" }

# Filter and map stay lazy; .to_a forces evaluation (and fetches every page)
tickers = filings
  .select { |f| f.form_type == "10-K" }
  .map(&:ticker)
  .uniq
  .to_a
```

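To see how many pages each of these calls actually fetches, the sketch below counts requests against a hypothetical in-memory source. Plain form-type strings stand in for Filing objects, and a short page ends the iteration.

```ruby
# Hypothetical page source (not part of sec_api): serves form types in
# pages and records the `from` offset of every request it receives.
class CountingSource
  attr_reader :requests

  def initialize(forms)
    @forms = forms
    @requests = []
  end

  def fetch(from, size)
    @requests << from
    @forms[from, size] || []
  end
end

def paginate(source, page_size: 50)
  Enumerator.new do |yielder|
    offset = 0
    loop do
      page = source.fetch(offset, page_size)
      page.each { |form| yielder << form }
      break if page.size < page_size
      offset += page_size
    end
  end.lazy
end

# 480 results; every 75th one is an amendment (the first at index 74, page 2).
forms = Array.new(480) { |i| i % 75 == 74 ? "10-K/A" : "10-K" }

src = CountingSource.new(forms)
paginate(src).take(100).to_a
puts src.requests.inspect # [0, 50]

src = CountingSource.new(forms)
paginate(src).find { |form| form == "10-K/A" }
puts src.requests.inspect # [0, 50]

src = CountingSource.new(forms)
chain = paginate(src).select { |form| form == "10-K" }.map(&:downcase).uniq
puts src.requests.inspect # [] (building the chain fetches nothing)
chain.to_a                # forces evaluation
puts src.requests.size    # 10 (every page)
```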
### Memory-Efficient Backfill

```ruby
# Process 5,000 filings without loading all into memory
client.query
  .ticker("TSLA")
  .date_range(from: "2015-01-01", to: "2024-12-31")
  .auto_paginate
  .each_slice(100) do |batch|
    # At most one batch of 100 (plus the current page) is held at a time
    batch.each { |filing| extract_and_save(filing) }
  end
```

## Implementation Details

### QueryBuilder Changes

```ruby
class QueryBuilder
  # Existing terminal method (Story 2.5)
  def search
    payload = build_payload(from: 0, size: 50)
    response = @client.connection.post("/", payload)
    Collections::Filings.new(response.body)
  end

  # NEW: Auto-pagination terminal method (Story 2.6)
  def auto_paginate
    Enumerator.new do |yielder|
      offset = 0
      page_size = 50

      loop do
        payload = build_payload(from: offset, size: page_size)
        response = @client.connection.post("/", payload)
        page = Collections::Filings.new(response.body)

        page.each { |filing| yielder << filing }

        break unless page.has_more?
        break if offset + page_size >= 10_000 # API max

        offset += page_size
      end
    end.lazy
  end

  private

  def build_payload(from:, size:)
    {
      query: to_lucene,
      from: from.to_s,
      size: size.to_s,
      sort: @sort_config
    }
  end
end
```

### Filings Collection Enhancement

The `Collections::Filings` class needs pagination metadata:

```ruby
class Collections::Filings
  include Enumerable

  PAGE_SIZE = 50

  attr_reader :total_count, :filings

  def initialize(response)
    # Accept both string- and symbol-keyed response hashes
    @total = response["total"] || response[:total] || {}
    @total_count = @total["value"] || @total[:value]
    @total_relation = @total["relation"] || @total[:relation]
    @filings = (response["filings"] || response[:filings] || []).map do |data|
      Filing.new(data)
    end
    @filings.freeze
  end

  def each(&block)
    @filings.each(&block)
  end

  def has_more?
    # A full page means another page may follow; a short or empty page is the last.
    # relation "gte" only marks total_count as a lower bound; it does not mean
    # this page is followed by another, so it is not used here.
    @filings.size == PAGE_SIZE
  end

  def size
    @filings.size
  end

  # Without arguments: total results across all pages (from the API).
  # With an argument or block: Enumerable#count over this page only.
  def count(*args, &block)
    return @total_count if args.empty? && block.nil?

    super
  end
end
```

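The full-page heuristic has one known cost. When the total is an exact multiple of 50, the loop makes a final request that returns an empty page. The sketch below is a hypothetical refinement and not part of the design above. It assumes the loop, rather than the collection, makes the decision, because only the loop knows the current offset. It then uses `total.value` whenever `relation` is `"eq"`.

```ruby
# Hypothetical helper, not part of the design above. Returns true when
# another page should be requested.
#   page_len    - filings in the page just received
#   total_value - total["value"] from the response
#   relation    - total["relation"]: "eq" (exact) or "gte" (lower bound)
def more_pages?(offset:, page_len:, total_value:, relation:, page_size: 50)
  return false if page_len < page_size # short or empty page: done
  return offset + page_size < total_value if relation == "eq" # exact total known
  true # "gte": true total unknown, fall back to the full-page heuristic
end

# Simulates paginating over `total` results and counts the requests made;
# the block decides, from (offset, page_len), whether to fetch again.
def request_count(total, page_size: 50)
  offset = 0
  requests = 0
  loop do
    requests += 1
    page_len = (total - offset).clamp(0, page_size)
    break unless yield(offset, page_len)
    offset += page_size
  end
  requests
end

full_page_only = ->(_offset, page_len) { page_len == 50 }
with_total = lambda do |offset, page_len|
  more_pages?(offset: offset, page_len: page_len, total_value: 100, relation: "eq")
end

puts request_count(100, &full_page_only) # 3: the third request returns an empty page
puts request_count(100, &with_total)     # 2
```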
## Error Handling

Auto-pagination leverages Epic 1's retry middleware:
- **TransientError** (network, 5xx, 429): Automatically retried with exponential backoff; raised only after retries are exhausted
- **PermanentError** (401, 404, validation): Raised immediately, iteration stops

Because pages are fetched lazily, these errors are raised during iteration (`each`, `take`, `to_a`, ...), not by `auto_paginate` itself. Filings from earlier pages may already have been processed when a later page fails.

```ruby
begin
  client.query.ticker("AAPL").auto_paginate.each do |filing|
    process(filing)
  end
rescue SecApi::AuthenticationError => e
  # Permanent error: fix the API key
  logger.error("Authentication failed: #{e.message}")
rescue SecApi::TransientError => e
  # Reached only after the retry middleware has exhausted its attempts
  logger.error("Retry exhausted: #{e.message}")
end
```

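The sketch below shows that an error surfaces only mid-iteration and leaves earlier pages already handled. It uses a hypothetical error class and integers in place of Filing objects. The gem's real error classes are not required.

```ruby
# Hypothetical stand-in for an error the middleware would raise (e.g. a
# PermanentError subclass); not part of the sec_api gem.
class FakePermanentError < StandardError; end

# Lazy pager whose page at `fail_at` raises, simulating e.g. a 401 partway
# through a backfill. Integers stand in for Filing objects.
def failing_pages(total:, fail_at:, page_size: 50)
  Enumerator.new do |yielder|
    offset = 0
    while offset < total
      raise FakePermanentError, "page at from=#{offset} rejected" if offset == fail_at

      (offset...[offset + page_size, total].min).each { |i| yielder << i }
      offset += page_size
    end
  end.lazy
end

processed = []
begin
  failing_pages(total: 200, fail_at: 50).each { |filing| processed << filing }
rescue FakePermanentError => e
  puts "stopped: #{e.message}" # stopped: page at from=50 rejected
end

puts processed.size # 50: the first page was handled before the error
```

Callers that need to resume a long backfill should therefore record their own progress, for example the last processed filing, rather than assume the run was all-or-nothing.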
## Thread Safety

Each call to `.auto_paginate` creates a new Enumerator with its own closure state:

```ruby
# Thread-safe: each thread builds its own independent iterator
threads = 10.times.map do |i|
  Thread.new do
    client.query.ticker("AAPL").auto_paginate.take(100).each do |filing|
      puts "Thread #{i}: #{filing.ticker}"
    end
  end
end
threads.each(&:join)
```

**Why it's thread-safe:**
- No mutable pagination state is shared between iterators
- Each iteration has its own `offset` local inside the Enumerator block
- Faraday connection pool handles concurrent requests (Epic 1)
- Filing objects are immutable (Dry::Struct from Epic 1)

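The following self-contained sketch shows ten threads sharing one hypothetical source, which plays the role of the shared connection and guards its counter with a Mutex. Each thread builds its own enumerator. One case it does not cover: sharing a *single* enumerator's external iteration (`next`) across threads is not supported. `Enumerator#next` runs the block on a Fiber, and Ruby raises `FiberError` when a fiber is resumed from another thread.

```ruby
# Hypothetical shared source standing in for one connection used by every
# thread; a Mutex guards its request counter. Not part of sec_api.
class SharedSource
  def initialize(total)
    @total = total
    @lock = Mutex.new
    @request_count = 0
  end

  def request_count
    @lock.synchronize { @request_count }
  end

  def fetch(from, size)
    @lock.synchronize { @request_count += 1 }
    (from...[from + size, @total].min).to_a
  end
end

def paginate_shared(source, page_size: 50)
  Enumerator.new do |yielder|
    offset = 0 # lives in this iteration's block; never shared across threads
    loop do
      page = source.fetch(offset, page_size)
      page.each { |item| yielder << item }
      break if page.size < page_size
      offset += page_size
    end
  end.lazy
end

source = SharedSource.new(1_000)

# Each thread builds its own iterator, as in the example above.
threads = 10.times.map { Thread.new { paginate_shared(source).take(100).to_a } }
results = threads.map(&:value)

puts results.all? { |r| r == (0...100).to_a } # true
puts source.request_count                      # 20 (2 pages per thread)
```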
## Performance Characteristics

**Memory:**
- The iterator holds at most one page (50 filings) at a time
- Memory stays constant regardless of result-set size, unless the caller collects results (e.g. `.to_a` materializes every filing)

**Network:**
- One request per page
- Each request: 50 filings
- Total requests for 5,000 filings: 5,000 / 50 = 100 requests

**Latency:**
- First filing: 1 request (same as `.search`)
- Filing 51: 2 requests total (fetches page 2)
- Filing 101: 3 requests total (fetches page 3)
- Total latency for 5,000 filings: ~100 requests × ~2s = ~200s, assuming ~2s per request (acceptable for backfill)

**Optimization Opportunities (Future):**
- Prefetch next page in background while processing current page
- Configurable page size (currently hardcoded to 50)
- Parallel page fetching for multiple ticker backfills

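The request and latency figures above follow from a ceiling division. The helpers below are hypothetical. They treat the ~2s per-request latency as an assumed input rather than a measured value. They also ignore the extra empty-page request that the full-page `has_more?` check can add when a result set ends exactly on a page boundary.

```ruby
PAGE_SIZE = 50

# Number of page requests needed to reach the nth filing (1-based).
def requests_to_reach(n, page_size: PAGE_SIZE)
  (n + page_size - 1) / page_size
end

# Rough backfill time for a fixed, assumed per-request latency in seconds.
def estimated_seconds(filings, seconds_per_request: 2.0, page_size: PAGE_SIZE)
  requests_to_reach(filings, page_size: page_size) * seconds_per_request
end

puts requests_to_reach(1)     # 1
puts requests_to_reach(51)    # 2
puts requests_to_reach(101)   # 3
puts requests_to_reach(5_000) # 100
puts estimated_seconds(5_000) # 200.0
```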
## Testing Strategy

### Unit Tests (QueryBuilder)

```ruby
RSpec.describe QueryBuilder do
  let(:connection) { spy("connection") } # records #post calls
  let(:client) { double("client", connection: connection) }
  let(:builder) { QueryBuilder.new(client) }
  # stub_page(from, count, total:, has_more:) is a spec helper (not shown)
  # that makes connection.post return a canned page for `from`.

  describe "#auto_paginate" do
    it "returns an Enumerator" do
      expect(builder.ticker("AAPL").auto_paginate).to be_a(Enumerator)
    end

    it "fetches multiple pages lazily" do
      # Stub 3 pages: 50, 50, 25 filings
      stub_page(0, 50, total: 125, has_more: true)
      stub_page(50, 50, total: 125, has_more: true)
      stub_page(100, 25, total: 125, has_more: false)

      filings = builder.ticker("AAPL").auto_paginate.to_a
      expect(filings.size).to eq(125)
    end

    it "stops at 10,000 filing API limit" do
      # Even if total > 10,000, stop at 10,000
      stub_page(0, 50, total: 15_000, has_more: true)
      # ... stub pages up to offset 9,950

      filings = builder.ticker("AAPL").auto_paginate.to_a
      expect(filings.size).to eq(10_000)
    end

    it "supports lazy operations" do
      stub_page(0, 50, total: 500, has_more: true)

      # Only fetches the first page (take(10) stops early)
      filings = builder.ticker("AAPL").auto_paginate.take(10).to_a
      expect(filings.size).to eq(10)
      expect(client.connection).to have_received(:post).once # Only 1 request
    end
  end
end
```

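One way the canned-page approach behind `stub_page` could work is sketched below in plain Ruby, without RSpec. `FakeConnection` is hypothetical, and the `"id"` field is illustrative only. The body shape follows the API contract above, and requesting an unstubbed offset raises, so an unexpected extra request fails the test loudly.

```ruby
# Hypothetical in-memory connection for specs (not part of sec_api or any
# test library): serves canned page bodies keyed by `from`, records payloads.
class FakeConnection
  Response = Struct.new(:body)

  attr_reader :posts

  def initialize
    @pages = {}
    @posts = []
  end

  # Canned body in the shape from the API contract; "id" is illustrative.
  def stub_page(from, count, total:)
    @pages[from.to_s] = {
      "total" => { "value" => total, "relation" => "eq" },
      "filings" => Array.new(count) { |i| { "id" => from + i } }
    }
  end

  # Unstubbed offsets raise KeyError, so an unexpected request fails loudly.
  def post(_path, payload)
    @posts << payload
    Response.new(@pages.fetch(payload[:from]))
  end
end

conn = FakeConnection.new
conn.stub_page(0, 50, total: 125)
conn.stub_page(50, 50, total: 125)
conn.stub_page(100, 25, total: 125)

# Same loop shape as QueryBuilder#auto_paginate; from/size sent as strings.
filings = Enumerator.new do |yielder|
  offset = 0
  loop do
    body = conn.post("/", { query: "ticker:AAPL", from: offset.to_s, size: "50" }).body
    body["filings"].each { |filing| yielder << filing }
    break if body["filings"].size < 50
    offset += 50
  end
end.lazy

puts filings.to_a.size                       # 125
puts conn.posts.map { |p| p[:from] }.inspect # ["0", "50", "100"]
```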
### Integration Tests

```ruby
RSpec.describe "Auto-pagination integration" do
  it "handles multi-page backfill" do
    client = SecApi::Client.new

    filings = client.query
      .ticker("AAPL")
      .date_range(from: "2023-01-01", to: "2023-12-31")
      .auto_paginate
      .to_a

    expect(filings).to all(be_a(Filing))
    expect(filings.map(&:ticker).uniq).to eq(["AAPL"])
  end

  it "retries transient errors during pagination" do
    # First page succeeds, second page fails with 503, then succeeds
    # Retry middleware should handle transparently

    # ... VCR cassette or stub setup
  end
end
```

## 10,000 Result Limit Strategy

For queries returning more than 10,000 results, guide users to chunk by date so that each chunk stays under the cap:

```ruby
# Helper method (could be added in Epic 2 or later).
# Takes the client explicitly: a `def` body cannot see outer local variables.
def backfill_by_year(client, ticker, start_year, end_year)
  (start_year..end_year).each do |year|
    client.query
      .ticker(ticker)
      .date_range(from: "#{year}-01-01", to: "#{year}-12-31")
      .auto_paginate
      .each { |filing| process(filing) }
  end
end

# Usage
backfill_by_year(client, "AAPL", 2010, 2024) # Chunks into 15 queries (1 per year)
```

**Documentation Note:** Warn users in YARD docs:
```ruby
# @note The sec-api.io API limits queries to 10,000 results. For larger
#   datasets, split your query into smaller date ranges.
# @example Multi-year backfill with date chunking
#   (2010..2024).each do |year|
#     client.query.ticker("AAPL")
#       .date_range(from: "#{year}-01-01", to: "#{year}-12-31")
#       .auto_paginate.each { |filing| process(filing) }
#   end
```

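A single year can still exceed 10,000 results for broad queries. Splitting into months is then the natural next step. The hypothetical helper below uses only Ruby's standard `date` library. It produces `[from, to]` ISO-8601 string pairs that could be passed to `.date_range(from:, to:)`, clipping the first and last months to the requested bounds.

```ruby
require "date"

# Hypothetical helper: split [start_date, end_date] into month-sized
# [from, to] string pairs for .date_range(from:, to:).
def monthly_ranges(start_date, end_date)
  ranges = []
  first = Date.new(start_date.year, start_date.month, 1)
  while first <= end_date
    last = first.next_month - 1 # last day of this month
    from = [first, start_date].max
    to = [last, end_date].min
    ranges << [from.iso8601, to.iso8601]
    first = first.next_month
  end
  ranges
end

monthly_ranges(Date.new(2024, 1, 15), Date.new(2024, 3, 10)).each do |from, to|
  puts "#{from} .. #{to}"
end
# 2024-01-15 .. 2024-01-31
# 2024-02-01 .. 2024-02-29
# 2024-03-01 .. 2024-03-10
```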
## Summary

- **Pattern:** Enumerator.new with lazy evaluation
- **Memory:** At most one page (50 filings) held per iterator
- **Thread Safety:** ✅ Each call creates an independent Enumerator
- **Error Handling:** ✅ Leverages Epic 1 retry middleware
- **API Efficiency:** ✅ Only fetches pages as needed
- **Ruby Idioms:** ✅ Works with `.map`, `.select`, `.take`, etc.

- **Implementation Complexity:** Low (< 20 lines in QueryBuilder)
- **Test Coverage:** Medium (stub pagination responses, test lazy behavior)

**Ready for Story 2.6 implementation:** ✅

---

**Design Decision Log:**

| Decision | Rationale |
|----------|-----------|
| Enumerator.new over custom class | More idiomatic Ruby, less code, per-iteration state (thread-safe) |
| Lazy evaluation (.lazy) | Keeps chained `map`/`select`/`take` lazy, so pages are fetched only as needed |
| Hardcode page_size to 50 | API max; smaller pages only add requests |
| Stop at 10,000 limit | API constraint; document the chunking strategy |
| No background prefetching | Keep it simple for Story 2.6; optimize later if needed |
| Leverage Epic 1 retry middleware | DRY principle, automatic transient error recovery |

---

**Next Steps for Story 2.6:**
1. Implement `auto_paginate` method in QueryBuilder
2. Add `has_more?` method to Filings collection
3. Write unit tests for lazy pagination behavior
4. Write integration tests with multi-page VCR cassettes
5. Update YARD documentation with 10,000 limit warning
6. Add usage examples to README
@@ -0,0 +1,58 @@
# SecApi Usage Examples

This directory contains working code examples demonstrating common usage patterns for the `sec_api` Ruby gem.

## Prerequisites

1. Install the gem:
   ```bash
   gem install sec_api
   # or add this line to your Gemfile:
   #   gem 'sec_api', '~> 1.0'
   ```

2. Set your API key:
   ```bash
   export SECAPI_API_KEY="your_api_key_here"
   ```

   Get your API key from [sec-api.io](https://sec-api.io).

## Available Examples

| File | Description |
|------|-------------|
| [query_builder.rb](query_builder.rb) | Query filings by ticker, CIK, form type, date range, and full-text search |
| [backfill_filings.rb](backfill_filings.rb) | Multi-year backfill with auto-pagination and progress logging |
| [streaming_notifications.rb](streaming_notifications.rb) | Real-time WebSocket notifications with filters and callbacks |
| [instrumentation.rb](instrumentation.rb) | Logging, metrics, and filing journey tracking |

## Running Examples

Each example is self-contained and can be run directly from the project root:

```bash
ruby docs/examples/query_builder.rb
ruby docs/examples/backfill_filings.rb
ruby docs/examples/streaming_notifications.rb
ruby docs/examples/instrumentation.rb
```

## Example Structure

Each example file follows a consistent structure:
- Header comments explaining what it demonstrates
- Prerequisites and usage instructions
- Clearly commented code sections
- Copy-paste ready patterns

## API Documentation

For detailed API reference, see the YARD documentation:

```bash
bundle exec yard doc
open doc/index.html # macOS; use xdg-open on Linux
```

Or read the [migration guide](../migration-guide-v1.md) for comprehensive API patterns.