@yottagraph-app/data-model-skill 0.0.37 → 0.0.39

# IAPD Data Dictionary

## 1. Purpose / Source Overview

The IAPD dataset publishes the SEC's Investment Adviser Public Disclosure
data — the canonical federal and state registry of investment advisory
firms in the United States. It is sourced from two daily-refreshed XML
compilation feeds:

- `IA_FIRM_SEC_Feed_*.xml.gz` — SEC-registered Investment Advisers (RIAs)
  and Exempt Reporting Advisers (ERAs). ~23K firms.
- `IA_FIRM_STATE_Feed_*.xml.gz` — State-registered Investment Advisers
  (IAs). ~22K firms.

Each `<Firm>` element corresponds to a single advisory firm's most recent
**Form ADV Part 1A** filing on file with the SEC's IARD system. Part 1A
captures firm identity, contact info, regulatory status, organization form,
employees, clients, assets under management (AUM), services offered, and
disciplinary Y/N flags. The bulk feed does **not** include Form ADV
Part 1B (state-specific addenda), Form ADV Part 2 (firm brochure PDFs),
or Form ADV Part 3 (Form CRS, the client relationship summary). Those
would require per-firm PDF fetches and are deferred to a future iteration.

Cadence: nightly. Volume: ~45K firm records per snapshot. Each daily
snapshot is a full re-publish; there is no incremental "diffs only"
feed.

Two `Record.Source` values are emitted so that downstream consumers can
distinguish firms by their regulatory domicile:

| Record.Source | Coverage |
|---|---|
| `iapd_sec` | SEC-registered RIAs + ERAs (`IA_FIRM_SEC_Feed_*.xml.gz`) |
| `iapd_state` | State-registered IAs (`IA_FIRM_STATE_Feed_*.xml.gz`) |

The two streams share entity types, properties, and the strong-ID
scheme, so a firm that transitions between SEC-registered and
state-registered status will resolve to the same `organization` entity
over time.
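A minimal Python sketch of the ingest side follows. It assumes each feed is a gzip-compressed XML file whose firm records are `<Firm>` elements; the report's root and wrapper element names are not assumed. `SOURCE_BY_PATTERN`, `source_for_feed`, and `iter_firms` are hypothetical names, not the streamer's actual API. Streaming with `iterparse` and clearing each `<Firm>` keeps memory roughly bounded across the ~45K firms in a snapshot.

```python
import fnmatch
import gzip
import os
import xml.etree.ElementTree as ET
from typing import Iterator, Tuple

# Hypothetical mapping from feed filename pattern to Record.Source value.
SOURCE_BY_PATTERN = {
    "IA_FIRM_SEC_Feed_*.xml.gz": "iapd_sec",
    "IA_FIRM_STATE_Feed_*.xml.gz": "iapd_state",
}


def source_for_feed(path: str) -> str:
    """Return the Record.Source value for a feed file based on its filename."""
    name = os.path.basename(path)
    for pattern, source in SOURCE_BY_PATTERN.items():
        if fnmatch.fnmatchcase(name, pattern):
            return source
    raise ValueError(f"unrecognized IAPD feed file: {name}")


def iter_firms(path: str) -> Iterator[Tuple[str, ET.Element]]:
    """Stream (Record.Source, <Firm>) pairs without loading the whole snapshot.

    Each <Firm> is cleared once the caller resumes the generator, so callers
    must read what they need from an element before advancing.
    """
    source = source_for_feed(path)
    with gzip.open(path, "rb") as fh:
        for _event, elem in ET.iterparse(fh, events=("end",)):
            if elem.tag == "Firm":
                yield source, elem
                elem.clear()  # drop the finished subtree to bound memory
```

Because both feeds go through the same code path, the only per-feed difference at this stage is the `Record.Source` tag.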
40
+
41
+ ## 2. Entity Types
42
+
43
+ ### `organization`
44
+
45
+ A registered investment adviser firm — i.e. an entity registered with
46
+ the SEC and/or one or more state securities regulators to provide
47
+ investment advice. The bulk feed represents only the firm; the
48
+ individual investment-adviser representatives are not in this feed.
49
+
50
+ - **Strong-ID properties:** `crd_number`, `sec_file_number`,
51
+ `company_cik`, `lei`.
52
+ - `crd_number` is the primary strong-ID — FINRA-assigned, present on
53
+ every firm in both feeds, stable across registration-status
54
+ changes. Property name is intentionally unprefixed (not
55
+ `organization_crd_number`) so it matches the same-named strong-ID
56
+ on `edgar`'s `organization` flavor — an IAPD firm and an EDGAR
57
+ registrant with the same CRD resolve to the same entity.
58
+ - `sec_file_number` (`801-…` / `802-…`) is a secondary strong-ID
59
+ emitted only for firms in the SEC feed where `Info@SECNb` is
60
+ non-empty.
61
+ - `company_cik` and `lei` are declared strong-ID slots but **not
62
+ populated** from the bulk Part 1A XML — neither field is in that
63
+ feed. The slots exist so downstream cross-walks (Form ADV
64
+ Schedule R, third-party CRD↔LEI / CRD↔CIK mappings) can populate
65
+ them without a breaking schema change. Adding a strong-ID property
66
+ after the schema is in production is a breaking change; adding
67
+ *values* into an existing slot is not.
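The strong-ID rules above can be sketched as a small Python function over a firm's `Info` attributes, for example `firm.find("Info").attrib` from ElementTree. The function name and the `source` parameter are hypothetical. Stripping surrounding whitespace before the emptiness check is an assumption about how blank values arrive.

```python
from typing import Dict, Mapping, Optional


def strong_ids(info: Mapping[str, str], source: str) -> Optional[Dict[str, str]]:
    """Build strong-ID properties for one firm from its Info attributes.

    Returns None when the firm has no CRD number, so the caller can skip it.
    company_cik and lei are never set here: the bulk Part 1A XML lacks them.
    """
    crd = (info.get("FirmCrdNb") or "").strip()
    if not crd:
        return None
    ids = {"crd_number": crd}
    sec_nb = (info.get("SECNb") or "").strip()
    if source == "iapd_sec" and sec_nb:  # SEC feed only, and only when non-empty
        ids["sec_file_number"] = sec_nb
    return ids
```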
68
+
69
+ ### `location`
70
+
71
+ A named geographic place where the firm has a main office or mailing
72
+ address. Resolved by name (`City, ST` for US; `City, Country` for
73
+ non-US).
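The location naming rule can be sketched as follows. The sketch assumes the US test is a plain comparison against the country string. `US_COUNTRY_NAMES` and `location_name` are hypothetical names, and the feed's exact spelling of the United States may vary.

```python
# Hypothetical set of country strings treated as "US"; extend as the data requires.
US_COUNTRY_NAMES = {"United States"}


def location_name(city: str, state: str, country: str) -> str:
    """Resolve a location name: 'City, ST' in the US, otherwise 'City, Country'."""
    city, state, country = city.strip(), state.strip(), country.strip()
    if country in US_COUNTRY_NAMES and state:
        return f"{city}, {state}"
    return f"{city}, {country}"
```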
74
+
75
+ ## 3. Properties
76
+
77
+ ### Identity and Registration (Organization)
78
+
79
+ * `crd_number` *(primary strong-ID)*
80
+ * Definition: FINRA-assigned firm CRD (Central Registration
81
+ Depository) number. The canonical unique identifier for both
82
+ SEC-registered and state-registered investment advisers; stable
83
+ across registration status changes.
84
+ * Examples: `283882`, `312360`, `324069`
85
+ * Derivation: `Info@FirmCrdNb` attribute on each `<Firm>` element of
86
+ the daily IAPD XML feed; identical attribute name on SEC and state
87
+ feeds. Firms with empty `Info@FirmCrdNb` are skipped by the
88
+ streamer (they would have no usable strong-ID).
89
+ * Cross-source: matches `edgar.organization.crd_number` —
90
+ purposefully unprefixed.
91
+
92
+ * `sec_file_number` *(secondary strong-ID, SEC feed only)*
93
+ * Definition: SEC-issued file number for the firm. Prefix encodes the
94
+ registration kind: `801-XXXXXX` for SEC-registered RIAs, `802-XXXXXX`
95
+ for Exempt Reporting Advisers (ERAs).
96
+ * Examples: `801-135399`, `802-120553`
97
+ * Derivation: `Info@SECNb` attribute on `<Firm>` in the SEC feed.
98
+ Not present on state-feed firms; emitted as a strong-ID only when
99
+ non-empty.
100
+ * Cross-source: matches `edgar.organization.sec_file_number` for
101
+ firms that also file via EDGAR.
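Because the `sec_file_number` prefix encodes the registration kind, a consumer can recover RIA vs. ERA status from the ID alone. The following is a minimal sketch. `registration_kind` and its `"unknown"` fallback for any other prefix are illustrative, not part of the schema.

```python
import re

# 801- marks SEC-registered RIAs, 802- marks Exempt Reporting Advisers.
_SEC_FILE_NUMBER = re.compile(r"^(801|802)-\d+$")


def registration_kind(sec_file_number: str) -> str:
    """Classify an SEC file number as 'RIA', 'ERA', or 'unknown'."""
    match = _SEC_FILE_NUMBER.match(sec_file_number.strip())
    if not match:
        return "unknown"
    return "RIA" if match.group(1) == "801" else "ERA"
```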
102
+
103
+ * `company_cik` *(strong-ID slot — not populated by IAPD)*
104
+ * Definition: SEC Central Index Key — EDGAR's per-filer numeric ID.
105
+ * Examples: `1234567`
106
+ * Derivation: not present in the bulk Part 1A XML. Slot exists so
107
+ Form ADV Schedule R or third-party CRD↔CIK cross-walks can populate
108
+ it without a breaking schema change. Property name matches
109
+ `edgar.organization.company_cik` so cross-source ER works the
110
+ moment values appear.
111
+
112
+ * `lei` *(strong-ID slot — not populated by IAPD)*
113
+ * Definition: Legal Entity Identifier — 20-character ISO 17442 code.
114
+ * Examples: `549300LQQAVPLATTSU38`
115
+ * Derivation: not present in the bulk Part 1A XML. Slot exists so
116
+ per-firm IAPD detail (Schedule R) or third-party CRD↔LEI
117
+ cross-walks can populate it without a breaking schema change.
118
+ Property name matches `gleif.organization.lei` and
119
+ `edgar.organization.lei`.
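Although IAPD never populates `lei`, a cross-walk that fills the slot could reject malformed values up front. ISO 17442 LEIs end in two check digits computed with ISO 7064 MOD 97-10. Letters map to 10 through 35, and the resulting integer must leave remainder 1 when divided by 97. The following sketch uses hypothetical function names.

```python
import re

# 18 alphanumeric characters followed by 2 numeric check digits.
_LEI_SHAPE = re.compile(r"^[0-9A-Z]{18}[0-9]{2}$")


def _mod97_digits(s: str) -> int:
    """Expand letters to numbers (A=10 ... Z=35), keep digits, read as one integer."""
    return int("".join(str(int(ch, 36)) for ch in s))


def lei_check_digits(prefix18: str) -> str:
    """Compute the two ISO 7064 MOD 97-10 check digits for an 18-character prefix."""
    return f"{98 - _mod97_digits(prefix18.upper() + '00') % 97:02d}"


def is_valid_lei(lei: str) -> bool:
    """True if `lei` has the ISO 17442 shape and its check digits verify."""
    lei = lei.strip().upper()
    return bool(_LEI_SHAPE.match(lei)) and _mod97_digits(lei) % 97 == 1
```

The same remainder-of-97 test both generates and verifies the digits. Any single-digit change to the check digits therefore fails validation.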
120
+
121
+ * `primary_business_name`
122
+ * Definition: Primary name under which the firm conducts advisory
123
+ business. May differ from legal name when the firm operates under a
124
+ trade name / DBA.
125
+ * Examples: `RABENOLD ADVISORS, INC.`, `MK CAPITAL`
126
+ * Derivation: `Info@BusNm` attribute on `<Firm>`.
127
+
128
+ * `legal_name`
129
+ * Definition: Firm's full legal name as registered with the SEC and/or
130
+ state regulators.
131
+ * Examples: `RABENOLD ADVISORS, INC.`, `MK CAPITAL COMPANY`
132
+ * Derivation: `Info@LegalNm` attribute on `<Firm>`. Also passed as an
133
+ alias on the entity for entity resolution.
134
+
135
+ * `is_umbrella_registration`
136
+ * Definition: String-encoded boolean (`"true"`/`"false"`) indicating
137
+ whether this filing represents an umbrella registration covering
138
+ multiple filing-adviser/relying-adviser entities under a single
139
+ Form ADV.
140
+ * Examples: `true`, `false`
141
+ * Derivation: `Info@UmbrRgstn` attribute on `<Firm>` (`Y`/`N`),
142
+ normalized to `true`/`false`.
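Several properties share the same `Y`/`N` to `"true"`/`"false"` normalization: `is_umbrella_registration`, `is_wrap_fee_program_sponsor`, and the Item 11 flags. The following is a minimal sketch. Returning `None` for blank or unexpected input, so the caller can skip the atom, is an assumption rather than documented behavior.

```python
from typing import Optional

_YN_TO_BOOL_STRING = {"Y": "true", "N": "false"}


def yn_to_bool_string(value: Optional[str]) -> Optional[str]:
    """Map a Form ADV 'Y'/'N' flag to a string-encoded boolean.

    Returns None for blank or unrecognized values so no atom is emitted.
    """
    return _YN_TO_BOOL_STRING.get((value or "").strip().upper())
```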
143
+
144
+ * `sec_region_code`
145
+ * Definition: SEC supervisory regional office code that has
146
+ jurisdiction over this firm (SEC-registered firms only).
147
+ * Examples: `NYRO`, `CHRO`, `LARO`
148
+ * Derivation: `Info@SECRgnCD` attribute. Omitted on state-feed firms
149
+ (where the SEC has no supervisory role).
150
+
151
+ * `firm_registration_type`
152
+ * Definition: Firm's registration disposition with the SEC at the time
153
+ of the most recent filing. One of `Registered` (full SEC-registered
154
+ RIA), `ERA` (Exempt Reporting Adviser — files but is not fully
155
+ registered), or other values the SEC may emit.
156
+ * Examples: `Registered`, `ERA`
157
+ * Derivation: `Rgstn@FirmType` attribute on `<Firm>` (SEC feed). State
158
+ feed instead carries a `<StateRgstn>` block; the equivalent on the
159
+ state feed is derivable from the presence of `<StateRgstn>` plus
160
+ individual regulator codes.
161
+
162
+ * `registration_status`
163
+ * Definition: Status of the firm's registration with its primary
164
+ regulator. For SEC-registered firms this is `Rgstn@St` (e.g.
165
+ `APPROVED`); for state-registered firms this is derived from the
166
+ first `<Rgltr>` element under `<StateRgstn>`.
167
+ * Examples: `APPROVED`, `ACTIVE`, `PENDING`, `TERMINATED`
168
+ * Derivation: `Rgstn@St` (SEC feed) or `StateRgstn/Rgltrs/Rgltr@St`
169
+ (state feed, first element).
170
+
171
+ * `registration_date`
172
+ * Definition: Date the firm was approved/registered by its primary
173
+ regulator, formatted YYYY-MM-DD.
174
+ * Examples: `2026-02-24`, `2021-02-16`
175
+ * Derivation: `Rgstn@Dt` (SEC) or first `Rgltr@Dt` under
176
+ `StateRgstn/Rgltrs` (state).
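The SEC-first, state-fallback rule for `registration_status` and `registration_date` might look like the following. The sketch assumes `<Rgstn>` and `<StateRgstn>` are direct children of `<Firm>`, as the paths above suggest. The function is hypothetical and returns the raw attribute values without reformatting dates.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple


def registration_status_and_date(firm: ET.Element) -> Tuple[Optional[str], Optional[str]]:
    """Return (status, date): SEC Rgstn first, else the first state Rgltr."""
    rgstn = firm.find("Rgstn")
    if rgstn is not None and rgstn.get("St"):
        return rgstn.get("St"), rgstn.get("Dt")
    # find() returns the first matching element in document order.
    first_rgltr = firm.find("StateRgstn/Rgltrs/Rgltr")
    if first_rgltr is not None:
        return first_rgltr.get("St"), first_rgltr.get("Dt")
    return None, None
```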
177
+
178
+ * `notice_filed_state_count`
179
+ * Definition: Number of US states where the firm has made notice
180
+ filings (SEC-registered firms). Notice filings are required of
181
+ SEC-registered firms in states where they have a place of business
182
+ or sufficient clients. Emitted as a float for numeric queryability.
183
+ * Examples: `0`, `1`, `50`
184
+ * Derivation: count of `<States>` elements under `<NoticeFiled>`
185
+ (SEC feed only).
186
+
187
+ * `state_registration_count`
188
+ * Definition: Number of US state and territorial securities regulators
189
+ with which the firm is registered (state-registered firms). Emitted
190
+ as a float.
191
+ * Examples: `1`, `5`, `30`
192
+ * Derivation: count of `<Rgltr>` elements under `<StateRgstn>/<Rgltrs>`
193
+ (state feed only).
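Both counts are element tallies cast to `float`. The following sketch makes the same assumption that `<NoticeFiled>` and `<StateRgstn>` sit directly under `<Firm>`. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def registration_counts(firm: ET.Element) -> Dict[str, float]:
    """Count notice filings (SEC feed) and state registrations (state feed)."""
    counts: Dict[str, float] = {}
    notice = firm.find("NoticeFiled")
    if notice is not None:  # an Element with no children is falsy, so test for None
        counts["notice_filed_state_count"] = float(len(notice.findall("States")))
    rgltrs = firm.find("StateRgstn/Rgltrs")
    if rgltrs is not None:
        counts["state_registration_count"] = float(len(rgltrs.findall("Rgltr")))
    return counts
```

The explicit `is not None` checks matter. With `if notice:`, a firm whose `<NoticeFiled>` is empty would be skipped instead of reporting `0`.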
194
+
195
+ * `latest_filing_date`
196
+ * Definition: Date of the most recent Form ADV filing represented in
197
+ this snapshot, YYYY-MM-DD.
198
+ * Examples: `2026-03-04`, `2025-08-01`
199
+ * Derivation: `Filing@Dt` attribute on `<Firm>`. Also used as the
200
+ record-level `Timestamp` because it bounds the freshness of the
201
+ Form ADV data.
202
+
203
+ * `form_adv_version`
204
+ * Definition: Version label of the Form ADV form template used for
205
+ this filing.
206
+ * Examples: `10/2021`
207
+ * Derivation: `Filing@FormVrsn` attribute on `<Firm>`.
208
+
209
+ ### Address (Organization)
210
+
211
+ * `physical_address`
212
+ * Definition: Firm's main office street address, formatted
213
+ `"Street1, Street2, City, ST ZIP, Country"`.
214
+ * Examples: `5930 MAIN STREET, SUITE 400, WILLIAMSVILLE, NY 14221,
215
+ United States`
216
+ * Derivation: assembled from `MainAddr@Strt1`, `MainAddr@Strt2`,
217
+ `MainAddr@City`, `MainAddr@State`, `MainAddr@PostlCd`,
218
+ `MainAddr@Cntry` on `<Firm>`.
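The address assembly can be sketched as joining the non-blank parts of a `MainAddr` or `MailingAddr` attribute map, such as `element.attrib`. Two behaviors here are assumptions about how gaps are handled: blank parts are skipped, and the function returns `None` when nothing is left so that the property is omitted.

```python
from typing import Mapping, Optional


def format_address(attrs: Mapping[str, str]) -> Optional[str]:
    """Assemble 'Street1, Street2, City, ST ZIP, Country', skipping blank parts."""

    def part(key: str) -> str:
        return (attrs.get(key) or "").strip()

    # State and postal code share one comma-separated slot: "NY 14221".
    state_zip = " ".join(p for p in (part("State"), part("PostlCd")) if p)
    parts = [part("Strt1"), part("Strt2"), part("City"), state_zip, part("Cntry")]
    text = ", ".join(p for p in parts if p)
    return text or None
```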
219
+
220
+ * `mailing_address`
221
+ * Definition: Firm's mailing address when distinct from the main
222
+ office address.
223
+ * Examples: `PO BOX 1234, ALBANY, NY 12201, United States`
224
+ * Derivation: same assembly applied to `<MailingAddr>`. Omitted when
225
+ the element is empty (the common case).
226
+
227
+ * `main_phone_number`
228
+ * Definition: Main-office phone number as published by the firm. Not
229
+ normalized; the SEC publishes whatever the firm entered.
230
+ * Examples: `716-568-8790`, `6033037688`
231
+ * Derivation: `MainAddr@PhNb`.
232
+
233
+ * `main_fax_number`
234
+ * Definition: Main-office fax number, if any.
235
+ * Examples: `716-568-8791`
236
+ * Derivation: `MainAddr@FaxNb`. Often missing.
237
+
238
+ ### Web Presence (Organization)
239
+
240
+ * `website`
241
+ * Definition: Firm's website URL. The bulk feed allows multiple web
242
+ addresses per firm; we emit one `website` atom per `<WebAddr>`
243
+ element (i.e., the property can be multi-valued for a single firm).
244
+ * Examples: `http://www.rabenoldadvisors.com`,
245
+ `https://www.mkcapital.com/`
246
+ * Derivation: each `<WebAddr>` inside
247
+ `Part1A/Item1/WebAddrs` on `<Firm>`.
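Emitting one atom per `<WebAddr>` might look like the following. The sketch assumes the URL is the element's text content and that `Part1A` can sit at any depth under `<Firm>`, which is why it uses the `.//` search. The `(property, value)` tuple is a hypothetical stand-in for the real atom type.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def website_atoms(firm: ET.Element) -> List[Tuple[str, str]]:
    """Return one ('website', url) pair per non-empty <WebAddr>, in document order."""
    atoms: List[Tuple[str, str]] = []
    for web in firm.iterfind(".//Part1A/Item1/WebAddrs/WebAddr"):
        url = (web.text or "").strip()
        if url:
            atoms.append(("website", url))
    return atoms
```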
248
+
249
+ ### Organization Form (Organization)
250
+
251
+ * `organization_form`
252
+ * Definition: Legal organization form of the firm (Item 3A of Form
253
+ ADV).
254
+ * Examples: `Corporation`, `Limited Partnership`, `Limited Liability
255
+ Company`, `Sole Proprietorship`
256
+ * Derivation: `Item3A@OrgFormNm` under `Part1A` on `<Firm>`.
257
+
258
+ * `fiscal_year_end_month`
259
+ * Definition: Month in which the firm's fiscal year ends (Item 3B of
260
+ Form ADV).
261
+ * Examples: `DECEMBER`, `JUNE`
262
+ * Derivation: `Item3B@Q3B`.
263
+
264
+ * `state_of_formation`
265
+ * Definition: US state or country in which the firm was organized
266
+ (Item 3C of Form ADV). Stored as the 2-letter US state code when
267
+ inside the US; otherwise the country name.
268
+ * Examples: `NY`, `DE`, `IL`
269
+ * Derivation: `Item3C@StateCD` (US) or `Item3C@CntryNm` (non-US).
270
+
271
+ * `country_of_formation`
272
+ * Definition: Country in which the firm was organized.
273
+ * Examples: `United States`, `Cayman Islands`
274
+ * Derivation: `Item3C@CntryNm`.
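The following sketch shows how Item 3C splits into `state_of_formation` and `country_of_formation`. It takes the `Item3C` attribute map as input. The function name and the blank-value handling are assumptions.

```python
from typing import Dict, Mapping


def formation_props(item3c: Mapping[str, str]) -> Dict[str, str]:
    """Derive state_of_formation and country_of_formation from Item 3C."""
    props: Dict[str, str] = {}
    state = (item3c.get("StateCD") or "").strip()
    country = (item3c.get("CntryNm") or "").strip()
    if state:
        props["state_of_formation"] = state  # US: 2-letter state code
    elif country:
        props["state_of_formation"] = country  # non-US: fall back to country name
    if country:
        props["country_of_formation"] = country
    return props
```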
### Employees and Clients (Organization)

* `total_employees`
  * Definition: Total number of employees worldwide as of the firm's
    most recent fiscal year-end (Item 5A). Emitted as a float for
    numeric queryability.
  * Examples: `4`, `150`, `12000`
  * Derivation: `Item5A@TtlEmp`.

* `employees_providing_investment_advice`
  * Definition: Number of employees who perform investment advisory
    functions, including research (Item 5B(1)).
  * Examples: `1`, `30`
  * Derivation: `Item5B@Q5B1`.

* `client_count_band`
  * Definition: Approximate band for the total number of advisory
    clients (Item 5H). The feed carries this as a coarse band string
    rather than an exact count.
  * Examples: `0`, `1-10`, `11-25`, `26-100`, `101-250`, `251-500`,
    `51-100`, `More than 500`
  * Derivation: `Item5H@Q5H`.

### Assets Under Management (Organization)

* `assets_under_management`
  * Definition: Total regulatory assets under management (RAUM)
    reported on Form ADV, in USD: the sum of discretionary and
    non-discretionary RAUM. Item 5F(2)(c).
  * Examples: `35557038`, `15000000000`
  * Derivation: `Item5F@Q5F2C` (already in USD; passed through as a
    float).

* `discretionary_assets_under_management`
  * Definition: Regulatory AUM over which the firm has discretionary
    authority, in USD. Item 5F(2)(a).
  * Examples: `35557038`, `0`
  * Derivation: `Item5F@Q5F2A`.

* `non_discretionary_assets_under_management`
  * Definition: Regulatory AUM on which the firm advises without
    discretionary authority, in USD. Item 5F(2)(b).
  * Examples: `0`, `8000000`
  * Derivation: `Item5F@Q5F2B`.

* `non_us_assets_under_management`
  * Definition: Portion of RAUM attributable to non-US clients, in USD.
    Item 5F(3).
  * Examples: `0`, `1200000`
  * Derivation: `Item5F@Q5F3`.

* `discretionary_account_count`
  * Definition: Number of discretionary advisory accounts. Item 5F(2)(d).
  * Examples: `117`, `5000`
  * Derivation: `Item5F@Q5F2D`.

* `non_discretionary_account_count`
  * Definition: Number of non-discretionary advisory accounts. Item
    5F(2)(e).
  * Examples: `0`, `12`
  * Derivation: `Item5F@Q5F2E`.

* `total_account_count`
  * Definition: Total number of advisory accounts. Item 5F(2)(f).
  * Examples: `117`, `5012`
  * Derivation: `Item5F@Q5F2F`.
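Each Item 5F(2) total should equal the sum of its discretionary and non-discretionary parts. That gives a cheap consistency check on parsed records. The helper below is a hypothetical validation function; the streamer is not documented to run it. Its `tolerance` parameter absorbs float rounding.

```python
from typing import Dict, List, Mapping, Tuple

# Total property -> (discretionary part, non-discretionary part).
_TOTALS: Dict[str, Tuple[str, str]] = {
    "assets_under_management": (
        "discretionary_assets_under_management",
        "non_discretionary_assets_under_management",
    ),
    "total_account_count": (
        "discretionary_account_count",
        "non_discretionary_account_count",
    ),
}


def aum_mismatches(props: Mapping[str, float], tolerance: float = 1.0) -> List[str]:
    """Return the total properties that differ from the sum of their parts."""
    bad: List[str] = []
    for total_key, (disc_key, non_disc_key) in _TOTALS.items():
        if all(k in props for k in (total_key, disc_key, non_disc_key)):
            if abs(props[total_key] - (props[disc_key] + props[non_disc_key])) > tolerance:
                bad.append(total_key)
    return bad
```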
### Advisory Services Offered (Organization)

Each service is emitted as a string atom with value `"true"` only when
the firm answered `Y` on the relevant Item 5G subfield. An absent atom
means the firm answered `N` or left the field blank. `"false"` atoms are
intentionally not emitted, to keep the atom count down at production
scale: at ~45K firms × ~10 services, they would add up to ~450K atoms
with no semantic gain.

* `provides_financial_planning_services`
  * Definition: Firm provides financial planning services (Item 5G(1)).
  * Examples: `true`
  * Derivation: `Item5G@Q5G1=="Y"`; emitted only when true.

* `provides_individual_portfolio_management`
  * Definition: Firm manages portfolios for individuals (Item 5G(2)).
  * Examples: `true`
  * Derivation: `Item5G@Q5G2=="Y"`.

* `provides_institutional_portfolio_management`
  * Definition: Firm manages portfolios for institutions (Item 5G(5)).
  * Examples: `true`
  * Derivation: `Item5G@Q5G5=="Y"`.

* `provides_pooled_vehicle_portfolio_management`
  * Definition: Firm manages portfolios for pooled investment vehicles
    (Items 5G(3) and 5G(4), covering both registered and unregistered
    pooled vehicles).
  * Examples: `true`
  * Derivation: `Item5G@Q5G3=="Y" || Item5G@Q5G4=="Y"`.

* `provides_pension_consulting_services`
  * Definition: Firm provides pension consulting services (Item 5G(8)).
  * Examples: `true`
  * Derivation: `Item5G@Q5G8=="Y"`.

* `provides_selection_of_other_advisers`
  * Definition: Firm selects other advisers on behalf of clients
    (including via wrap programs). Item 5G(9).
  * Examples: `true`
  * Derivation: `Item5G@Q5G9=="Y"`.

* `provides_market_timing_services`
  * Definition: Firm offers market-timing services. Item 5G(10).
  * Examples: `true`
  * Derivation: `Item5G@Q5G10=="Y"`.

* `provides_security_ratings_services`
  * Definition: Firm provides securities ratings or pricing services.
    Item 5G(11).
  * Examples: `true`
  * Derivation: `Item5G@Q5G11=="Y"`.

* `provides_other_advisory_services`
  * Definition: Firm provides other advisory services not listed
    elsewhere in Item 5G; the free-form description (if any) lives in
    `other_advisory_services_description`.
  * Examples: `true`
  * Derivation: `Item5G@Q5G12=="Y"`.

* `other_advisory_services_description`
  * Definition: Free-text description of the "other" advisory services
    the firm offers, when `provides_other_advisory_services` is true.
  * Examples: `Investment research and securities analysis`
  * Derivation: `Item5G@Q5G12Oth`. Omitted when blank.
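The `Y`-only emission rule, including the two-field OR for pooled vehicles, fits in a lookup table. The following sketch works on the `Item5G` attribute map. The table mirrors the Derivation lines above. The function name and the choice to emit the description whenever it is non-blank are assumptions.

```python
from typing import Dict, Mapping, Tuple

# Property -> Item 5G attributes; any "Y" among them sets the property.
SERVICE_FIELDS: Dict[str, Tuple[str, ...]] = {
    "provides_financial_planning_services": ("Q5G1",),
    "provides_individual_portfolio_management": ("Q5G2",),
    "provides_institutional_portfolio_management": ("Q5G5",),
    "provides_pooled_vehicle_portfolio_management": ("Q5G3", "Q5G4"),
    "provides_pension_consulting_services": ("Q5G8",),
    "provides_selection_of_other_advisers": ("Q5G9",),
    "provides_market_timing_services": ("Q5G10",),
    "provides_security_ratings_services": ("Q5G11",),
    "provides_other_advisory_services": ("Q5G12",),
}


def service_atoms(item5g: Mapping[str, str]) -> Dict[str, str]:
    """Emit "true" atoms for answered-Y services; N/blank emit nothing."""
    atoms: Dict[str, str] = {}
    for prop, fields in SERVICE_FIELDS.items():
        if any((item5g.get(f) or "").strip().upper() == "Y" for f in fields):
            atoms[prop] = "true"
    other = (item5g.get("Q5G12Oth") or "").strip()
    if other:
        atoms["other_advisory_services_description"] = other
    return atoms
```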
### Wrap Fee Programs (Organization)

* `is_wrap_fee_program_sponsor`
  * Definition: String-encoded boolean indicating whether the firm
    sponsors a wrap fee program (Item 5I(1)).
  * Examples: `true`, `false`
  * Derivation: `Item5I@Q5I1`, normalized from `Y`/`N`.

* `wrap_fee_sponsor_assets`
  * Definition: Total assets in wrap fee programs the firm sponsors,
    in USD. Item 5I(2)(a).
  * Examples: `12500000000`
  * Derivation: `Item5I@Q5I2A`. Omitted when 0 or missing.

* `wrap_fee_portfolio_assets`
  * Definition: Total assets in wrap fee programs for which the firm
    acts as a portfolio manager, in USD. Item 5I(2)(b).
  * Examples: `8000000`
  * Derivation: `Item5I@Q5I2B`.

### Disciplinary Disclosures (Organization)

Item 11 of Form ADV asks the firm to disclose criminal, regulatory,
civil, and bankruptcy history. The bulk XML feed exposes only the
Yes/No flags from Items 11A–H; the full DRP (Disclosure Reporting
Page) detail lives in the per-firm PDF brochures and is out of scope
for v0. We emit only one overall rollup flag
(`has_disciplinary_disclosure`) plus two narrower rollups for the most
operationally useful categories (criminal and regulatory action). The
remaining Item 11 flags can be added incrementally later.

* `has_disciplinary_disclosure`
  * Definition: String-encoded boolean indicating that the firm or any
    of its advisory affiliates answered "Yes" to at least one question
    in Item 11 (criminal, regulatory, civil, or bankruptcy). A rollup;
    `false` means clean across all Item 11 subquestions.
  * Examples: `true`, `false`
  * Derivation: `Item11@Q11`, normalized from `Y`/`N`.

* `has_criminal_disclosure`
  * Definition: String-encoded boolean rollup of Item 11A: any criminal
    conviction or pending charge against the firm or an advisory
    affiliate.
  * Examples: `true`, `false`
  * Derivation: `Item11A@Q11A1=="Y" || Item11A@Q11A2=="Y"`,
    normalized.

* `has_regulatory_action_disclosure`
  * Definition: String-encoded boolean rollup of Items 11B–E: any
    regulatory action by the SEC, CFTC, other federal regulator, state
    regulator, foreign regulator, or self-regulatory organization
    against the firm or an advisory affiliate.
  * Examples: `true`, `false`
  * Derivation: logical OR across all `Item11B..E@Q*` flags,
    normalized.
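The three Item 11 rollups reduce to OR-ing `Y` flags. The following sketch works on a hypothetical map from element name (`Item11`, `Item11A`, and so on) to its attribute map. Only `Q11`, `Q11A1`, and `Q11A2` come from the Derivation lines above; other attribute names in the test are illustrative. Any value other than `Y` counts as no.

```python
from typing import Dict, Mapping, Optional

REGULATORY_ITEMS = ("Item11B", "Item11C", "Item11D", "Item11E")


def _is_yes(value: Optional[str]) -> bool:
    return (value or "").strip().upper() == "Y"


def _flag(value: bool) -> str:
    return "true" if value else "false"


def disciplinary_atoms(items: Mapping[str, Mapping[str, str]]) -> Dict[str, str]:
    """Compute the Item 11 rollups from an element-name -> attributes map."""
    atoms: Dict[str, str] = {}
    if "Item11" in items:
        atoms["has_disciplinary_disclosure"] = _flag(_is_yes(items["Item11"].get("Q11")))
    if "Item11A" in items:
        a = items["Item11A"]
        atoms["has_criminal_disclosure"] = _flag(_is_yes(a.get("Q11A1")) or _is_yes(a.get("Q11A2")))
    present = [items[name] for name in REGULATORY_ITEMS if name in items]
    if present:
        # OR across every Q* attribute of Items 11B..E.
        atoms["has_regulatory_action_disclosure"] = _flag(
            any(_is_yes(v) for attrs in present for k, v in attrs.items() if k.startswith("Q"))
        )
    return atoms
```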
## 4. Entity Relationships Summary

```
organization ──[is_located_at]──→ location (main office)
organization ──[is_located_at]──→ location (mailing address, when distinct)
```

There are no inter-firm relationships in the bulk Form ADV feed —
ownership and control-person disclosures (Schedule A/B/C, Item 7) are
present at the Y/N flag level, but the underlying detail rows are not in
the daily XML. Future iterations could add `controlled_by` /
`affiliated_with` relationships when those schedules are wired in.

## 5. Attributes

None. All atoms carry standard citations; no source-specific
attributes are emitted.