@yottagraph-app/data-model-skill 0.0.23 → 0.0.25
- package/package.json +1 -1
- package/skill/edgar/schema.yaml +32 -0
- package/skill/lda/DATA_DICTIONARY.md +101 -0
- package/skill/lda/schema.yaml +209 -0
package/package.json
CHANGED
package/skill/edgar/schema.yaml
CHANGED
@@ -2631,6 +2631,22 @@ relationships:
     target_flavors: ["organization"]
     passive: true
 
+  - name: "has_major_customer"
+    description: "Link from a reporting company to a customer that represents a material concentration of revenue or receivables, as disclosed in the 10-K XBRL concentration risk schedule (srt:MajorCustomersAxis). The relationship attribute customer_revenue_concentration holds the disclosed percentage."
+    display_name: "Has Major Customer"
+    mergeability: not_mergeable
+    domain_flavors: ["organization"]
+    target_flavors: ["organization"]
+    passive: true
+
+  - name: "is_major_customer_of"
+    description: "Inverse of has_major_customer. Link from a customer organization to the reporting company for which it is a major customer."
+    display_name: "Is Major Customer Of"
+    mergeability: not_mergeable
+    domain_flavors: ["organization"]
+    target_flavors: ["organization"]
+    passive: true
+
   - name: "is_part_of"
     description: "An organization is a part of a larger organization, e.g., a subdivision or subsidiary within a larger company"
     display_name: "Part Of"
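The two relationships added in this hunk are described as an inverse pair. The following is a minimal sketch of how a consumer might derive one direction from the other. The `Edge` tuple and `inverse_edge` helper are hypothetical names used for illustration and are not part of the package; whether the inverse edge also carries the relationship attributes is not stated in the schema, so the sketch leaves attributes out.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    """Hypothetical in-memory edge; not a type defined by the package."""
    source: str
    relation: str
    target: str


# Inverse pair taken from the relationship descriptions in the hunk above.
INVERSES = {
    "has_major_customer": "is_major_customer_of",
    "is_major_customer_of": "has_major_customer",
}


def inverse_edge(edge: Edge) -> Edge:
    """Return the inverse edge by swapping source and target."""
    return Edge(edge.target, INVERSES[edge.relation], edge.source)


forward = Edge("Reporting Co", "has_major_customer", "Customer Co")
backward = inverse_edge(forward)
```

Applying `inverse_edge` twice returns the original edge, which is a quick sanity check that the mapping is symmetric.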
@@ -3073,6 +3089,22 @@ attributes:
     display_name: "Lending Value"
     mergeability: not_mergeable
 
+  # ── has_major_customer relationship attributes (10-K XBRL) ──
+
+  - property: "has_major_customer"
+    name: "customer_revenue_concentration"
+    type: string
+    description: "Fraction of revenue (or receivables) attributable to this customer, as reported (e.g. \"0.18\" = 18%). Stored as a decimal string consistent with other numeric relationship attributes. Source: us-gaap:ConcentrationRiskPercentage1."
+    display_name: "Customer Revenue Concentration"
+    mergeability: not_mergeable
+
+  - property: "has_major_customer"
+    name: "concentration_benchmark"
+    type: string
+    description: "What the concentration percentage is measured against (e.g. 'Revenue', 'Trade receivables'). Derived from the ConcentrationRiskByBenchmarkAxis member label."
+    display_name: "Concentration Benchmark"
+    mergeability: not_mergeable
+
   # ── custodied_by relationship attributes (N-CEN) ──
 
   - property: "custodied_by"
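Because `customer_revenue_concentration` is stored as a decimal string holding a fraction (`"0.18"` means 18%), a reader of the data has to convert it before displaying a percentage. A minimal sketch, assuming the value arrives as a plain Python string; the function name `concentration_percent` is hypothetical and not part of the package.

```python
from decimal import Decimal, InvalidOperation
from typing import Optional


def concentration_percent(raw: str) -> Optional[Decimal]:
    """Convert a decimal-string fraction such as "0.18" into a percentage.

    Returns None when the string is not a finite decimal number.
    """
    try:
        fraction = Decimal(raw.strip())
    except InvalidOperation:
        return None
    if not fraction.is_finite():
        return None
    return fraction * 100


pct = concentration_percent("0.18")  # numerically equal to 18
```

`Decimal` is used rather than `float` so that the reported value is not altered by binary rounding.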
package/skill/lda/DATA_DICTIONARY.md
@@ -0,0 +1,101 @@
+# Data Dictionary: LDA (Lobbying Disclosure Act)
+
+Last updated: 2026-05-04
+
+## Source Overview
+
+Lobbying Disclosure Act filings (registrations, quarterly activity, etc.) from the **unified LDA.gov REST API**. The streamer polls `GET /api/v1/filings/`, stores raw JSON pages, and emits v2 `FetchMessage` records.
+
+| Item | Value |
+|------|--------|
+| Pipeline / stream | Configured in `streams.yaml` (see deployment) |
+| `Record.Source` | `lda` |
+
+Anonymous access is rate-limited; optional API token improves throughput.
+
+---
+
+## Entity Types
+
+### `lda_filing`
+
+One disclosure identified by API `filing_uuid`.
+
+- **Subject name:** `filing_uuid` (stable, unique per disclosure).
+- **Strong id:** `lda_filing_uuid` on the filing subject.
+- **Resolver:** `NOT_MERGEABLE` passive filing entity.
+- **Timestamp:** Parsed from `dt_posted` only (no wall-clock fallback). If `dt_posted` is missing or invalid, the filing is not atomized.
+
+### `organization`
+
+Either the **registrant** (lobbying firm) or **client** on a filing.
+
+- **Subject name:** API `name` (legal / display name).
+- **Strong ids:** `lda_registrant_id` (registrant rows) or `lda_client_internal_id` (client rows).
+- **Role:** Property `lda_party_role` = `registrant` or `client` on the organization **record** (not on the filing).
+- **Resolver:** Named-entity info is `MERGEABLE` (resolver / recordeval ER search) while retaining LDA **strong ids**; flavor-level resolver info remains `NOT_MERGEABLE` per passive schema.
+- **Snippets:** Formatted **address** only when present (no LDA filing UUID prefix).
+
+### `location`
+
+Geographic label derived from registrant or client address fields for `is_located_at` edges.
+
+- **Name:** Typically `City, State` or `City, State, Country`; when city is absent, `State, Country` or state-only per atomizer rules.
+- **Resolver:** `MERGEABLE` named entity (no strong id), for soft clustering with other sources.
+
+---
+
+## Properties
+
+### Filing
+
+| Property | Description |
+|----------|-------------|
+| `lda_filing_uuid` | API `filing_uuid`. |
+| `lda_filing_type` | Machine code (`filing_type`), e.g. `RR`, `Q1`. |
+| `lda_filing_type_display` | Human label (`filing_type_display`). |
+| `lda_filing_year` | Reporting year (float in schema). |
+| `lda_filing_period_display` | Period label, e.g. quarter. |
+| `lda_income` | Income string when present. |
+| `lda_expenses` | Expenses string when present. |
+| `lda_dt_posted` | Raw ISO `dt_posted` from API. |
+| `lda_filing_document_url` | Public document URL. |
+| `lda_posted_by_name` | Poster name when present. |
+| `lda_lobbying_causes` | Repeated **once per** `lobbying_activities[]` row (`CODE (Display)`). Same pattern as patent **`cpc_code`**: narrative text is quad attribute **`lda_lobbying_cause_description`** on that atom (API `description` field). **Only on filing**; omitted if activities array is empty. |
+
+### Organization
+
+| Property | Description |
+|----------|-------------|
+| `lda_party_role` | `registrant` or `client`. |
+| `lda_registrant_id` | Registrant API id as string. |
+| `lda_client_internal_id` | Client row id (`client.id`) as string. |
+| `address` | Single-line formatted address (street/city/state/zip + country). |
+
+---
+
+## Entity Relationships
+
+```
+lda_filing ──[lda_registrant]──→ organization (registrant)
+lda_filing ──[lda_client]──────→ organization (client)
+
+organization ──[is_located_at]──→ location
+```
+
+- **`lda_registrant` / `lda_client`:** Target atoms on the **`lda_filing`** record point at the same organization identities emitted as separate **organization** records for that page (strong ids + properties).
+- **`is_located_at`:** On each **organization** record when the atomizer can derive a location name from city/state/country rules.
+
+---
+
+## Records Per Filing
+
+For a typical filing with registrant and client, atomization yields **up to three** records: one filing, one registrant organization, one client organization. Either org may be omitted if required API fields are missing.
+
+---
+
+## Citations
+
+Primary citation text is the filing `url` when present; otherwise a synthetic label referencing `filing_uuid`.
+
+---
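The `lda_filing` timestamp rule in the data dictionary above (parse `dt_posted` only, never fall back to the wall clock, skip the filing when the value is missing or invalid) can be illustrated with a short sketch. The atomizer's actual parsing code is not shown in this diff, so the function names and the use of `datetime.fromisoformat` here are assumptions for illustration only.

```python
from datetime import datetime
from typing import Optional


def filing_timestamp(filing: dict) -> Optional[datetime]:
    """Parse dt_posted from a filing dict; return None if missing or invalid.

    There is deliberately no fallback to the current time.
    """
    raw = filing.get("dt_posted")
    if not isinstance(raw, str) or not raw:
        return None
    try:
        return datetime.fromisoformat(raw)
    except ValueError:
        return None


def should_atomize(filing: dict) -> bool:
    """A filing is atomized only when its dt_posted parses successfully."""
    return filing_timestamp(filing) is not None


ok = should_atomize({"dt_posted": "2024-01-15T10:30:00-05:00"})
skipped = should_atomize({"filing_uuid": "example-without-timestamp"})
```

Note that `datetime.fromisoformat` accepts a trailing `Z` only on Python 3.11 and later; a real parser may need to be more lenient than this sketch.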
package/skill/lda/schema.yaml
@@ -0,0 +1,209 @@
+# Dataset schema for U.S. Lobbying Disclosure Act (LDA) filings from the
+# Congress/Senate unified REST API (https://lda.gov/api/).
+#
+# Atomizer output uses Record.Source "lda". Filings are passive filing entities
+# keyed by lda_filing_uuid; registrant and client are organization entities linked
+# from the filing via lda_registrant / lda_client relationships.
+name: "lda"
+description: "Lobbying Disclosure Act LD-1/LD-2 filings (registrations and quarterly activity) from the LDA.gov REST API"
+
+extraction:
+  flavors: closed
+  properties: closed
+  relationships: closed
+  attributes: closed
+  events: closed
+
+flavors:
+  - name: "lda_filing"
+    description: "A lobbying disclosure (registration or quarterly activity) identified by filing_uuid"
+    display_name: "LDA filing"
+    mergeability: not_mergeable
+    strong_id_properties: ["lda_filing_uuid"]
+    passive: true
+
+  - name: "organization"
+    description: "A particular business, institution, or organization such as a corporation, university, government agency, or non-profit"
+    display_name: "Organization"
+    mergeability: not_mergeable
+    strong_id_properties: ["lda_registrant_id", "lda_client_internal_id"]
+    passive: true
+
+  - name: "location"
+    description: "A specific named geographic location such as a city, country, region, or landmark"
+    display_name: "Location"
+    mergeability: not_mergeable
+    examples: ["Washington, DC, US", "Arlington, VA, US"]
+    passive: true
+
+properties:
+  - name: "lda_filing_uuid"
+    type: string
+    description: "Stable UUID of the filing in the LDA REST API"
+    display_name: "LDA Filing UUID"
+    mergeability: not_mergeable
+    domain_flavors: ["lda_filing"]
+    passive: true
+
+  - name: "lda_filing_type"
+    type: string
+    description: "Machine filing type code (e.g. RR, Q1) from the API filing_type field; human label is attribute lda_filing_type_display when present"
+    display_name: "LDA Filing Type Code"
+    mergeability: not_mergeable
+    domain_flavors: ["lda_filing"]
+    passive: true
+
+  - name: "lda_filing_year"
+    type: float
+    description: "Reporting year associated with the filing"
+    display_name: "LDA Filing Year"
+    mergeability: not_mergeable
+    domain_flavors: ["lda_filing"]
+    passive: true
+
+  - name: "lda_filing_period"
+    type: string
+    description: "Reporting period code from filing_period (string or number in JSON); human label is attribute lda_filing_period_display when present"
+    display_name: "LDA Filing Period Code"
+    mergeability: not_mergeable
+    domain_flavors: ["lda_filing"]
+    passive: true
+
+  - name: "lda_income"
+    type: string
+    description: "Income amount reported on the filing when present (API decimal as string)"
+    display_name: "LDA Reported Income"
+    mergeability: not_mergeable
+    domain_flavors: ["lda_filing"]
+    passive: true
+
+  - name: "lda_expenses"
+    type: string
+    description: "Expenses amount reported on the filing when present (API decimal as string)"
+    display_name: "LDA Reported Expenses"
+    mergeability: not_mergeable
+    domain_flavors: ["lda_filing"]
+    passive: true
+
+  - name: "lda_dt_posted"
+    type: string
+    description: "Date and time the filing was posted (dt_posted), ISO-8601 string from the API"
+    display_name: "LDA Date Posted"
+    mergeability: not_mergeable
+    domain_flavors: ["lda_filing"]
+    passive: true
+
+  - name: "lda_filing_document_url"
+    type: string
+    description: "Public URL of the filing document (HTML/PDF) from filing_document_url"
+    display_name: "LDA Filing Document URL"
+    mergeability: not_mergeable
+    domain_flavors: ["lda_filing"]
+    passive: true
+
+  - name: "lda_posted_by_name"
+    type: string
+    description: "Name of the individual who posted the filing (posted_by_name)"
+    display_name: "LDA Posted By"
+    mergeability: not_mergeable
+    domain_flavors: ["lda_filing"]
+    passive: true
+
+  - name: "lda_lobbying_causes"
+    type: string
+    description: >-
+      One atom per lobbying_activities row: "CODE (Display)" from general_issue_code and
+      general_issue_code_display. Narrative text is attribute lda_lobbying_cause_description on the
+      same atom when present (same pattern as patent cpc_code + cpc_description).
+    display_name: "LDA Lobbying Causes"
+    mergeability: not_mergeable
+    domain_flavors: ["lda_filing"]
+    examples:
+      - "BUD (Budget/Appropriations)"
+      - "HCR (Health Issues)"
+    passive: true
+
+  - name: "lda_party_role"
+    type: string
+    description: "Whether this organization row is the lobbying registrant or the client on the linked LDA filing"
+    display_name: "LDA Party Role"
+    mergeability: not_mergeable
+    domain_flavors: ["organization"]
+    examples: ["registrant", "client"]
+    passive: true
+
+  - name: "lda_registrant_id"
+    type: string
+    description: "Stable LDA API registrant id (registrant.id) as a string, for entity resolution"
+    display_name: "LDA Registrant ID"
+    mergeability: not_mergeable
+    domain_flavors: ["organization"]
+    passive: true
+
+  - name: "lda_client_internal_id"
+    type: string
+    description: "Stable LDA API internal client row id (client.id) as a string, for entity resolution"
+    display_name: "LDA Client Internal ID"
+    mergeability: not_mergeable
+    domain_flavors: ["organization"]
+    passive: true
+
+  - name: "address"
+    type: string
+    description: "Physical street address of the entity"
+    display_name: "Address"
+    mergeability: not_mergeable
+    domain_flavors: ["organization"]
+    passive: true
+
+relationships:
+  - name: "lda_registrant"
+    description: "The lobbying registrant firm that filed this LDA disclosure"
+    display_name: "LDA Registrant"
+    mergeability: not_mergeable
+    domain_flavors: ["lda_filing"]
+    target_flavors: ["organization"]
+    passive: true
+
+  - name: "lda_client"
+    description: "The client organization on whose behalf lobbying is reported for this LDA filing"
+    display_name: "LDA Client"
+    mergeability: not_mergeable
+    domain_flavors: ["lda_filing"]
+    target_flavors: ["organization"]
+    passive: true
+
+  - name: "is_located_at"
+    description: "An entity is located at, operates in, resides in, is headquartered in, was born in, visits, or died in a location"
+    display_name: "Located At"
+    mergeability: not_mergeable
+    domain_flavors: ["organization"]
+    target_flavors: ["location"]
+    passive: true
+
+attributes:
+  - property: "lda_filing_period"
+    name: "lda_filing_period_display"
+    type: string
+    description: "Human-readable reporting period label from filing_period_display on the same atom as the period code"
+    display_name: "LDA Filing Period"
+    mergeability: not_mergeable
+
+  - property: "lda_filing_type"
+    name: "lda_filing_type_display"
+    type: string
+    description: "Human-readable filing type label from filing_type_display on the same atom as the type code"
+    display_name: "LDA Filing Type"
+    mergeability: not_mergeable
+
+  # Narrative for one lobbying issue row. Stored as a quad attribute on each lda_lobbying_causes atom.
+  - property: "lda_lobbying_causes"
+    name: "lda_lobbying_cause_description"
+    type: string
+    description: >-
+      lobbying_activities[].description for that row (optional trailing posted timestamp stripped).
+      Omitted as an attribute when the description is empty.
+    display_name: "LDA Lobbying Cause Description"
+    mergeability: not_mergeable
+
+events: []
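The `lda_lobbying_causes` property in the schema above describes one atom per `lobbying_activities` row, formatted as `CODE (Display)`, with the narrative attached as `lda_lobbying_cause_description` only when it is non-empty. A minimal sketch of that mapping, assuming the filing is already parsed into a Python dict; the function name is hypothetical, and the "trailing posted timestamp" stripping mentioned in the schema is left out because its format is not given in this diff.

```python
from typing import Iterator, Optional, Tuple


def lobbying_cause_atoms(filing: dict) -> Iterator[Tuple[str, Optional[str]]]:
    """Yield (cause value, description or None), one pair per activity row."""
    for row in filing.get("lobbying_activities") or []:
        value = f'{row["general_issue_code"]} ({row["general_issue_code_display"]})'
        # An empty or whitespace-only description is omitted (None).
        description = (row.get("description") or "").strip() or None
        yield value, description


sample = {
    "lobbying_activities": [
        {
            "general_issue_code": "BUD",
            "general_issue_code_display": "Budget/Appropriations",
            "description": "Example narrative text",
        },
        {
            "general_issue_code": "HCR",
            "general_issue_code_display": "Health Issues",
            "description": "",
        },
    ]
}
atoms = list(lobbying_cause_atoms(sample))
```

A filing with an empty or missing `lobbying_activities` array yields no atoms, which matches the "omitted if activities array is empty" note in the data dictionary.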