@yottagraph-app/data-model-skill 0.0.23 → 0.0.24

package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "@yottagraph-app/data-model-skill",
-   "version": "0.0.23",
+   "version": "0.0.24",
    "description": "Data model skill documentation for AI agents - entity types, properties, and schemas from Lovelace fetch sources",
    "repository": {
      "type": "git",
@@ -0,0 +1,101 @@
+ # Data Dictionary: LDA (Lobbying Disclosure Act)
+
+ Last updated: 2026-05-04
+
+ ## Source Overview
+
+ Lobbying Disclosure Act filings (registrations, quarterly activity, etc.) from the **unified LDA.gov REST API**. The streamer polls `GET /api/v1/filings/`, stores raw JSON pages, and emits v2 `FetchMessage` records.
+
+ | Item | Value |
+ |------|--------|
+ | Pipeline / stream | Configured in `streams.yaml` (see deployment) |
+ | `Record.Source` | `lda` |
+
+ Anonymous access is rate-limited; optional API token improves throughput.
+
+ ---
+
+ ## Entity Types
+
+ ### `lda_filing`
+
+ One disclosure identified by API `filing_uuid`.
+
+ - **Subject name:** `filing_uuid` (stable, unique per disclosure).
+ - **Strong id:** `lda_filing_uuid` on the filing subject.
+ - **Resolver:** `NOT_MERGEABLE` passive filing entity.
+ - **Timestamp:** Parsed from `dt_posted` only (no wall-clock fallback). If `dt_posted` is missing or invalid, the filing is not atomized.
+
+ ### `organization`
+
+ Either the **registrant** (lobbying firm) or **client** on a filing.
+
+ - **Subject name:** API `name` (legal / display name).
+ - **Strong ids:** `lda_registrant_id` (registrant rows) or `lda_client_internal_id` (client rows).
+ - **Role:** Property `lda_party_role` = `registrant` or `client` on the organization **record** (not on the filing).
+ - **Resolver:** Named-entity info is `MERGEABLE` (resolver / recordeval ER search) while retaining LDA **strong ids**; flavor-level resolver info remains `NOT_MERGEABLE` per passive schema.
+ - **Snippets:** Formatted **address** only when present (no LDA filing UUID prefix).
+
+ ### `location`
+
+ Geographic label derived from registrant or client address fields for `is_located_at` edges.
+
+ - **Name:** Typically `City, State` or `City, State, Country`; when city is absent, `State, Country` or state-only per atomizer rules.
+ - **Resolver:** `MERGEABLE` named entity (no strong id), for soft clustering with other sources.
+
+ ---
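The timestamp rule for `lda_filing` and the naming rule for `location` described above can be sketched in Python. This is a minimal illustration, not the atomizer's actual code: the function names `parse_dt_posted` and `location_name` are hypothetical, and the handling of a city without a state, or of a country on its own, is an assumption, since the dictionary only says "per atomizer rules".

```python
from datetime import datetime
from typing import Optional


def parse_dt_posted(raw: Optional[str]) -> Optional[datetime]:
    """Parse dt_posted; return None when it is missing or invalid.

    There is deliberately no wall-clock fallback: a None result means
    the caller should skip (not atomize) the filing.
    """
    if not raw or not isinstance(raw, str):
        return None
    text = raw.strip()
    # Older Python versions reject a trailing "Z" in fromisoformat().
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    try:
        return datetime.fromisoformat(text)
    except ValueError:
        return None


def location_name(
    city: Optional[str], state: Optional[str], country: Optional[str]
) -> Optional[str]:
    """Build a label such as "City, State, Country" from non-empty parts.

    Assumption: a country alone is not specific enough, so at least a
    city or a state must be present.
    """
    if not any(p and p.strip() for p in (city, state)):
        return None
    parts = [p.strip() for p in (city, state, country) if p and p.strip()]
    return ", ".join(parts)
```

Returning `None` instead of raising keeps the "skip this filing" and "emit no `is_located_at` edge" decisions with the caller.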
+
+ ## Properties
+
+ ### Filing
+
+ | Property | Description |
+ |----------|-------------|
+ | `lda_filing_uuid` | API `filing_uuid`. |
+ | `lda_filing_type` | Machine code (`filing_type`), e.g. `RR`, `Q1`. |
+ | `lda_filing_type_display` | Human label (`filing_type_display`). |
+ | `lda_filing_year` | Reporting year (float in schema). |
+ | `lda_filing_period_display` | Period label, e.g. quarter. |
+ | `lda_income` | Income string when present. |
+ | `lda_expenses` | Expenses string when present. |
+ | `lda_dt_posted` | Raw ISO `dt_posted` from API. |
+ | `lda_filing_document_url` | Public document URL. |
+ | `lda_posted_by_name` | Poster name when present. |
+ | `lda_lobbying_causes` | Repeated **once per** `lobbying_activities[]` row (`CODE (Display)`). Same pattern as patent **`cpc_code`**: narrative text is quad attribute **`lda_lobbying_cause_description`** on that atom (API `description` field). **Only on filing**; omitted if activities array is empty. |
+
+ ### Organization
+
+ | Property | Description |
+ |----------|-------------|
+ | `lda_party_role` | `registrant` or `client`. |
+ | `lda_registrant_id` | Registrant API id as string. |
+ | `lda_client_internal_id` | Client row id (`client.id`) as string. |
+ | `address` | Single-line formatted address (street/city/state/zip + country). |
+
+ ---
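The `lda_lobbying_causes` row in the Filing table packs a small transformation into one cell: one `CODE (Display)` value per `lobbying_activities[]` row, with the narrative carried as an attribute on that atom. A rough sketch follows. The function name `lobbying_cause_atoms` and the `(value, description)` tuple shape are hypothetical; the field names `general_issue_code` and `general_issue_code_display` are taken from the schema description of this property. Skipping rows without a code is an assumption, and the stripping of a trailing posted timestamp mentioned in the schema is not attempted here because its format is not documented.

```python
from typing import Any, Dict, List, Optional, Tuple


def lobbying_cause_atoms(
    filing: Dict[str, Any],
) -> List[Tuple[str, Optional[str]]]:
    """Return one (value, description) pair per lobbying_activities row."""
    atoms: List[Tuple[str, Optional[str]]] = []
    # A missing or empty array yields no atoms at all.
    for row in filing.get("lobbying_activities") or []:
        code = (row.get("general_issue_code") or "").strip()
        display = (row.get("general_issue_code_display") or "").strip()
        if not code:
            continue  # assumption: a row without a code is skipped
        value = f"{code} ({display})" if display else code
        # An empty description means the attribute is omitted (None here).
        description = (row.get("description") or "").strip() or None
        atoms.append((value, description))
    return atoms
```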
+
+ ## Entity Relationships
+
+ ```
+ lda_filing ──[lda_registrant]──→ organization (registrant)
+ lda_filing ──[lda_client]──────→ organization (client)
+
+ organization ──[is_located_at]──→ location
+ ```
+
+ - **`lda_registrant` / `lda_client`:** Target atoms on the **`lda_filing`** record point at the same organization identities emitted as separate **organization** records for that page (strong ids + properties).
+ - **`is_located_at`:** On each **organization** record when the atomizer can derive a location name from city/state/country rules.
+
+ ---
+
+ ## Records Per Filing
+
+ For a typical filing with registrant and client, atomization yields **up to three** records: one filing, one registrant organization, one client organization. Either org may be omitted if required API fields are missing.
+
+ ---
+
+ ## Citations
+
+ Primary citation text is the filing `url` when present; otherwise a synthetic label referencing `filing_uuid`.
+
+ ---
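The "Records Per Filing" and "Citations" sections above describe a simple fan-out, which the sketch below illustrates. It is not the atomizer's implementation. The names `records_for_filing` and `citation_text`, the plain-dict record shape, and the synthetic label format are all hypothetical. It also assumes that the "required API fields" for an organization are `id` and `name`, and it checks only that `dt_posted` is present, not that it parses.

```python
from typing import Any, Dict, List, Optional


def citation_text(filing: Dict[str, Any]) -> str:
    """Prefer the filing url; otherwise fall back to a synthetic label."""
    url = (filing.get("url") or "").strip()
    if url:
        return url
    return f"LDA filing {filing['filing_uuid']}"  # hypothetical label format


def _org_record(
    party: Optional[Dict[str, Any]], role: str, id_property: str
) -> Optional[Dict[str, Any]]:
    """Build an organization record, or None if id or name is missing."""
    if not party:
        return None
    name = (party.get("name") or "").strip()
    if party.get("id") is None or not name:
        return None
    return {
        "flavor": "organization",
        "name": name,
        "lda_party_role": role,
        id_property: str(party["id"]),  # strong ids are stored as strings
    }


def records_for_filing(filing: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return up to three records: filing, registrant, client."""
    uuid = filing.get("filing_uuid")
    if not uuid or not filing.get("dt_posted"):
        return []  # no usable timestamp means the filing is not atomized
    records = [{
        "flavor": "lda_filing",
        "name": uuid,
        "lda_filing_uuid": uuid,
        "citation": citation_text(filing),
    }]
    for key, id_property in (
        ("registrant", "lda_registrant_id"),
        ("client", "lda_client_internal_id"),
    ):
        org = _org_record(filing.get(key), key, id_property)
        if org is not None:
            records.append(org)
    return records
```

Both organization records share the `organization` flavor and differ only in `lda_party_role` and in which strong-id property they carry, which matches the two strong ids listed for that entity type.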
@@ -0,0 +1,209 @@
+ # Dataset schema for U.S. Lobbying Disclosure Act (LDA) filings from the
+ # Congress/Senate unified REST API (https://lda.gov/api/).
+ #
+ # Atomizer output uses Record.Source "lda". Filings are passive filing entities
+ # keyed by lda_filing_uuid; registrant and client are organization entities linked
+ # from the filing via lda_registrant / lda_client relationships.
+ name: "lda"
+ description: "Lobbying Disclosure Act LD-1/LD-2 filings (registrations and quarterly activity) from the LDA.gov REST API"
+
+ extraction:
+   flavors: closed
+   properties: closed
+   relationships: closed
+   attributes: closed
+   events: closed
+
+ flavors:
+   - name: "lda_filing"
+     description: "A lobbying disclosure (registration or quarterly activity) identified by filing_uuid"
+     display_name: "LDA filing"
+     mergeability: not_mergeable
+     strong_id_properties: ["lda_filing_uuid"]
+     passive: true
+
+   - name: "organization"
+     description: "A particular business, institution, or organization such as a corporation, university, government agency, or non-profit"
+     display_name: "Organization"
+     mergeability: not_mergeable
+     strong_id_properties: ["lda_registrant_id", "lda_client_internal_id"]
+     passive: true
+
+   - name: "location"
+     description: "A specific named geographic location such as a city, country, region, or landmark"
+     display_name: "Location"
+     mergeability: not_mergeable
+     examples: ["Washington, DC, US", "Arlington, VA, US"]
+     passive: true
+
+ properties:
+   - name: "lda_filing_uuid"
+     type: string
+     description: "Stable UUID of the filing in the LDA REST API"
+     display_name: "LDA Filing UUID"
+     mergeability: not_mergeable
+     domain_flavors: ["lda_filing"]
+     passive: true
+
+   - name: "lda_filing_type"
+     type: string
+     description: "Machine filing type code (e.g. RR, Q1) from the API filing_type field; human label is attribute lda_filing_type_display when present"
+     display_name: "LDA Filing Type Code"
+     mergeability: not_mergeable
+     domain_flavors: ["lda_filing"]
+     passive: true
+
+   - name: "lda_filing_year"
+     type: float
+     description: "Reporting year associated with the filing"
+     display_name: "LDA Filing Year"
+     mergeability: not_mergeable
+     domain_flavors: ["lda_filing"]
+     passive: true
+
+   - name: "lda_filing_period"
+     type: string
+     description: "Reporting period code from filing_period (string or number in JSON); human label is attribute lda_filing_period_display when present"
+     display_name: "LDA Filing Period Code"
+     mergeability: not_mergeable
+     domain_flavors: ["lda_filing"]
+     passive: true
+
+   - name: "lda_income"
+     type: string
+     description: "Income amount reported on the filing when present (API decimal as string)"
+     display_name: "LDA Reported Income"
+     mergeability: not_mergeable
+     domain_flavors: ["lda_filing"]
+     passive: true
+
+   - name: "lda_expenses"
+     type: string
+     description: "Expenses amount reported on the filing when present (API decimal as string)"
+     display_name: "LDA Reported Expenses"
+     mergeability: not_mergeable
+     domain_flavors: ["lda_filing"]
+     passive: true
+
+   - name: "lda_dt_posted"
+     type: string
+     description: "Date and time the filing was posted (dt_posted), ISO-8601 string from the API"
+     display_name: "LDA Date Posted"
+     mergeability: not_mergeable
+     domain_flavors: ["lda_filing"]
+     passive: true
+
+   - name: "lda_filing_document_url"
+     type: string
+     description: "Public URL of the filing document (HTML/PDF) from filing_document_url"
+     display_name: "LDA Filing Document URL"
+     mergeability: not_mergeable
+     domain_flavors: ["lda_filing"]
+     passive: true
+
+   - name: "lda_posted_by_name"
+     type: string
+     description: "Name of the individual who posted the filing (posted_by_name)"
+     display_name: "LDA Posted By"
+     mergeability: not_mergeable
+     domain_flavors: ["lda_filing"]
+     passive: true
+
+   - name: "lda_lobbying_causes"
+     type: string
+     description: >-
+       One atom per lobbying_activities row: "CODE (Display)" from general_issue_code and
+       general_issue_code_display. Narrative text is attribute lda_lobbying_cause_description on the
+       same atom when present (same pattern as patent cpc_code + cpc_description).
+     display_name: "LDA Lobbying Causes"
+     mergeability: not_mergeable
+     domain_flavors: ["lda_filing"]
+     examples:
+       - "BUD (Budget/Appropriations)"
+       - "HCR (Health Issues)"
+     passive: true
+
+   - name: "lda_party_role"
+     type: string
+     description: "Whether this organization row is the lobbying registrant or the client on the linked LDA filing"
+     display_name: "LDA Party Role"
+     mergeability: not_mergeable
+     domain_flavors: ["organization"]
+     examples: ["registrant", "client"]
+     passive: true
+
+   - name: "lda_registrant_id"
+     type: string
+     description: "Stable LDA API registrant id (registrant.id) as a string, for entity resolution"
+     display_name: "LDA Registrant ID"
+     mergeability: not_mergeable
+     domain_flavors: ["organization"]
+     passive: true
+
+   - name: "lda_client_internal_id"
+     type: string
+     description: "Stable LDA API internal client row id (client.id) as a string, for entity resolution"
+     display_name: "LDA Client Internal ID"
+     mergeability: not_mergeable
+     domain_flavors: ["organization"]
+     passive: true
+
+   - name: "address"
+     type: string
+     description: "Physical street address of the entity"
+     display_name: "Address"
+     mergeability: not_mergeable
+     domain_flavors: ["organization"]
+     passive: true
+
+ relationships:
+   - name: "lda_registrant"
+     description: "The lobbying registrant firm that filed this LDA disclosure"
+     display_name: "LDA Registrant"
+     mergeability: not_mergeable
+     domain_flavors: ["lda_filing"]
+     target_flavors: ["organization"]
+     passive: true
+
+   - name: "lda_client"
+     description: "The client organization on whose behalf lobbying is reported for this LDA filing"
+     display_name: "LDA Client"
+     mergeability: not_mergeable
+     domain_flavors: ["lda_filing"]
+     target_flavors: ["organization"]
+     passive: true
+
+   - name: "is_located_at"
+     description: "An entity is located at, operates in, resides in, is headquartered in, was born in, visits, or died in a location"
+     display_name: "Located At"
+     mergeability: not_mergeable
+     domain_flavors: ["organization"]
+     target_flavors: ["location"]
+     passive: true
+
+ attributes:
+   - property: "lda_filing_period"
+     name: "lda_filing_period_display"
+     type: string
+     description: "Human-readable reporting period label from filing_period_display on the same atom as the period code"
+     display_name: "LDA Filing Period"
+     mergeability: not_mergeable
+
+   - property: "lda_filing_type"
+     name: "lda_filing_type_display"
+     type: string
+     description: "Human-readable filing type label from filing_type_display on the same atom as the type code"
+     display_name: "LDA Filing Type"
+     mergeability: not_mergeable
+
+   # Narrative for one lobbying issue row. Stored as a quad attribute on each lda_lobbying_causes atom.
+   - property: "lda_lobbying_causes"
+     name: "lda_lobbying_cause_description"
+     type: string
+     description: >-
+       lobbying_activities[].description for that row (optional trailing posted timestamp stripped).
+       Omitted as an attribute when the description is empty.
+     display_name: "LDA Lobbying Cause Description"
+     mergeability: not_mergeable
+
+ events: []
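Because every `extraction` category in the schema above is `closed`, its internal cross-references are worth checking: strong-id properties, `domain_flavors`, `target_flavors`, and each attribute's `property` should all name something the schema declares. The sketch below shows one way to do that on an already-parsed schema (a plain `dict`, however it was loaded). The function `check_closed_schema` is hypothetical, and reading `closed` as "only declared names may appear" is an assumption; the package does not document a validator.

```python
from typing import Any, Dict, List


def check_closed_schema(schema: Dict[str, Any]) -> List[str]:
    """Return a list of dangling references; empty means consistent."""
    flavors = {f["name"] for f in schema.get("flavors", [])}
    properties = {p["name"] for p in schema.get("properties", [])}
    problems: List[str] = []

    # Strong-id properties must be declared under "properties".
    for flavor in schema.get("flavors", []):
        for prop in flavor.get("strong_id_properties", []):
            if prop not in properties:
                problems.append(
                    f"flavor {flavor['name']}: unknown strong id property {prop}"
                )

    # Domain and target flavors must be declared under "flavors".
    for kind in ("properties", "relationships"):
        for item in schema.get(kind, []):
            refs = item.get("domain_flavors", []) + item.get("target_flavors", [])
            for ref in refs:
                if ref not in flavors:
                    problems.append(f"{kind} {item['name']}: unknown flavor {ref}")

    # Each attribute hangs off a declared property.
    for attr in schema.get("attributes", []):
        if attr["property"] not in properties:
            problems.append(
                f"attribute {attr['name']}: unknown property {attr['property']}"
            )
    return problems


# A trimmed excerpt of the schema above, written as a Python literal.
SCHEMA_EXCERPT = {
    "flavors": [
        {"name": "lda_filing", "strong_id_properties": ["lda_filing_uuid"]},
        {"name": "organization", "strong_id_properties": ["lda_registrant_id"]},
    ],
    "properties": [
        {"name": "lda_filing_uuid", "domain_flavors": ["lda_filing"]},
        {"name": "lda_filing_type", "domain_flavors": ["lda_filing"]},
        {"name": "lda_registrant_id", "domain_flavors": ["organization"]},
    ],
    "relationships": [
        {
            "name": "lda_registrant",
            "domain_flavors": ["lda_filing"],
            "target_flavors": ["organization"],
        },
    ],
    "attributes": [
        {"property": "lda_filing_type", "name": "lda_filing_type_display"},
    ],
    "events": [],
}
```

Calling `check_closed_schema(SCHEMA_EXCERPT)` returns an empty list, and pointing an attribute at an undeclared property produces one message naming the attribute.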