@yottagraph-app/data-model-skill 0.0.31 → 0.0.33

@@ -0,0 +1,1075 @@
1
+ # Data Dictionary: USAspending.gov
2
+
3
+ Last updated: 2026-05-20 (revised after Deep Research dictionary review rounds 1 and 2)
4
+
5
+ > **Operational runbooks** (you'll be sent here from log/alert messages):
6
+ >
7
+ > - `PaginationTruncated` metric / "paginated query truncated upstream"
8
+ > error log → runbook is the doc comment on
9
+ > `recordPaginationTruncation` in
10
+ > `moongoose/fetch/usaspending_streamer.go`. It decodes the
11
+ > `(mode, cell, reason)` attributes and lists per-mode mitigations.
12
+ > - "Reconnaissance: bulk-subaward path correction" (2026-05-23, Late PM)
13
+ > entry in `GUIDANCE_LOG.md` covers why `backfill_subawards` uses
14
+ > per-month slicing today and the A9.5d project to migrate it to the
15
+ > asynchronous bulk-download endpoint.
16
+
17
+ ## Purpose / Source Overview
18
+
19
+ USAspending.gov is the official open-data portal for federal government
20
+ spending, published by the U.S. Treasury's Bureau of the Fiscal Service.
21
+ It exposes every federal contract, grant, loan, direct payment, and
22
+ subaward awarded since FY2008, with daily updates pulled from FPDS
23
+ (procurement) and FABS (financial assistance) source systems.
24
+
25
+ v0 scope ingests three award categories plus their transactions and
26
+ subawards:
27
+
28
+ - **Contracts** (procurement) — types `A` (BPA Call), `B` (Purchase
29
+ Order), `C` (Delivery Order), `D` (Definitive Contract).
30
+ - **IDVs** (indefinite-delivery vehicles, parent contracts under which
31
+ child contracts are placed) — types `IDV_A` (GWAC), `IDV_B` (IDC),
32
+ `IDV_B_A` / `IDV_B_B` / `IDV_B_C` (IDC sub-types), `IDV_C` (FSS),
33
+ `IDV_D` (BOA), `IDV_E` (BPA).
34
+ - **Grants** (assistance) — types `02` (Block Grant), `03` (Formula
35
+ Grant), `04` (Project Grant), `05` (Cooperative Agreement), `F001`
36
+ (Grant, modern FABS code), `F002` (Cooperative Agreement, modern FABS
37
+ code).
38
+ - **Transactions** — every modification (FPDS modification or FABS
39
+ action) of an in-scope award, emitted as a sub-record linked to its
40
+ parent.
41
+ - **Subawards** — sub-recipient relationships under in-scope prime
42
+ contracts and grants (FFATA / FSRS reporting). v0 currently ingests
42
+ subawards via `/api/v2/search/spending_by_award/` with `subawards=true`,
43
+ whose list response omits the sub-recipient UEI and DUNS; the planned
44
+ A9.5d migration to asynchronous bulk-download CSVs restores them.
46
+
47
+ Recipients are identified by **UEI** (Unique Entity Identifier; the
48
+ 12-character alphanumeric SAM.gov identifier that replaced DUNS in
49
+ April 2022). Federal agencies are identified by **CGAC toptier code**
50
+ (3-digit) and **subtier code** (4-character alphanumeric). Subordinate
51
+ classifications use **NAICS** (industry, contracts only), **PSC**
52
+ (product/service code, contracts only), and **assistance listings**
53
+ (formerly CFDA, grants only) — each a distinct entity flavor.
54
+
55
+ ### Ingestion pipeline (v0)
56
+
57
+ A hybrid pipeline is required because of API rate limits, pagination
58
+ ceilings, and the subaward UEI gap:
59
+
60
+ | Phase | Source | Cadence | Purpose |
61
+ |-------|--------|---------|---------|
62
+ | Initial backfill — awards (FY2008 → present) | Bulk Award CSV archives at `https://files.usaspending.gov/award_data_archive/` — per (agency × FY × type) ZIPs listed by `/api/v2/bulk_download/list_monthly_files/` | One-time on cold start, then never repeated for that snapshot | Full historical seed for prime contracts, IDVs, and grants. Carries the same fields as the API `/api/v2/awards/{id}/` detail. |
63
+ | Initial backfill — subawards (FY2008 → present) | **Today:** `/api/v2/search/spending_by_award/` with `subawards=true`, partitioned (FY × group × calendar-month) to stay under the API's 100k-row hard cap. **Planned (A9.5d):** asynchronous `/api/v2/bulk_download/awards/` jobs with `sub_award_types=[procurement,grant]`, which return per-cell ZIPs (`All_Contracts_Subawards_*.csv` / `All_Assistance_Subawards_*.csv`) with no row cap and the full `subawardee_uei`/`subawardee_duns` fidelity. See `GUIDANCE_LOG.md` round-1 finding and the A9.5d entry. | One-time | Full historical subaward seed. **NOTE:** earlier revisions of this document claimed static monthly `Contracts_Subawards.csv` / `Assistance_Subawards.csv` files exist at `files.usaspending.gov` — they do not; reconnaissance against the live API and the USAspending UI's "Award Data Archive" download list confirms only an asynchronous job-creation endpoint provides bulk subaward CSVs. |
64
+ | Daily delta (subaward) | Same search-API path as backfill, restricted to a `lookbackDays` window. Daily volumes (~hundreds–low-thousands of subawards) stay well under the 100k cap without month slicing. Planned A9.5d migration will switch this to a daily bulk_download job too. | Daily | Subaward UEI fidelity gap exists today on this path (the search API list response omits `subawardee_uei` / `subawardee_duns`) and will be closed by A9.5d. |
65
+ | Daily delta (awards + transactions) | `/api/v2/search/spending_by_award/` filtered on `last_modified_date >= now - 30d`, then `/api/v2/awards/{generated_unique_award_id}/` for full detail and `/api/v2/transactions/?award_id=…` for modification history. | Daily | Incremental updates and corrections (~10k-30k records per day). |
66
+ | Reference data | `/api/v2/references/toptier_agencies/`, `/api/v2/references/naics/`, `/api/v2/references/assistance_listing/`, `/api/v2/references/filter_tree/psc/` | Weekly | Agency, NAICS, PSC, and assistance-listing reference catalogs. |
67
+
68
+ The API enforces two limits that make API-only backfill infeasible: a
69
+ ~200 req/min rate ceiling, and a hard 10,000-record pagination wall on
70
+ search endpoints. The bulk-CSV-first strategy avoids both.
71
+
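The per-month slicing used for the subaward backfill in the table above can be sketched roughly as follows. This is a minimal Python illustration, not the streamer's actual code: the function names `fiscal_year_months` and `backfill_cells` are hypothetical, and the group labels are borrowed from the `sub_award_types` values quoted in the table.

```python
from datetime import date, timedelta


def fiscal_year_months(fy):
    """Yield (start, end) dates for the 12 calendar months of a federal FY.

    Federal fiscal year N runs from 1 October of year N-1 through
    30 September of year N.
    """
    year, month = fy - 1, 10
    for _ in range(12):
        start = date(year, month, 1)
        # First day of the following month, minus one day.
        next_year, next_month = (year + 1, 1) if month == 12 else (year, month + 1)
        end = date(next_year, next_month, 1) - timedelta(days=1)
        yield start, end
        year, month = next_year, next_month


def backfill_cells(fy, groups=("procurement", "grant")):
    """Enumerate hypothetical (fy, group, start, end) backfill cells."""
    return [(fy, g, s, e) for g in groups for s, e in fiscal_year_months(fy)]
```

Each cell would then map to one paginated search query bounded by its start and end dates, which keeps any single query's result set small.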
72
+ Anonymous API access is permitted with no key required, subject to the
73
+ rate limits above. Bulk CSV downloads are anonymous as well.
74
+
75
+ **Source names used on records:**
76
+
77
+ | Pipeline | `Record.Source` |
78
+ |----------|----------------|
79
+ | Reference data (agency lists, NAICS/PSC/assistance-listing catalogs) | `usaspending` |
80
+ | Contract awards + their transactions | `usaspending_contract` |
81
+ | IDV awards + their transactions | `usaspending_idv` |
82
+ | Grant awards + their transactions | `usaspending_grant` |
83
+ | Subawards under prime contracts or grants | `usaspending_subaward` |
84
+
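To illustrate how the v0 award-type codes line up with the `Record.Source` values in this table, here is a small Python sketch. The function name `source_for_award_type` and the lookup sets are hypothetical; only the type codes and source names are taken from this dictionary.

```python
# Hypothetical lookup sets built from the v0 type codes in this dictionary.
CONTRACT_TYPES = {"A", "B", "C", "D"}
IDV_TYPES = {"IDV_A", "IDV_B", "IDV_B_A", "IDV_B_B", "IDV_B_C",
             "IDV_C", "IDV_D", "IDV_E"}
GRANT_TYPES = {"02", "03", "04", "05", "F001", "F002"}


def source_for_award_type(type_code):
    """Return the Record.Source for an in-scope award type, else None."""
    if type_code in CONTRACT_TYPES:
        return "usaspending_contract"
    if type_code in IDV_TYPES:
        return "usaspending_idv"
    if type_code in GRANT_TYPES:
        return "usaspending_grant"
    # Loans, direct payments, and other assistance are deferred (not in v0).
    return None
```

Returning `None` for deferred categories lets a caller skip out-of-scope awards without raising.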
85
+ ### Future scope (deferred, not in v0)
86
+
87
+ - Award categories: Loans (`07`, `08`, `F003`, `F004`), Direct Payments
88
+ (`06`, `10`, `F006`, `F007`), Other Financial Assistance (`09`, `11`,
89
+ `F005`, `F008`, `F009`, `F010`).
90
+ - Federal Account / Treasury Account Symbol (TAS) data — File C funding
91
+ linkage, including `total_account_obligation`,
92
+ `total_account_outlay`, and per-TAS breakdowns from
93
+ `/api/v2/awards/funding`.
94
+ - Disaster / DEFC-tagged spending (COVID DEFC `M`/`N`/`O`/`P`/`Q`/`R`,
95
+ IIJA, IRA) via `account_obligations_by_defc` /
96
+ `account_outlays_by_defc`.
97
+ - SAM.gov vendor registration + exclusions (separate Tier-1 source,
98
+ same UEI strong ID).
99
+ - Federally Negotiated Indirect Cost Rates published per OMB
100
+ Memorandum M-21-03 (when USAspending begins surfacing them in API
101
+ responses).
102
+ - PDF contract attachments via FPDS.
103
+ - IDV → child contract hierarchy via `/api/v2/idvs/awards/`.
104
+ - Cross-source resolution to EDGAR via UEI → CIK lookup (via SAM.gov
105
+ registration files) and to GLEIF via UEI → LEI lookup.
106
+
107
+ ---
108
+
109
+ ## Entity Types
110
+
111
+ ### `organization`
112
+
113
+ Either a **recipient** (the company, university, state agency, or
114
+ non-profit that receives the award), a **federal agency** (the awarding
115
+ or funding entity at toptier or subtier level), or a **sub-recipient**
116
+ (the entity receiving a subaward under a prime).
117
+
118
+ - **Primary key (recipient or sub-recipient):** `uei` (12-character
119
+ alphanumeric, the current authoritative federal vendor identifier).
120
+ - **Primary key (agency):**
121
+ `usaspending_toptier_agency_code` (CGAC 3-digit) for toptier
122
+ agencies; `usaspending_subtier_agency_code` (4-char alphanumeric)
123
+ for subtier agencies.
124
+ - **Entity resolver:** named entity, not mergeable at the flavor level
125
+ (passive). Strong IDs (`uei`, `duns`,
126
+ `usaspending_toptier_agency_code`, `usaspending_subtier_agency_code`)
127
+ drive deterministic merging within this dataset and cross-source
128
+ resolution to other datasets that publish the same identifiers
128
+ (planned, not in v0; e.g., EDGAR's CIK + UEI bridge, GLEIF's LEI,
129
+ SAM.gov's UEI + exclusions list).
131
+ - **Sources:** `usaspending`, `usaspending_contract`,
132
+ `usaspending_idv`, `usaspending_grant`, `usaspending_subaward`.
133
+
134
+ ### `usaspending::contract`
135
+
136
+ A federal procurement contract award (type code `A`/`B`/`C`/`D`),
137
+ representing the canonical rolled-up state of an award across all its
138
+ modifications.
139
+
140
+ - **Primary key:** `generated_unique_award_id` (e.g.,
141
+ `CONT_AWD_HT940216C0001_9700_-NONE-_-NONE-`).
142
+ - **Entity resolver:** passive, not mergeable. Strong IDs are
143
+ `generated_unique_award_id` (primary, structured string) and
144
+ `usaspending_internal_id` (secondary, integer database key for
145
+ alias-tolerant routing). Disambiguation context: PIID, awarding
146
+ agency, recipient name.
147
+ - **Sources:** `usaspending_contract`.
148
+
149
+ ### `usaspending::idv`
150
+
151
+ An Indefinite-Delivery Vehicle — a parent procurement contract
152
+ (GWAC/IDC/FSS/BOA/BPA) under which individual delivery orders or task
153
+ orders are placed. Same shape as contracts but with `category=idv` and
154
+ a distinct flavor for queryability.
155
+
156
+ - **Primary key:** `generated_unique_award_id` (e.g.,
157
+ `CONT_IDV_NNJ16GX08B_8000`).
158
+ - **Entity resolver:** passive, not mergeable. Strong IDs:
159
+ `generated_unique_award_id`, `usaspending_internal_id`.
160
+ - **Sources:** `usaspending_idv`.
161
+
162
+ ### `usaspending::grant`
163
+
164
+ A federal financial assistance award classified as a grant: block
165
+ grant, formula grant, project grant, or cooperative agreement
166
+ (award-type codes `02`, `03`, `04`, `05`, `F001`, `F002`).
167
+
168
+ - **Primary key:** `generated_unique_award_id` (e.g.,
169
+ `ASST_NON_2505CA5MAP_075`).
170
+ - **Entity resolver:** passive, not mergeable. Strong IDs:
171
+ `generated_unique_award_id`, `usaspending_internal_id`.
172
+ Disambiguation context: FAIN, awarding agency, recipient name,
173
+ assistance listing number.
174
+ - **Sources:** `usaspending_grant`.
175
+
176
+ ### `usaspending::transaction`
177
+
178
+ A single FPDS or FABS modification applied to an in-scope award. Each
179
+ transaction has its own dollar amount, action date, modification
180
+ number, and description. The canonical award entity aggregates all
181
+ transactions into rolled-up totals; transaction sub-records preserve
182
+ modification history.
183
+
184
+ - **Primary key:** `transaction_unique_id` (e.g.,
185
+ `CONT_TX_9700_-NONE-_HT940216C0001_P00713_-NONE-_0`).
186
+ - **Entity resolver:** passive, not mergeable. Strong ID =
187
+ `transaction_unique_id`.
188
+ - **Sources:** `usaspending_contract`, `usaspending_idv`,
189
+ `usaspending_grant` (transaction sub-records share the source of
190
+ their parent award).
191
+
192
+ ### `usaspending::subaward`
193
+
194
+ A subaward (subcontract under a prime contract, or subgrant under a
195
+ prime grant), reported by the prime recipient under FFATA/FSRS
196
+ requirements. Captures the first-tier flow of federal dollars from the
197
+ prime recipient to its sub-recipient.
198
+
199
+ - **Primary key:** `usaspending_subaward_id` (canonical primary key
200
+ from the FSRS database; globally unique). Subject name:
201
+ `subaward_{usaspending_subaward_id}` (e.g., `subaward_797093`).
202
+ - **Entity resolver:** passive, not mergeable. Strong ID =
203
+ `usaspending_subaward_id`.
204
+ - **Sources:** `usaspending_subaward`. **Today** the input is
205
+ `/api/v2/search/spending_by_award/` with `subawards=true`; this
206
+ endpoint omits `subawardee_uei` / `subawardee_duns` from its list
207
+ response, so sub-recipient strong-IDs are partial. **Planned
208
+ (A9.5d):** migrate to bulk subaward CSVs obtained via the
209
+ asynchronous `/api/v2/bulk_download/awards/` job endpoint
210
+ (`All_Contracts_Subawards_*.csv` / `All_Assistance_Subawards_*.csv`),
211
+ which DO carry `subawardee_uei` / `subawardee_duns`. The
212
+ `/api/v2/subawards/` endpoint is informational only and not used for
213
+ ingestion.
214
+
215
+ ### `industry`
216
+
217
+ A NAICS-coded industrial classification. NAICS describes the
218
+ industrial capacity of the vendor or recipient.
219
+
220
+ - **Primary key:** `naics_code` (6-digit numeric string).
221
+ - **Entity resolver:** passive, not mergeable. Strong ID =
222
+ `naics_code`.
223
+ - **Sources:** `usaspending`, `usaspending_contract`,
224
+ `usaspending_idv`.
225
+
226
+ ### `product_service`
227
+
228
+ A PSC-coded product or service classification. PSC describes the
229
+ specific product or service the federal government purchased; it is
230
+ federal-procurement-specific and orthogonal to NAICS.
231
+
232
+ - **Primary key:** `psc_code` (4-character alphanumeric).
233
+ - **Entity resolver:** passive, not mergeable. Strong ID = `psc_code`.
234
+ - **Sources:** `usaspending`, `usaspending_contract`,
235
+ `usaspending_idv`.
236
+
237
+ ### `federal_program`
238
+
239
+ A statutory federal assistance program identified by an Assistance
240
+ Listing number (formerly CFDA). Authorized by Congress and used to
241
+ classify grants and other financial assistance.
242
+
243
+ - **Primary key:** `assistance_listing_number` (e.g., `93.778`).
244
+ - **Entity resolver:** passive, not mergeable. Strong ID =
245
+ `assistance_listing_number`.
246
+ - **Sources:** `usaspending`, `usaspending_grant`.
247
+
248
+ ### `location`
249
+
250
+ A geographic place — typically a US city + state, or a foreign country.
251
+ Created for both the recipient's headquarters address and the place of
252
+ performance of an award.
253
+
254
+ - **Primary key:** named entity (no strong ID). Subject name is built
255
+ by concatenating populated tokens of `[city, state, country]` in
256
+ order, skipping any token that is null or empty. Examples:
257
+ `"LOUISVILLE, KY, USA"`, `"CA, USA"` (city missing),
258
+ `"USA"` (city and state missing). If all three are null, no location
259
+ entity is emitted.
260
+ - **Entity resolver:** mergeable named entity (no strong ID), enabling
261
+ soft clustering with locations from other datasets (LDA, FDIC, EDGAR).
262
+ - **Sources:** `usaspending_contract`, `usaspending_idv`,
263
+ `usaspending_grant`, `usaspending_subaward`.
264
+
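The subject-name rule for `location` described above can be sketched in Python as below. The function name `location_subject_name` is hypothetical, the `", "` separator is inferred from the examples, and treating whitespace-only tokens as empty is an assumption.

```python
def location_subject_name(city, state, country):
    """Join populated tokens in [city, state, country] order.

    Returns None when every token is null or empty, meaning no location
    entity should be emitted.
    """
    tokens = [t.strip() for t in (city, state, country) if t and t.strip()]
    return ", ".join(tokens) if tokens else None
```

For instance, `location_subject_name(None, "CA", "USA")` yields `"CA, USA"`, matching the city-missing example.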
265
+ ### `person`
266
+
267
+ A senior executive of a recipient organization. USAspending publishes
268
+ the top-5 compensated officers of recipients that meet FFATA
269
+ executive-compensation reporting thresholds (recipients that, in the
269
+ prior fiscal year, drew 80% or more of annual gross revenue and $25M
270
+ or more in annual gross revenue from federal awards).
272
+
273
+ - **Primary key:** named entity (no strong ID); subject name = officer
274
+ full name. Officers with null `name` are skipped at atomization
275
+ time (the API frequently emits placeholder null entries to pad the
276
+ officer array to length 5).
277
+ - **Entity resolver:** mergeable named entity, no strong ID
278
+ (USAspending does not provide a person ID).
279
+ - **Sources:** `usaspending_contract`, `usaspending_idv`,
280
+ `usaspending_grant`.
281
+
282
+ ---
283
+
284
+ ## Properties
285
+
286
+ ### Organization Properties
287
+
288
+ Data sources: USAspending Award Detail endpoint
289
+ (`/api/v2/awards/{award_id}/`) `recipient` and `awarding_agency` /
290
+ `funding_agency` sub-objects; Bulk Subaward CSV columns
291
+ (`subawardee_*`); References Agency endpoints
292
+ (`/api/v2/references/toptier_agencies/`,
293
+ `/api/v2/agency/{toptier_code}/`).
294
+
295
+ #### Recipient and Sub-Recipient Identity
296
+
297
+ * `uei`
298
+ * 12-character Unique Entity Identifier assigned by SAM.gov, the
299
+ federal vendor's canonical identifier since April 2022. Strong ID
300
+ for cross-source resolution. Same property name and namespace for
301
+ both prime recipients and sub-recipients.
302
+ * Examples: `"ZE6ZM6NKSV43"` (Humana Government Business),
303
+ `"FYHNA5WC8XD7"` (Lockheed Martin Corp), `"JE73CDQUAPA7"` (CA Dept
304
+ of Health Care Services).
305
+ * Derivation: `recipient.recipient_uei` from award detail for prime
306
+ recipients; `subawardee_uei` column from the bulk Subaward CSV for
307
+ sub-recipients.
308
+
309
+ * `duns`
310
+ * 9-digit Dun & Bradstreet DUNS number, the legacy federal vendor
311
+ identifier sunset in April 2022. Retained as a secondary strong ID
312
+ for historical merging of pre-2022 records that were never
313
+ cross-walked to a UEI.
314
+ * Examples: `"123456789"`.
315
+ * Derivation: `recipient.recipient_unique_id` from award detail or
316
+ `subawardee_duns` column from bulk Subaward CSV. Null for awards
317
+ or subawards signed after April 2022.
318
+
319
+ * `parent_recipient_uei`
320
+ * UEI of the recipient's parent corporate entity (ultimate parent in
321
+ the SAM.gov hierarchy). Used to construct the `is_subsidiary_of`
322
+ relationship. **Caveat:** this field reflects the parent-child
323
+ relationship as recorded in SAM.gov at the time the award action
324
+ was reported; SAM.gov hierarchy data lags real-world M&A activity
325
+ by months to years, so `parent_recipient_uei` should be treated as
326
+ a historical / point-in-time signal rather than a current truth.
327
+ * Examples: `"ZE6ZM6NKSV43"` (parent of itself for top-level orgs),
328
+ `"H8KJK8BFQXY6"` (corporate parent).
329
+ * Derivation: `recipient.parent_recipient_uei` from award detail.
330
+
331
+ * `business_categories`
332
+ * Repeated property; one atom per category the recipient is
333
+ classified as for federal procurement / assistance purposes.
334
+ * Examples: `"Corporate Entity Not Tax Exempt"`,
335
+ `"U.S.-Owned Business"`, `"Small Business"`,
336
+ `"Minority Owned Business"`, `"Government"`,
337
+ `"U.S. Regional/State Government"`.
338
+ * Derivation: each element of `recipient.business_categories` array
339
+ from award detail; for sub-recipients, the `subawardee_business_types`
340
+ column from the bulk Subaward CSV (semicolon-delimited list).
341
+
342
+ * `physical_address`
343
+ * Recipient headquarters street address formatted as a single
344
+ string.
345
+ * Examples: `"500 W MAIN STREET, LOUISVILLE, KY 40202"`.
346
+ * Derivation: concatenation of `recipient.location.address_line1`,
347
+ `city_name`, `state_code`, `zip5` from award detail. For
348
+ sub-recipients, concatenation of corresponding `subawardee_*`
349
+ columns from the bulk Subaward CSV. Omitted when the underlying
350
+ address line is null.
351
+
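As a rough illustration of the `business_categories` derivation for sub-recipients, the semicolon-delimited `subawardee_business_types` column could be split into one value per atom like this. The function name `business_category_atoms` is hypothetical, and trimming whitespace and dropping empty segments are assumptions about the CSV's formatting.

```python
def business_category_atoms(subawardee_business_types):
    """Split the semicolon-delimited CSV column into one value per atom."""
    if not subawardee_business_types:
        return []
    parts = (p.strip() for p in subawardee_business_types.split(";"))
    # Drop empty segments left by trailing or doubled delimiters.
    return [p for p in parts if p]
```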
352
+ #### Federal Agency Identity (source: `usaspending`)
353
+
354
+ * `usaspending_toptier_agency_code`
355
+ * CGAC (Common Government-wide Accounting Classification) 3-digit numeric
356
+ identifier of a toptier federal agency (cabinet-level department
357
+ or independent agency). Strong ID.
358
+ * Examples: `"097"` (DOD), `"075"` (HHS), `"070"` (DHS), `"080"`
359
+ (NASA).
360
+ * Derivation: `code` from any of these JSON paths:
361
+ `awarding_agency.toptier_agency` or `funding_agency.toptier_agency`
362
+ in award detail (the pipeline must traverse both to instantiate
363
+ both awarding-side and funding-side toptier organizations), or
364
+ each entry of `/api/v2/references/toptier_agencies/` for the
365
+ reference catalog refresh. The `toptier_agency` object does not
366
+ appear at the JSON root of award detail; it is always nested
367
+ under `awarding_agency` or `funding_agency`.
368
+
369
+ * `usaspending_subtier_agency_code`
370
+ * 4-character alphanumeric identifier of a subtier agency
371
+ (sub-component of a toptier agency). Strong ID.
372
+ * Examples: `"97DH"` (Defense Health Agency under DOD), `"7530"`
373
+ (CMS under HHS).
374
+ * Derivation: `subtier_agency.code` from award detail.
375
+
376
+ * `usaspending_agency_abbreviation`
377
+ * Common abbreviation for the agency.
378
+ * Examples: `"DOD"`, `"DHA"`, `"HHS"`, `"CMS"`, `"NASA"`.
379
+ * Derivation: `toptier_agency.abbreviation` or
380
+ `subtier_agency.abbreviation` from award detail.
381
+
382
+ * `usaspending_agency_slug`
383
+ * URL-safe slug for the agency, suitable for deep-linking back to
384
+ USAspending.gov agency pages.
385
+ * Examples: `"department-of-defense"`,
386
+ `"department-of-health-and-human-services"`.
387
+ * Derivation: `toptier_agency.slug` from award detail.
388
+
389
+ * `agency_role`
390
+ * Distinguishes federal-agency `organization` records from recipient
391
+ `organization` records.
392
+ * Values: `"federal_agency_toptier"`, `"federal_agency_subtier"`.
393
+ * Derivation: synthesized at atomization time from whether the
394
+ record represents a toptier or subtier agency.
395
+
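The requirement to traverse both `awarding_agency` and `funding_agency` when deriving `usaspending_toptier_agency_code` might look like the following Python sketch. The function name `toptier_agency_codes` is hypothetical, and the award detail is assumed to be already parsed into a dict with the JSON paths named above.

```python
def toptier_agency_codes(award_detail):
    """Collect toptier codes from both awarding and funding agencies."""
    codes = []
    for side in ("awarding_agency", "funding_agency"):
        # Either side may be missing or null in the response.
        agency = award_detail.get(side) or {}
        toptier = agency.get("toptier_agency") or {}
        code = toptier.get("code")
        if code and code not in codes:
            codes.append(code)
    return codes
```

De-duplicating here avoids emitting the same toptier organization twice when the awarding and funding agencies coincide.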
396
+ #### Executive Compensation (sources: `usaspending_contract`, `usaspending_idv`, `usaspending_grant`)
397
+
398
+ * `recipient_top_officer_compensation`
399
+ * Reported annual compensation in USD for a top-5 compensated
400
+ executive of the recipient (FFATA executive-compensation reporting).
401
+ Atom is dual-homed: the value lives on the `organization` record,
402
+ with a `person` sub-record carrying the officer name and the same
403
+ amount.
404
+ * Examples: `1409718.0`, `607266.0`.
405
+ * Derivation: `executive_details.officers[].amount` from award
406
+ detail. Officer entries where both `name` and `amount` are null
407
+ are skipped (the API pads the officers array to length 5 with
408
+ null entries for recipients that report fewer than 5 officers).
409
+ Null for recipients that do not meet FFATA thresholds at all.
410
+
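The null-padding rule for `executive_details.officers[]` can be sketched in Python as follows. The function name `reported_officers` is hypothetical; the field names follow the derivation above, and the award detail is assumed to be a parsed dict.

```python
def reported_officers(award_detail):
    """Return (name, amount) pairs, dropping the API's null padding."""
    details = award_detail.get("executive_details") or {}
    officers = details.get("officers") or []
    return [
        (o.get("name"), o.get("amount"))
        for o in officers
        # Skip only entries where both fields are null (array padding).
        if not (o.get("name") is None and o.get("amount") is None)
    ]
```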
411
+ ### Contract / IDV Award Properties
412
+
413
+ Data source: USAspending Award Detail endpoint
414
+ (`/api/v2/awards/{award_id}/`) for contracts (`category=contract`)
415
+ and IDVs (`category=idv`), plus the corresponding columns in the bulk
416
+ Award CSV archives during initial backfill.
417
+
418
+ #### Identifiers
419
+
420
+ * `generated_unique_award_id`
421
+ * Canonical USAspending award identifier (primary strong ID); a
422
+ structured string combining category, PIID/FAIN, and agency.
423
+ * Examples: `"CONT_AWD_HT940216C0001_9700_-NONE-_-NONE-"`,
424
+ `"CONT_IDV_NNJ16GX08B_8000"`, `"ASST_NON_2505CA5MAP_075"`.
425
+ * Derivation: `generated_unique_award_id` from award detail.
426
+ * Note: USAspending occasionally rewrites these structured IDs
427
+ (e.g., the July 2025 change that swapped subtier for toptier codes
428
+ in some grant IDs). The resolver uses both this property and
429
+ `usaspending_internal_id` as strong IDs to absorb such aliasing
430
+ without splitting entities.
431
+
432
+ * `usaspending_internal_id`
433
+ * Stringified integer database key from USAspending's internal
434
+ schema. Stable across `generated_unique_award_id` rewrites, and
435
+ accepted as `award_id` by detail endpoints that use either form.
436
+ Treated as a secondary strong ID for alias-tolerant routing.
437
+ * Examples: `"307885715"`, `"236536428"`.
438
+ * Derivation: `id` from award detail or `internal_id` from search
439
+ results.
440
+
441
+ * `piid`
442
+ * Procurement Instrument Identifier — the official FPDS contract
443
+ number, unique within an agency.
444
+ * Examples: `"HT940216C0001"`, `"NNJ16GX08B"`, `"DENA0003525"`.
445
+ * Derivation: `piid` from award detail.
446
+
447
+ * `parent_award_piid`
448
+ * PIID of the IDV under which this contract was awarded
449
+ (delivery/task orders only). Carried as a searchable property
450
+ only; **not** used as the topological anchor for the `child_of`
451
+ edge to the parent IDV, because the IDV's strong ID is its
452
+ `generated_unique_award_id`, not its PIID. Pointing the edge at
453
+ the PIID would result in a dangling reference.
454
+ * Examples: `"HHM402-15-D-0021"`.
455
+ * Derivation: `parent_award.piid` from award detail. Null for
456
+ standalone contracts.
457
+
458
+ * `parent_award_unique_id`
459
+ * `generated_unique_award_id` of the parent IDV under which this
460
+ contract was awarded (delivery/task orders only). This is the
461
+ field used to construct the `child_of` edge pointing from a
462
+ contract to its parent IDV.
463
+ * Examples: `"CONT_IDV_HHM40215D0021_9700"`.
464
+ * Derivation: `parent_award.generated_unique_award_id` from award
465
+ detail (verify the exact field name at streamer-implementation
466
+ time using a real delivery-order sample; `parent_award.award_id`
467
+ mapped to `usaspending_internal_id` is the documented fallback if
468
+ the string ID is unavailable). Null for standalone contracts.
469
+
470
+ * `award_type_code`
471
+ * Single-character (or short) award-type code.
472
+ * Values for contracts: `"A"`, `"B"`, `"C"`, `"D"`. Values for IDVs:
473
+ `"IDV_A"` through `"IDV_E"` (plus `"IDV_B_A"`, `"IDV_B_B"`,
474
+ `"IDV_B_C"`).
475
+ * Derivation: `type` from award detail.
476
+
477
+ * `award_type_description`
478
+ * Human-readable award-type label.
479
+ * Examples: `"DEFINITIVE CONTRACT"`, `"PURCHASE ORDER"`,
480
+ `"GWAC Government Wide Acquisition Contract"`.
481
+ * Derivation: `type_description` from award detail.
482
+
483
+ * `award_description`
484
+ * Free-text description of the contract scope of work.
485
+ * Examples: `"MANAGEMENT AND OPERATION OF THE OAK RIDGE NATIONAL
486
+ LABORATORY"`, `"IGF::OT::IGF"`.
487
+ * Derivation: `description` from award detail.
488
+
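The choice of `child_of` edge target described under `parent_award_unique_id` could be sketched as below. This is an illustration only: the function name `child_of_target` is hypothetical, and the `parent_award` field names are exactly the ones the entry above says still need verification against a real delivery-order sample.

```python
def child_of_target(award_detail):
    """Pick the strong ID used as the target of the child_of edge."""
    parent = award_detail.get("parent_award") or {}
    unique_id = parent.get("generated_unique_award_id")
    if unique_id:
        return ("generated_unique_award_id", unique_id)
    internal_id = parent.get("award_id")
    if internal_id is not None:
        # Fallback: the integer key, stringified like usaspending_internal_id.
        return ("usaspending_internal_id", str(internal_id))
    return None  # Standalone contract: no parent IDV.
```

Note that the PIID is deliberately never returned, since it is not a strong ID of the parent IDV.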
489
+ #### Financials
490
+
491
+ * `total_obligation`
492
+ * Aggregate federal obligation in USD across all transactions to
493
+ date. May decrease over time when transactions deobligate funds.
494
+ * Examples: `51269205263.03`, `42111665692.01`, `-2500000.0`
495
+ (rare but valid: net deobligation across modifications).
496
+ * Derivation: `total_obligation` from award detail.
497
+
498
+ * `base_and_all_options`
499
+ * Total potential contract value including all unexercised options,
500
+ in USD. Represents the ceiling of the award. Often `0.0` for IDV
501
+ types such as BPAs and BOAs, which are unfunded parent vehicles
502
+ whose value lives on their child delivery orders.
503
+ * Examples: `56620536577.19`, `0.0`.
504
+ * Derivation: `base_and_all_options` from award detail.
505
+
506
+ * `base_exercised_options`
507
+ * Total contract value of the base period plus options exercised to
508
+ date, in USD.
509
+ * Examples: `52641654181.19`.
510
+ * Derivation: `base_exercised_options` from award detail.
511
+
512
+ * `total_outlay`
513
+ * Cumulative outlays (cash payments) made against the award, in USD.
514
+ * Examples: `100066943940.16`.
515
+ * Derivation: `total_outlay` from award detail. Null when not yet
516
+ reported by the awarding agency.
517
+
518
+ #### Performance Period
519
+
520
+ * `award_start_date`
521
+ * Period-of-performance start date in YYYY-MM-DD format.
522
+ * Examples: `"2016-08-01"`, `"1999-10-15"`.
523
+ * Derivation: `period_of_performance.start_date` from award detail.
524
+
525
+ * `award_end_date`
526
+ * Period-of-performance current end date in YYYY-MM-DD format
527
+ (reflects exercised options).
528
+ * Examples: `"2025-12-31"`, `"2030-03-31"`.
529
+ * Derivation: `period_of_performance.end_date` from award detail.
530
+
531
+ * `award_potential_end_date`
532
+ * Period-of-performance potential end date including all options.
533
+ * Examples: `"2030-03-31"`.
534
+ * Derivation: `period_of_performance.potential_end_date` from award
535
+ detail.
536
+
537
+ * `award_last_modified_date`
538
+ * Date the award record was most recently updated in USAspending,
539
+ used as the streamer's incremental cursor.
540
+ * Examples: `"2026-02-10"`.
541
+ * Derivation: `period_of_performance.last_modified_date` from award
542
+ detail.
543
+
544
+ * `award_date_signed`
545
+ * Date the contract was originally signed.
546
+ * Examples: `"2016-07-29"`.
547
+ * Derivation: `date_signed` from award detail.
548
+
549
+ #### Classification
550
+
551
+ * `naics_code`
552
+ * 6-digit NAICS industry code assigned to the contract. Strong ID
553
+ for `industry` flavor; also stored as a property on the award.
554
+ * Examples: `"524114"`, `"561210"`, `"541512"`.
555
+ * Derivation: `latest_transaction_contract_data.naics` or
556
+ `naics_hierarchy.base_code.code` from award detail.
557
+
558
+ * `psc_code`
559
+ * 4-character Product/Service Code assigned to the contract. Strong
560
+ ID for `product_service` flavor; also stored as a property on the
561
+ award.
562
+ * Examples: `"Q201"`, `"M181"`, `"1555"`.
563
+ * Derivation:
564
+ `latest_transaction_contract_data.product_or_service_code` or
565
+ `psc_hierarchy.base_code.code` from award detail.
566
+
567
+ #### Procurement Procedure (sources: `usaspending_contract`, `usaspending_idv`)
568
+
569
+ These properties characterize the procedural and regulatory environment
570
+ under which a contract was awarded. They live on the contract / IDV
571
+ entity and are derived from `latest_transaction_contract_data` in the
572
+ award detail response.
573
+
574
+ * `solicitation_identifier`
575
+ * Identifier of the original solicitation (Request for Proposal /
576
+ Request for Quote) that led to this award. Enables lifecycle
577
+ tracking from RFP to execution.
578
+ * Examples: `"HT940215R0002"`.
579
+ * Derivation:
580
+ `latest_transaction_contract_data.solicitation_identifier`.
581
+
582
+ * `offers_received_count`
583
+ * Number of offers received in response to the solicitation. Key
584
+ competition metric.
585
+ * Examples: `4`, `1`.
586
+ * Derivation:
587
+ `latest_transaction_contract_data.number_of_offers_received`
588
+ (parsed from string to integer).
589
+
590
+ * `extent_competed_description`
591
+ * Description of the level of competition.
592
+ * Examples: `"FULL AND OPEN COMPETITION"`, `"NOT COMPETED"`,
593
+ `"FOLLOW ON TO COMPETED ACTION"`.
594
+ * Derivation:
595
+ `latest_transaction_contract_data.extent_competed_description`.
596
+
597
+ * `type_set_aside_description`
598
+ * Small-business set-aside designation, if any.
599
+ * Examples: `"NO SET ASIDE USED."`, `"SMALL BUSINESS SET ASIDE -
600
+ TOTAL"`, `"8A COMPETED"`.
601
+ * Derivation:
602
+ `latest_transaction_contract_data.type_set_aside_description`.
603
+
604
+ * `type_of_contract_pricing_description`
605
+ * Pricing structure of the contract.
606
+ * Examples: `"COST PLUS FIXED FEE"`, `"FIRM FIXED PRICE"`,
607
+ `"TIME AND MATERIALS"`.
608
+ * Derivation:
609
+ `latest_transaction_contract_data.type_of_contract_pricing_description`.
610
+
611
+ * `commercial_item_acquisition_type`
612
+ * Whether the government used FAR Part 12 commercial-item
613
+ procedures (vs. standard procurement). Key metric for defense
614
+ acquisition research.
615
+ * Examples: `"COMMERCIAL PRODUCTS/SERVICES PROCEDURES NOT USED"`,
616
+ `"COMMERCIAL PRODUCTS/SERVICES PROCEDURES USED"`.
617
+ * Derivation:
618
+ `latest_transaction_contract_data.commercial_item_acquisition_description`.
619
+
620
+ * `labor_standards_apply`
621
+ * Whether Service Contract Act or Davis-Bacon Act labor standards
622
+ apply to the contract.
623
+ * Values: `"YES"`, `"NO"`, occasionally null.
624
+ * Derivation:
625
+ `latest_transaction_contract_data.labor_standards_description`.
626
+
627
+ * `entity_ownership_type`
628
+ * Whether the prime contractor is US-owned or foreign-owned.
629
+ * Examples: `"U.S. OWNED BUSINESS"`, `"FOREIGN-OWNED BUSINESS NOT
630
+ INCORPORATED IN THE U.S."`.
631
+ * Derivation:
632
+ `latest_transaction_contract_data.domestic_or_foreign_entity_description`.
633
+
634
+ * `subcontracting_plan_type`
635
+ * Whether the prime contractor maintains an individual or commercial
636
+ subcontracting plan, or none.
637
+ * Examples: `"INDIVIDUAL SUBCONTRACT PLAN"`, `"COMMERCIAL
638
+ SUBCONTRACT PLAN"`, `"PLAN NOT REQUIRED"`.
639
+ * Derivation:
640
+ `latest_transaction_contract_data.subcontracting_plan_description`.
641
+
642
+ * `is_multi_year_contract`
643
+ * Whether the contract is a statutorily defined multi-year
644
+ procurement (vs. annual with options).
645
+ * Values: `"YES"`, `"NO"`.
646
+ * Derivation:
647
+ `latest_transaction_contract_data.multi_year_contract_description`.
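The `offers_received_count` derivation above says the upstream value is parsed from a string to an integer. A minimal sketch of that parse follows. The function name and the choice to return `None` for missing, non-numeric, or negative input are assumptions for illustration, not the streamer's actual behavior.

```python
from typing import Optional


def parse_offers_received(raw: Optional[str]) -> Optional[int]:
    """Convert the upstream string count to an int.

    Returns None when the value is missing, non-numeric, or negative
    (an assumed policy; the real streamer may treat these differently).
    """
    if raw is None:
        return None
    try:
        count = int(raw.strip())
    except ValueError:
        return None
    return count if count >= 0 else None
```

Returning `None` rather than `0` keeps "no data" distinguishable from "zero offers" in downstream competition metrics.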
648
+
649
+ #### Subaward Rollups
650
+
651
+ * `subaward_count`
652
+ * Total number of subawards reported under this prime contract.
653
+ * Examples: `145`, `0`.
654
+ * Derivation: `subaward_count` from award detail.
655
+
656
+ * `total_subaward_amount`
657
+ * Aggregate dollar value of subawards reported under this prime
658
+ contract, in USD.
659
+ * Examples: `1079551766.05`.
660
+ * Derivation: `total_subaward_amount` from award detail.
661
+
662
+ ### Grant Award Properties
663
+
664
+ Data source: USAspending Award Detail endpoint
665
+ (`/api/v2/awards/{award_id}/`) for grants (`category=grant`), plus the
666
+ corresponding columns in the bulk Award CSV archives during initial
667
+ backfill.
668
+
669
+ Grants share these common award properties with contracts/IDVs:
670
+ `generated_unique_award_id`, `usaspending_internal_id`,
671
+ `award_type_code`, `award_type_description`, `award_description`,
672
+ `total_obligation`, `total_outlay`, `award_start_date`,
673
+ `award_end_date`, `award_last_modified_date`, `award_date_signed`,
674
+ `subaward_count`, `total_subaward_amount`.
675
+
676
+ Grant-specific properties:
677
+
678
+ * `fain`
679
+ * Federal Award Identification Number — the canonical grant ID.
680
+ * Examples: `"2505CA5MAP"`, `"R01HL123456"`.
681
+ * Derivation: `fain` from award detail.
682
+
683
+ * `assistance_listing_number`
684
+ * Assistance Listing number (formerly CFDA number), identifying the
685
+ federal program the grant funds. Strong ID for `federal_program`
686
+ flavor; also stored as a property on the grant.
687
+ * Examples: `"93.778"` (Medicaid), `"84.027"` (Special Education),
688
+ `"81.087"` (Renewable Energy R&D).
689
+ * Derivation: `cfda_info[0].cfda_number` from award detail.
690
+
691
+ * `total_funding`
692
+ * Total funding amount including federal and non-federal share, in
693
+ USD. Equals `total_obligation + non_federal_funding`.
694
+ * Examples: `100096643196.0`.
695
+ * Derivation: `total_funding` from award detail.
696
+
697
+ * `non_federal_funding`
698
+ * Required non-federal cost-share contribution to the grant, in USD.
699
+ * Examples: `0.0`, `25000000.0`.
700
+ * Derivation: `non_federal_funding` from award detail.
701
+
702
+ * `funding_opportunity_number`
703
+ * Reference to the funding opportunity (Notice of Funding
704
+ Opportunity / NOFO) under which the grant was awarded. Often
705
+ `"NOT APPLICABLE"` for entitlement programs.
706
+ * Examples: `"HHS-2024-CMS-NHSN-001"`, `"NOT APPLICABLE"`.
707
+ * Derivation: `funding_opportunity.number` from award detail.
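The grant-specific derivations above can be sketched as a small mapping over an award-detail payload, along with a check of the stated identity `total_funding = total_obligation + non_federal_funding`. The function names, the one-cent tolerance, and the sample payload are hypothetical; only the field paths come from the derivations listed above.

```python
from typing import Any, Optional


def extract_grant_properties(detail: dict[str, Any]) -> dict[str, Optional[Any]]:
    """Map an award-detail payload onto the grant-specific properties."""
    cfda = detail.get("cfda_info") or [{}]
    opportunity = detail.get("funding_opportunity") or {}
    return {
        "fain": detail.get("fain"),
        "assistance_listing_number": cfda[0].get("cfda_number"),
        "total_funding": detail.get("total_funding"),
        "non_federal_funding": detail.get("non_federal_funding"),
        "funding_opportunity_number": opportunity.get("number"),
    }


def funding_is_consistent(detail: dict[str, Any], tolerance: float = 0.01) -> bool:
    """Check total_funding == total_obligation + non_federal_funding.

    The tolerance (one cent) is an assumption to absorb float rounding.
    """
    expected = (detail.get("total_obligation") or 0.0) + (
        detail.get("non_federal_funding") or 0.0
    )
    return abs((detail.get("total_funding") or 0.0) - expected) <= tolerance


# Hypothetical payload for illustration only.
sample_grant = {
    "fain": "2505CA5MAP",
    "cfda_info": [{"cfda_number": "93.778"}],
    "total_obligation": 1000.0,
    "non_federal_funding": 250.0,
    "total_funding": 1250.0,
    "funding_opportunity": {"number": "NOT APPLICABLE"},
}
```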
708
+
709
+ ### Transaction Properties
710
+
711
+ Data source: USAspending Transactions endpoint
712
+ (`/api/v2/transactions/?award_id={award_id}`), enumerated per parent
713
+ award (the endpoint accepts both the string `generated_unique_award_id`
714
+ and the integer `internal_id` as `award_id`).
715
+
716
+ * `transaction_unique_id`
717
+ * Canonical USAspending transaction identifier (strong ID).
718
+ * Examples: `"CONT_TX_9700_-NONE-_HT940216C0001_P00713_-NONE-_0"`.
719
+ * Derivation: `id` from transactions list response.
720
+
721
+ * `transaction_action_date`
722
+ * Date the modification action was effective, in YYYY-MM-DD.
723
+ * Examples: `"2026-02-10"`, `"2025-12-30"`.
724
+ * Derivation: `action_date` from transactions list.
725
+
726
+ * `transaction_action_type`
727
+ * Single-character code identifying the action type.
728
+ * Values: `"A"` (additional work / new contract), `"B"`
729
+ (supplemental), `"C"` (funding only action), `"D"` (change
730
+ order), etc.
731
+ * Derivation: `action_type` from transactions list.
732
+
733
+ * `transaction_action_type_description`
734
+ * Human-readable action-type label.
735
+ * Examples: `"CHANGE ORDER"`, `"FUNDING ONLY ACTION"`, `"DEFINITIVE
736
+ CONTRACT"`.
737
+ * Derivation: `action_type_description` from transactions list.
738
+
739
+ * `transaction_modification_number`
740
+ * Modification number assigned to this transaction (FPDS uses
741
+ sequential strings like `P00001`, `P00002`).
742
+ * Examples: `"P00713"`, `"M01"`, `"0"` (base award).
743
+ * Derivation: `modification_number` from transactions list.
744
+
745
+ * `transaction_description`
746
+ * Free-text description of the modification.
747
+ * Examples: `"MANAGED CARE SUPPORT SERVICES - EAST REGION"`,
748
+ `"OPTION YEAR 3"`.
749
+ * Derivation: `description` from transactions list.
750
+
751
+ * `transaction_federal_action_obligation`
752
+ * Dollar change to the federal obligation effected by this
753
+ transaction, in USD. Positive (additional funding), negative
754
+ (deobligation due to descoping or closeout), or zero
755
+ (administrative modification with no funding impact).
756
+ * Examples: `80000000.0`, `-2500000.0`, `0.0`.
757
+ * Derivation: `federal_action_obligation` from transactions list.
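As an illustration of the transaction derivations above, the sketch below maps one row of a transactions list response onto the property names in this section and labels the sign of `federal_action_obligation`. The helper names, the `obligation_kind` field, and the choice to treat a missing obligation as `0.0` are assumptions, not part of the upstream response.

```python
from typing import Any


def classify_obligation(amount: float) -> str:
    """Label the obligation change by sign, following the description above."""
    if amount > 0:
        return "additional_funding"
    if amount < 0:
        return "deobligation"
    return "administrative"


def to_transaction_atom(row: dict[str, Any]) -> dict[str, Any]:
    """Rename transactions-list fields to the dictionary's property names."""
    # Assumption: a null obligation is treated as 0.0 for classification.
    amount = float(row.get("federal_action_obligation") or 0.0)
    return {
        "transaction_unique_id": row["id"],
        "transaction_action_date": row.get("action_date"),
        "transaction_action_type": row.get("action_type"),
        "transaction_action_type_description": row.get("action_type_description"),
        "transaction_modification_number": row.get("modification_number"),
        "transaction_description": row.get("description"),
        "transaction_federal_action_obligation": amount,
        "obligation_kind": classify_obligation(amount),  # hypothetical helper field
    }
```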
758
+
759
+ ### Subaward Properties
760
+
761
+ > **Implementation status (2026-05-23):** the property derivations
762
+ > below are the planned A9.5d shape sourced from bulk subaward CSVs
763
+ > (`All_Contracts_Subawards_*.csv` / `All_Assistance_Subawards_*.csv`)
764
+ > obtained via the asynchronous `/api/v2/bulk_download/awards/` job
765
+ > endpoint. **Today's implementation** sources these properties from
766
+ > `/api/v2/search/spending_by_award/?subawards=true` — which omits
767
+ > `subawardee_uei` and `subawardee_duns` from its list response, so
768
+ > sub-recipient strong-IDs are partial until A9.5d ships. See
769
+ > `GUIDANCE_LOG.md` "Reconnaissance: bulk-subaward path correction"
770
+ > (2026-05-23) for the migration plan.
771
+ >
772
+ > An earlier revision of this document claimed static monthly
773
+ > `Contracts_Subawards.csv` files exist at `files.usaspending.gov`.
774
+ > They do not — the USAspending "Award Data Archive" download page
775
+ > offers Contracts and Financial Assistance archive types only.
776
+ > Subaward CSVs are generated on demand via the async job endpoint.
777
+
778
+ Data source (planned, A9.5d): bulk subaward CSVs produced by
779
+ `POST /api/v2/bulk_download/awards/` with
780
+ `sub_award_types=[procurement, grant]`. CSV column names below are
781
+ verified against a real 2026-05-23 download. The
782
+ `/api/v2/subawards/?award_id=...` REST endpoint provides a small
783
+ subset of these fields but omits `subawardee_uei` / `subawardee_duns`,
784
+ so it is not used for ingestion.
785
+
786
+ * `usaspending_subaward_id`
787
+ * Internal numeric ID of the subaward record from the SAM Subaward
788
+ Reporting System (the successor to FSRS); globally unique.
789
+ Strong ID.
790
+ * Examples: `"797093"`, `"775895"`.
791
+ * Derivation: stringified `subaward_sam_report_id` column from the
792
+ bulk Subaward CSV. (The USAspending API exposes this same field
793
+ as `internal_id`; the column name `id` belongs to the REST API
794
+ response, not the bulk CSV. Verify the exact column header at
795
+ streamer-implementation time against a real CSV download.)
796
+
797
+ * `prime_award_unique_key`
798
+ * `generated_unique_award_id` of the prime award under which this
799
+ subaward was reported. The field used to construct the
800
+ `[under_prime]` edge from a subaward to its prime contract or
801
+ prime grant.
802
+ * Examples: `"CONT_AWD_HT940216C0001_9700_-NONE-_-NONE-"`,
803
+ `"ASST_NON_2505CA5MAP_075"`.
804
+ * Derivation: `prime_award_unique_key` column from the bulk
805
+ Subaward CSV. (Documented in
806
+ `usaspending_api/download/v2/download_column_historical_lookups.py`
807
+ in the upstream usaspending-api repo as the FFATA prime-link
808
+ column.)
809
+
810
+ * `subaward_number`
811
+ * Subaward number assigned by the prime recipient; not globally
812
+ unique across primes.
813
+ * Examples: `"WPS-16-C-0001"`, `"5028514"`.
814
+ * Derivation: `subaward_number` column from bulk Subaward CSV.
815
+
816
+ * `subaward_action_date`
817
+ * Date of the subaward action in YYYY-MM-DD.
818
+ * Examples: `"2016-08-02"`, `"2022-12-29"`.
819
+ * Derivation: `subaward_action_date` column from bulk Subaward CSV.
820
+
821
+ * `subaward_amount`
822
+ * Dollar value of the subaward, in USD.
823
+ * Examples: `486548157.0`, `120800889.2`.
824
+ * Derivation: `subaward_amount` column from bulk Subaward CSV.
825
+
826
+ * `subaward_description`
827
+ * Free-text description of the subaward scope.
828
+ * Examples: `"FISCAL INTERMEDIARY SERVICES FOR TRICARE 2017 EAST
829
+ REGION."`.
830
+ * Derivation: `subaward_description` column from bulk Subaward CSV.
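A minimal sketch of the planned CSV mapping, assuming the bulk Subaward CSV headers match the column names given in the derivations above. The function name and the sample row are hypothetical, and the header names should still be confirmed against a real download, as the `usaspending_subaward_id` entry notes.

```python
import csv
import io
from typing import Iterator


def read_subaward_rows(csv_text: str) -> Iterator[dict[str, object]]:
    """Yield one subaward property dict per CSV row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        yield {
            # Stringified so the strong ID is stable regardless of CSV typing.
            "usaspending_subaward_id": str(row["subaward_sam_report_id"]),
            "prime_award_unique_key": row["prime_award_unique_key"],
            "subaward_number": row["subaward_number"],
            "subaward_action_date": row["subaward_action_date"],
            "subaward_amount": float(row["subaward_amount"]),
            "subaward_description": row["subaward_description"],
        }


# Hypothetical two-line CSV for illustration only.
SAMPLE_CSV = (
    "subaward_sam_report_id,prime_award_unique_key,subaward_number,"
    "subaward_action_date,subaward_amount,subaward_description\n"
    "797093,CONT_AWD_HT940216C0001_9700_-NONE-_-NONE-,WPS-16-C-0001,"
    "2016-08-02,486548157.0,FISCAL INTERMEDIARY SERVICES\n"
)
```

`prime_award_unique_key` is carried through unchanged because it is the join key for the `[under_prime]` edge.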
831
+
832
+ ### Industry / Product Service / Federal Program Properties
833
+
834
+ Data source: derived during atomization from contract / IDV / grant
835
+ award detail; classification reference data from
836
+ `/api/v2/references/naics/`, `/api/v2/references/filter_tree/psc/`,
837
+ and `/api/v2/references/assistance_listing/`.
838
+
839
+ #### NAICS
840
+
841
+ * `naics_code`
842
+ * 6-digit NAICS industry code. Strong ID for `industry` flavor.
843
+ * Examples: `"524114"`, `"561210"`, `"541512"`.
844
+ * Derivation: same as award `naics_code` property; the value becomes
845
+ the strong ID of an `industry` entity.
846
+
847
+ * `naics_description`
848
+ * Human-readable NAICS title.
849
+ * Examples: `"DIRECT HEALTH AND MEDICAL INSURANCE CARRIERS"`,
850
+ `"FACILITIES SUPPORT SERVICES"`.
851
+ * Derivation: `latest_transaction_contract_data.naics_description`
852
+ or `naics_hierarchy.base_code.description` from award detail.
853
+
854
+ #### PSC
855
+
856
+ * `psc_code`
857
+ * 4-character PSC code. Strong ID for `product_service` flavor.
858
+ * Examples: `"Q201"`, `"M181"`, `"1555"`.
859
+ * Derivation: same as award `psc_code` property.
860
+
861
+ * `psc_description`
862
+ * Human-readable PSC label.
863
+ * Examples: `"MEDICAL- MANAGED HEALTHCARE"`, `"OPER OF GOVT R&D
864
+ GOCO FACILITIES"`.
865
+ * Derivation: `psc_hierarchy.base_code.description` from award
866
+ detail.
867
+
868
+ #### Assistance Listing (Federal Program)
869
+
870
+ * `assistance_listing_number`
871
+ * Assistance Listing (CFDA) number. Strong ID for `federal_program`
872
+ flavor.
873
+ * Examples: `"93.778"`, `"84.027"`.
874
+ * Derivation: same as grant `assistance_listing_number` property.
875
+
876
+ * `assistance_listing_title`
877
+ * Human-readable assistance program title.
878
+ * Examples: `"Grants to States for Medicaid"`.
879
+ * Derivation: `cfda_info[0].cfda_title` from grant award detail.
880
+
881
+ * `assistance_listing_applicant_eligibility`
882
+ * Description of which entities are legally eligible to apply for
883
+ this assistance program (e.g., state/local governments,
884
+ non-profits, individuals).
885
+ * Examples: `"State and local welfare agencies must operate under
886
+ an HHS-approved Medicaid State Plan..."`.
887
+ * Derivation: `cfda_info[0].applicant_eligibility` from grant award
888
+ detail.
889
+
890
+ * `assistance_listing_beneficiary_eligibility`
891
+ * Description of the ultimate end-users of the program funds.
892
+ * Examples: `"Low-income persons who are over age 65, blind or
893
+ disabled, members of families with dependent children..."`.
894
+ * Derivation: `cfda_info[0].beneficiary_eligibility` from grant
895
+ award detail.
896
+
897
+ * `assistance_listing_objectives`
898
+ * Description of the programmatic goals of the assistance program.
899
+ * Examples: `"To provide financial assistance to States for
900
+ payments of medical assistance on behalf of cash assistance
901
+ recipients..."`.
902
+ * Derivation: `cfda_info[0].cfda_objectives` from grant award
903
+ detail.
904
+
905
+ ### Location Properties
906
+
907
+ Data source: `recipient.location` and `place_of_performance`
908
+ sub-objects of award detail; corresponding `*_location_*` columns of
909
+ the bulk Subaward CSV.
910
+
911
+ * `location_country_code`
912
+ * ISO 3166-1 alpha-3 country code.
913
+ * Examples: `"USA"`, `"GBR"`, `"DEU"`.
914
+ * Derivation: `location_country_code` from recipient or
915
+ place_of_performance.
916
+
917
+ * `location_state_code`
918
+ * Two-character US state code; null for non-US locations.
919
+ * Examples: `"KY"`, `"CA"`, `"VA"`.
920
+ * Derivation: `state_code` from recipient or place_of_performance.
921
+
922
+ * `location_congressional_district`
923
+ * US congressional district. The canonical identifier is the state
923
+ code plus the district; this field stores only the two-character district part.
925
+ * Examples: `"03"`, `"01"`, `"AL"` (at-large).
926
+ * Derivation: `congressional_code` from recipient or
927
+ place_of_performance.
928
+
929
+ ### Person Properties
930
+
931
+ Data source: `executive_details.officers` from award detail.
932
+
933
+ * `recipient_top_officer_compensation`
934
+ * (See Organization Properties above — atom is dual-homed on the
935
+ `person` sub-record and the `organization` record.)
936
+
937
+ ---
938
+
939
+ ## Entity Relationships Summary
940
+
941
+ ```
942
+ usaspending::contract ──[awarded_to]──────────→ organization (recipient by UEI)
943
+ usaspending::contract ──[awarded_by]──────────→ organization (awarding subtier agency)
944
+ usaspending::contract ──[funded_by]───────────→ organization (funding subtier agency)
945
+ usaspending::contract ──[performed_at]────────→ location (place of performance)
946
+ usaspending::contract ──[in_industry]─────────→ industry (NAICS)
947
+ usaspending::contract ──[procured_product]────→ product_service (PSC)
948
+ usaspending::contract ──[child_of]────────────→ usaspending::idv (delivery/task orders)
949
+
950
+ usaspending::idv ──[awarded_to]──────────→ organization
951
+ usaspending::idv ──[awarded_by]──────────→ organization
952
+ usaspending::idv ──[funded_by]───────────→ organization
953
+ usaspending::idv ──[performed_at]────────→ location
954
+ usaspending::idv ──[in_industry]─────────→ industry
955
+ usaspending::idv ──[procured_product]────→ product_service
956
+
957
+ usaspending::grant ──[awarded_to]──────────→ organization (recipient by UEI)
958
+ usaspending::grant ──[awarded_by]──────────→ organization (awarding subtier agency)
959
+ usaspending::grant ──[funded_by]───────────→ organization (funding subtier agency)
960
+ usaspending::grant ──[performed_at]────────→ location
961
+ usaspending::grant ──[funded_program]──────→ federal_program (assistance listing)
962
+
963
+ usaspending::transaction ──[is_modification_of]→ usaspending::contract | ::idv | ::grant
964
+
965
+ usaspending::subaward ──[awarded_to]──────────→ organization (sub-recipient by UEI from bulk CSV)
966
+ usaspending::subaward ──[under_prime]─────────→ usaspending::contract | ::grant
967
+ usaspending::subaward ──[subcontracted_from]──→ organization (prime recipient, redundant edge for graph traversal)
968
+
969
+ organization (subtier agency) ──[child_of]────→ organization (toptier agency)
970
+ organization (recipient) ──[is_subsidiary_of]→ organization (parent recipient by UEI)
971
+ organization ──[is_located_at]→ location
972
+ person ──[employed_by]──→ organization (FFATA top-5 officers)
973
+ ```
974
+
975
+ **Note on agency layering:** `awarded_by` and `funded_by` point at the
976
+ **subtier** agency directly. The corresponding **toptier** agency is
977
+ reachable via one hop along the subtier's `child_of` edge. This avoids
978
+ duplicating awarded-by edges at both layers while preserving full
979
+ traceability.
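The one-hop lookup described in the note can be sketched over a plain edge list. The `(source, edge_type, target)` tuple shape, the function name, and the node IDs in the usage example are hypothetical; the sketch also assumes each award has at most one `awarded_by` edge and each subtier at most one `child_of` edge.

```python
from typing import Optional

Edge = tuple[str, str, str]  # (source, edge_type, target); assumed shape


def toptier_for_award(award_id: str, edges: list[Edge]) -> Optional[str]:
    """Follow awarded_by to the subtier agency, then child_of to the toptier."""
    outgoing = {(src, kind): dst for src, kind, dst in edges}
    subtier = outgoing.get((award_id, "awarded_by"))
    if subtier is None:
        return None
    return outgoing.get((subtier, "child_of"))


# Hypothetical node IDs for illustration only.
example_edges: list[Edge] = [
    ("contract:1", "awarded_by", "subtier:A"),
    ("subtier:A", "child_of", "toptier:X"),
]
```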
980
+
981
+ ---
982
+
983
+ ## Citations
984
+
985
+ Primary citation for each award atom is a public URL on
986
+ USAspending.gov:
987
+
988
+ - Award canonical URL:
989
+ `https://www.usaspending.gov/award/{generated_unique_award_id}/`
990
+ (the integer `usaspending_internal_id` is an equally valid path:
991
+ `/award/{usaspending_internal_id}/`, used as a fallback when the
992
+ string ID has been rewritten).
993
+ - Recipient profile URL:
994
+ `https://www.usaspending.gov/recipient/{recipient_hash}/all`
995
+ (the `-C` / `-P` / `-R` suffix on `recipient_hash` denotes Child /
996
+ Parent / general Recipient levels respectively).
997
+ - Agency profile URL:
998
+ `https://www.usaspending.gov/agency/{agency_slug}`.
999
+
1000
+ Transaction atoms cite the same URL as their parent award. Subaward
1001
+ atoms cite the parent award URL plus the subaward internal id.
1002
+
1003
+ ---
1004
+
1005
+ ## Cadence, Backfill, and Volume Notes
1006
+
1007
+ ### Cadence
1008
+
1009
+ | Phase | Frequency | Mechanism |
1010
+ |-------|-----------|-----------|
1011
+ | Initial backfill — prime awards (FY2008 → onboarding date) | Once on cold start | Pre-built Bulk Award CSV archives from `files.usaspending.gov/award_data_archive/`. **The prime Award archives are not monolithic**: USAspending publishes per-Agency × per-Fiscal-Year ZIPs under the "Award Data Archive" path (enumerated via `/api/v2/bulk_download/list_monthly_files/`), so the streamer iterates the (toptier-agency × FY2008..present) matrix (~hundreds of ZIPs) to cover the full historical seed. |
1012
+ | Initial backfill — subawards (FY2008 → onboarding date) | Once on cold start | **Today:** `/api/v2/search/spending_by_award/?subawards=true`, partitioned (FY × group × calendar-month) to stay under the API's 100k-row hard cap. **Planned (A9.5d):** asynchronous bulk-download jobs against `/api/v2/bulk_download/awards/` with `sub_award_types`, which return `All_Contracts_Subawards_*.csv` / `All_Assistance_Subawards_*.csv` ZIPs with no row cap and full `subawardee_uei`/`subawardee_duns` fidelity. There is no static pre-built subaward archive at `files.usaspending.gov` — bulk subaward CSVs are job-generated on demand. |
1013
+ | Daily incremental — prime awards | Daily | API `spending_by_award` filtered on `last_modified_date >= now - 30d`, then per-award detail and transactions enumeration |
1014
+ | Daily subaward delta | Daily | **Today:** same search API path (no UEI fidelity in list response). **Planned (A9.5d):** daily bulk-download job restricted to the lookback window. |
1015
+ | Reference catalogs (toptier agencies, NAICS, PSC, assistance listings) | Weekly | API reference endpoints |
1016
+
1017
+ USAspending refreshes nightly. FPDS data typically lands the next day,
1018
+ while FABS data arrives on a weekly to bi-weekly cycle. The 30-day
1019
+ lookback window on the daily delta absorbs late corrections and FABS lag.
1020
+
1021
+ ### Hard Limits
1022
+
1023
+ - **API rate limit:** ~1,000 requests per 5 minutes per IP address
1024
+ (~200 req/min sustained).
1025
+ - **Search pagination ceiling:** 10,000 records per single query on
1026
+ `spending_by_award` and related search endpoints. The daily delta
1027
+ enumeration must slice queries by `last_modified_date` (and, if
1028
+ needed, agency or award category) to stay under this wall.
1029
+ - **Earliest search date:** API queries are limited to `2007-10-01`
1030
+ and later (FY2008). Earlier data (back to FY2001) is only available
1031
+ via bulk download.
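The limits above translate into two simple planning helpers: a minimum spacing between requests, and per-day slices over the lookback window so that each query can stay under the pagination ceiling. This is a sketch under the figures quoted in this section. The names are hypothetical, and one-day slices are an assumed starting granularity that may need further splitting by agency or award category.

```python
from datetime import date, timedelta

REQUESTS_PER_WINDOW = 1_000  # from the rate limit quoted above
WINDOW_SECONDS = 5 * 60
PAGINATION_CEILING = 10_000  # records per single search query


def min_request_interval() -> float:
    """Seconds to wait between requests to stay at the sustained rate."""
    return WINDOW_SECONDS / REQUESTS_PER_WINDOW


def daily_slices(today: date, lookback_days: int = 30) -> list[tuple[date, date]]:
    """One inclusive (start, end) pair per day in the lookback window, oldest first."""
    start = today - timedelta(days=lookback_days)
    days = [start + timedelta(days=i) for i in range(lookback_days + 1)]
    return [(d, d) for d in days]


def needs_finer_slice(row_count: int) -> bool:
    """A slice that reaches the ceiling may be truncated and should be split."""
    return row_count >= PAGINATION_CEILING
```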
1032
+
1033
+ ### Volume Estimates (v0 scope, FY2008 → present)
1034
+
1035
+ - Contracts: ~30M prime award records, ~120M transactions.
1036
+ - IDVs: ~200K prime award records.
1037
+ - Grants: ~3M prime award records, ~15M transactions.
1038
+ - Subawards: ~5M total under in-scope primes.
1039
+
1040
+ These volumes confirm that bulk-CSV backfill is required: even at the
1041
+ maximum sustained API throughput (~12,000 req/hr), a per-record API
1042
+ backfill of ~170M records at one request each would take ~590 days of
1043
+ uninterrupted requests (about 1.6 years). The bulk archives provide the
1044
+ same data in ZIPped CSV form, downloadable in hours rather than years.
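The backfill estimate can be reproduced from the figures in this section. The sketch assumes one API request per record, which is a simplification: list endpoints return many records per page, while per-award detail calls are one each.

```python
# Volume estimates from the list above (prime awards, transactions, subawards).
records_total = (
    30_000_000 + 120_000_000  # contracts + contract transactions
    + 200_000                 # IDVs
    + 3_000_000 + 15_000_000  # grants + grant transactions
    + 5_000_000               # subawards
)

# ~1,000 requests per 5 minutes is 12,000 requests per hour.
requests_per_hour = 1_000 * (60 // 5)

# The text rounds the total to 170M before dividing.
rounded_records = 170_000_000
backfill_days = rounded_records / requests_per_hour / 24

print(f"{records_total:,} records, about {backfill_days:.0f} days at one request per record")
```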
1045
+
1046
+ ---
1047
+
1048
+ ## Acknowledged Minor Gaps and Deferred Items
1049
+
1050
+ These items were flagged in the Deep Research dictionary review
1051
+ (round 1) and either documented as caveats above or explicitly
1052
+ deferred:
1053
+
1054
+ - **Account-level financials** (`total_account_obligation`,
1055
+ `total_account_outlay`, per-DEFC breakdowns): deferred to the
1056
+ Federal Account / TAS future-scope item. Higher-resolution auditing
1057
+ is not required for v0 KG querying.
1058
+ - **Funding Opportunity (NOFO) entity:** kept as a string property
1059
+ on grants (`funding_opportunity_number`) rather than promoting to a
1060
+ standalone entity. NOFO data is sparse and frequently
1061
+ `"NOT APPLICABLE"`; promotion can revisit if downstream consumers
1062
+ need cross-grant correlation by NOFO.
1063
+ - **`generated_unique_award_id` aliasing:** documented as a caveat on
1064
+ the property itself; resolver carries both the string ID and
1065
+ `usaspending_internal_id` as strong IDs so historical rewrites do
1066
+ not split entities.
1067
+ - **Zero-dollar IDVs and negative-obligation transactions:** documented
1068
+ inline in the relevant property descriptions.
1069
+ - **`parent_recipient_uei` lag:** documented as a caveat; treated as
1070
+ point-in-time rather than current truth.
1071
+ - **Indirect Cost Rates (OMB M-21-03):** captured in the future-scope
1072
+ list; USAspending has not yet surfaced these in API responses.
1073
+ - **Cross-source resolution to EDGAR (CIK), GLEIF (LEI):** captured in
1074
+ the future-scope list; v0 publishes UEI as the strong ID so any
1075
+ future crosswalk has a clean merge target.