@yottagraph-app/data-model-skill 0.0.35 → 0.0.37
- package/package.json +1 -1
- package/skill/blscew/DATA_DICTIONARY.md +389 -0
- package/skill/blscew/schema.yaml +345 -0
- package/skill/fips/DATA_DICTIONARY.md +135 -0
- package/skill/fips/schema.yaml +77 -0
package/package.json
CHANGED
package/skill/blscew/DATA_DICTIONARY.md
@@ -0,0 +1,389 @@
# Data Dictionary: BLS CEW (QCEW)

## Purpose

This dictionary documents the entity types, properties, and attributes
that the BLS Quarterly Census of Employment and Wages (QCEW) source
contributes to the Lovelace knowledge graph. It is the contract between
the source and downstream consumers (ingest, query server, UI).

QCEW is a quarterly count of employment and wages reported by employers,
covering more than 95 % of US jobs. It is published by the U.S. Bureau
of Labor Statistics for every state, MSA, and county, broken out by
ownership (federal/state/local/private) and by industry (NAICS), at the
sector / 3-digit / 4-digit / 5-digit / 6-digit detail levels. The
Lovelace QCEW source ingests the published quarterly data slices —
specifically the by-area CSV slices for the United States as a whole
plus all 50 states and DC — and emits one record per area × ownership ×
industry × quarter combination.

**Pipeline:** Download → Extract → Atomize.
- Download fetches the per-area CSV slice for each (area, year, quarter)
  from `https://data.bls.gov/cew/data/api/{year}/{quarter}/area/{area_fips}.csv`.
- Extract is a pass-through (the raw CSV is the structured input).
- Atomize parses each CSV row into KG records.
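
The Download step's URL template can be illustrated with a minimal sketch. The function name `qcew_area_slice_url` and its validation are hypothetical and are not taken from the source's code; only the URL template comes from the text above. No network request is made.

```python
def qcew_area_slice_url(year: int, quarter: int, area_fips: str) -> str:
    """Fill in the per-area CSV slice URL template quoted in the pipeline notes."""
    if quarter not in (1, 2, 3, 4):
        raise ValueError(f"quarter must be 1-4, got {quarter!r}")
    # Assumption: area_fips is passed through unchanged (e.g. "06000", "US000").
    return f"https://data.bls.gov/cew/data/api/{year}/{quarter}/area/{area_fips}.csv"


print(qcew_area_slice_url(2024, 1, "06000"))
```

Keeping URL construction separate from the fetch makes the (area, year, quarter) enumeration easy to test without touching the network.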

**Cadence:** BLS publishes each quarter's QCEW release roughly 5–7
months after the close of the quarter (Q1 in late Aug/Sep, Q2 in late
Nov/Dec, Q3 in early Mar of the next year, Q4 in early Jun). The
streamer polls weekly; new quarters are detected and atomized when they
appear.

**Disclosure suppression.** Rows with `disclosure_code == "N"` are
withheld by BLS to protect employer confidentiality (typically when an
industry has very few establishments in an area). All numeric values on
those rows are zero. The atomizer drops "N"-disclosed rows entirely so
the KG never contains zero-valued QCEW observations that would be
mistaken for real data.
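
A minimal sketch of the suppression filter described above, assuming rows are read with Python's `csv.DictReader`. The helper name `drop_suppressed` and the two-row sample are hypothetical; only the `disclosure_code` and `qtrly_estabs` column names come from this dictionary, and real slices carry many more columns.

```python
import csv
import io

# Hypothetical sample data, not real QCEW values.
SAMPLE_CSV = (
    "area_fips,industry_code,disclosure_code,qtrly_estabs\n"
    "06037,5111,,120\n"
    "06037,5112,N,0\n"
)


def drop_suppressed(rows):
    """Keep only rows whose disclosure_code is not "N"."""
    return [
        row for row in rows
        if (row.get("disclosure_code") or "").strip() != "N"
    ]


kept = drop_suppressed(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print([row["industry_code"] for row in kept])
```

Filtering before any numeric parsing means the zero placeholders on suppressed rows never reach the record-building code.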

**Series identity.** A QCEW "series" is uniquely identified by the
4-tuple (area_fips, own_code, industry_code, agglvl_code). The atomizer
constructs a synthetic series id of the form
`{area_fips}.{own_code}.{industry_code}.{agglvl_code}` and emits this
as the `cew_series_id` strong id on a `cew_series` entity. `size_code`
is always `0` for the quarterly area slices ingested today (size-class
slices are an annual-only artifact and are out of scope for v1).
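
The id format above can be sketched as follows. The function name `make_cew_series_id` is hypothetical, and stripping surrounding whitespace is an assumption about how raw CSV values should be cleaned; the four field names and the dot-joined layout come from the text.

```python
SERIES_KEY_FIELDS = ("area_fips", "own_code", "industry_code", "agglvl_code")


def make_cew_series_id(row: dict) -> str:
    """Join the four identifying CSV fields with dots."""
    # Assumption: values are strings; surrounding whitespace is not significant.
    return ".".join(row[field].strip() for field in SERIES_KEY_FIELDS)


example_row = {
    "area_fips": "US000",
    "own_code": "0",
    "industry_code": "10",
    "agglvl_code": "10",
}
print(make_cew_series_id(example_row))
```

Because `industry_code` can itself contain a hyphen (for example `31-33`) but never a dot, the dot separator keeps the id unambiguous to split back into its four parts.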

**Source name:** `blscew-source`

---

## Entity Types

### `cew_series`

A single QCEW time series — the unique combination of geographic area,
ownership, industry, and aggregation level — that carries quarterly
employment, wages, and establishment metrics over time.

- Primary key: `cew_series_id` (synthetic id, see above) used as the
  strong id for resolution.
- Entity resolver: named entity, NOT_MERGEABLE. The strong id is
  `cew_series_id`. The disambiguation snippet includes area title,
  ownership title, industry title, and aggregation level title.
- Source: `blscew-source`
- Examples produced: `US000.0.10.10` (US national, total ownership, all
  industries), `06000.5.31-33.55` (California, private ownership,
  Manufacturing supersector), `06037.0.5111.74` (Los Angeles County,
  total ownership, NAICS 5111 Newspaper / Periodical Publishers).

### `location`

A US geographic area (national, state, MSA, or county) for which BLS
publishes QCEW data, identified by its FIPS-based `area_fips` code.

- Primary key: `area_fips` strong id.
- Entity resolver: named entity, NOT_MERGEABLE. Strong id is
  `area_fips`. Snippet includes the area title and the area type
  (national / statewide / MSA / county).
- Source: `blscew-source`
- Examples produced: `US000` (U.S. TOTAL), `06000` (California -- Statewide),
  `06037` (Los Angeles County, California), `C1018` (Albany-Schenectady-Troy,
  NY MSA).

### `industry`

An economic activity category from the North American Industry
Classification System (NAICS), or a BLS-defined supersector that rolls
up multiple NAICS sectors.

- Primary key: `naics_code` strong id (the BLS QCEW `industry_code`,
  which can be a NAICS sector / 3-digit / 4-digit / 5-digit / 6-digit
  code, a 2-digit BLS supersector aggregate (e.g. `31-33` for
  Manufacturing), or the special aggregate `10` meaning "Total, all
  industries").
- Entity resolver: named entity, NOT_MERGEABLE. Strong id is
  `naics_code`. Snippet includes the industry title.
- Source: `blscew-source`
- Examples produced: `10` (Total, all industries), `31-33`
  (Manufacturing), `5111` (Newspaper, Periodical, Book, and Directory
  Publishers), `541` (Professional, Scientific, and Technical Services).

---

## Properties

### Identity & Metadata Properties (cew_series)

These atoms appear once per series, timestamped at the first quarter
the series is observed in the current run.

* `cew_series_id`
  * Definition: synthetic identifier for a QCEW series, built by
    joining the area, ownership, industry, and aggregation-level codes
    with dots.
  * Examples: `"US000.0.10.10"`, `"06000.5.31-33.55"`,
    `"06037.0.5111.74"`
  * Derivation: built by the atomizer from the QCEW `area_fips`,
    `own_code`, `industry_code`, and `agglvl_code` fields of each CSV
    row.

* `name`
  * Definition: human-readable label for the series, combining area,
    ownership, and industry titles.
  * Examples: `"U.S. TOTAL · Total Covered · Total, all industries"`,
    `"California · Private · Manufacturing"`
  * Derivation: built from the `area_title`, `own_title`, and
    `industry_title` fields on the CSV row.

* `area_fips`
  * Definition: BLS-assigned 5-character area identifier (FIPS-based).
    Acts as strong id on the `location` entity that the series points
    to.
  * Examples: `"US000"`, `"06000"`, `"06037"`, `"C1018"`
  * Derivation: `area_fips` field of the CSV row.

* `area_title`
  * Definition: human-readable title for the geographic area.
  * Examples: `"U.S. TOTAL"`, `"California -- Statewide"`,
    `"Los Angeles County, California"`,
    `"Albany-Schenectady-Troy, NY MSA"`
  * Derivation: `area_title` field of the CSV row.

* `ownership_code`
  * Definition: BLS one-character code identifying the ownership
    sector covered by the series.
  * Examples: `"0"` (Total Covered), `"1"` (Federal Government),
    `"2"` (State Government), `"3"` (Local Government),
    `"5"` (Private)
  * Derivation: `own_code` field of the CSV row.

* `ownership_title`
  * Definition: human-readable label for the ownership sector.
  * Examples: `"Total Covered"`, `"Federal Government"`,
    `"State Government"`, `"Local Government"`, `"Private"`
  * Derivation: `own_title` field of the CSV row.

* `naics_code`
  * Definition: industry code used by BLS — NAICS at varying levels of
    aggregation, plus BLS supersector aggregates and the special
    "Total, all industries" code `10`. Acts as strong id on the
    `industry` entity that the series points to.
  * Examples: `"10"`, `"31-33"`, `"5111"`, `"541211"`
  * Derivation: `industry_code` field of the CSV row.

* `naics_description`
  * Definition: human-readable name of the industry / supersector.
  * Examples: `"Total, all industries"`, `"Manufacturing"`,
    `"Newspaper, periodical, book and directory publishers"`
  * Derivation: `industry_title` field of the CSV row.

* `aggregation_level_code`
  * Definition: BLS two-character code describing the geographic and
    industry aggregation level the series represents (e.g. national
    total, statewide ownership × supersector, county × 6-digit NAICS).
  * Examples: `"10"` (national, by ownership × total), `"55"`
    (statewide, by ownership × supersector),
    `"74"` (county, by ownership × 5-digit NAICS),
    `"78"` (county, by ownership × 6-digit NAICS)
  * Derivation: `agglvl_code` field of the CSV row.

* `aggregation_level_title`
  * Definition: human-readable description of the aggregation level.
  * Examples: `"National, by ownership sector"`,
    `"Statewide, by ownership sector and supersector"`,
    `"County, by ownership sector and 6-digit NAICS"`
  * Derivation: `agglvl_title` field of the CSV row.

* `area_type`
  * Definition: classification of the area as one of `national`,
    `statewide`, `msa`, or `county` derived from the leading characters
    of `area_fips` and the trailing zeros pattern.
  * Examples: `"national"`, `"statewide"`, `"county"`, `"msa"`
  * Derivation: heuristic on `area_fips`: `US000` → national; FIPS
    starting with `C` → MSA; 5-digit ending in `000` → statewide; any
    other 5-digit code → county.

* `naics_level`
  * Definition: granularity of the NAICS code on this row, where
    higher values are more specific. Stored as a float to keep with
    the schema's float-for-numeric convention.
  * Examples: `2.0` (sector or supersector), `3.0` (subsector),
    `4.0` (industry group), `5.0` (NAICS industry), `6.0` (national
    industry).
  * Derivation: length of the `industry_code` string, with the special
    case `10` (Total) and any 2-character supersector code mapped to
    `2.0`.

* `publisher`
  * Definition: organization that publishes the data (always BLS).
  * Examples: `"U.S. Bureau of Labor Statistics"`
  * Derivation: hard-coded constant; QCEW is exclusively a BLS product.
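
The two derived properties in this list, `area_type` and `naics_level`, are described only as heuristics, so a small sketch may help. The function names are hypothetical and this is not the atomizer's actual code. Two behaviours are assumptions: hyphenated sector ranges such as `31-33` are mapped to `2.0` (the text lists `31-33` as a 2.0-level example but only names 2-character codes in the derivation), and codes matching none of the documented patterns raise `ValueError`.

```python
def classify_area_type(area_fips: str) -> str:
    """Classify an area code using the heuristic in the `area_type` entry."""
    code = area_fips.strip().upper()
    if code == "US000":
        return "national"
    if code.startswith("C"):
        return "msa"
    if len(code) == 5 and code.isdigit():
        return "statewide" if code.endswith("000") else "county"
    # Assumption: anything else is outside the documented patterns.
    raise ValueError(f"unrecognised area_fips: {area_fips!r}")


def naics_level(industry_code: str) -> float:
    """Return the NAICS granularity as a float, per the `naics_level` entry."""
    code = industry_code.strip()
    # "10" (Total) and other 2-character codes map to 2.0; treating
    # hyphenated ranges like "31-33" the same way is an assumption.
    if len(code) == 2 or "-" in code:
        return 2.0
    if code.isdigit() and 3 <= len(code) <= 6:
        return float(len(code))
    raise ValueError(f"unrecognised industry_code: {industry_code!r}")


for fips in ("US000", "06000", "06037", "C1018"):
    print(fips, classify_area_type(fips))
for naics in ("10", "31-33", "541", "5111", "541211"):
    print(naics, naics_level(naics))
```

The order of the checks in `classify_area_type` matters: the `US000` and `C` prefixes must be tested before the 5-digit rules, since neither is purely numeric.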

### Quarterly Observation Properties (cew_series)

These atoms appear on the per-quarter observation records, timestamped
at the last day of the quarter (e.g. 2024-03-31 for Q1 2024).
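
The quarter-end timestamp rule can be sketched with the standard library. The function name `quarter_end` is hypothetical; the only behaviour taken from the text is that an observation for a quarter is dated on that quarter's last calendar day.

```python
from datetime import date, timedelta


def quarter_end(year: int, quarter: int) -> date:
    """Return the last calendar day of the given quarter."""
    if quarter not in (1, 2, 3, 4):
        raise ValueError(f"quarter must be 1-4, got {quarter!r}")
    last_month = quarter * 3
    if last_month == 12:
        return date(year, 12, 31)
    # First day of the following month, minus one day.
    return date(year, last_month + 1, 1) - timedelta(days=1)


print(quarter_end(2024, 1).isoformat())
```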

* `establishment_count`
  * Definition: count of establishments (physical locations of
    employers) covered by the series in the quarter.
  * Examples: `11907855` (US total Q1 2024), `61375` (US Federal
    Government total Q1 2024)
  * Derivation: `qtrly_estabs` field of the CSV row.

* `monthly_employment_m1`, `monthly_employment_m2`, `monthly_employment_m3`
  * Definition: number of employees on payrolls covered in the first /
    second / third month of the quarter (BLS uses the pay period
    including the 12th of the month).
  * Examples (US Q1 2024 total): `152393725`, `153129544`, `153848430`
  * Derivation: `month1_emplvl`, `month2_emplvl`, `month3_emplvl`
    fields of the CSV row.

* `employment_level`
  * Definition: representative quarterly employment level for the
    series, taken as the third-month employment (final month of the
    quarter), aligned to the way QCEW publications report "QCEW
    employment".
  * Examples: `153848430` (US Q1 2024 total)
  * Derivation: `month3_emplvl` field of the CSV row.

* `total_quarterly_wages`
  * Definition: total wages paid (in current US dollars) to all covered
    workers during the quarter.
  * Examples: `3037790324790` (US Q1 2024 total = $3.04 trillion)
  * Derivation: `total_qtrly_wages` field of the CSV row.

* `taxable_quarterly_wages`
  * Definition: portion of total quarterly wages subject to UI tax
    contributions (in current US dollars). Always 0 for federal
    government employment, which is not subject to UI tax.
  * Examples: `1151875077520` (US Q1 2024 total)
  * Derivation: `taxable_qtrly_wages` field of the CSV row.

* `quarterly_contributions`
  * Definition: total UI tax contributions (in current US dollars)
    associated with this employment in the quarter.
  * Examples: `19555346530` (US Q1 2024 total)
  * Derivation: `qtrly_contributions` field of the CSV row.

* `avg_weekly_wage`
  * Definition: average weekly wage (in current US dollars) per
    employee covered in the quarter, computed by BLS as
    `total_qtrly_wages / (avg_emplvl × 13)`.
  * Examples: `1526` (US Q1 2024 total = $1,526/week)
  * Derivation: `avg_wkly_wage` field of the CSV row.
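
As a consistency check on the example values in this list, the `avg_weekly_wage` formula can be reproduced from the other US Q1 2024 figures. This sketch assumes `avg_emplvl` is the mean of the three monthly employment levels; that definition is not stated in this dictionary. The atomizer itself reads the published `avg_wkly_wage` field and does not recompute it.

```python
def avg_weekly_wage(total_qtrly_wages: float, month1: float,
                    month2: float, month3: float) -> float:
    """Total quarterly wages divided by (average employment x 13 weeks)."""
    # Assumption: avg_emplvl is the simple mean of the three monthly levels.
    avg_emplvl = (month1 + month2 + month3) / 3
    return total_qtrly_wages / (avg_emplvl * 13)


# US Q1 2024 example values quoted in the entries above.
value = avg_weekly_wage(3037790324790, 152393725, 153129544, 153848430)
print(round(value))
```

Under that assumption the result rounds to the `1526` shown in the `avg_weekly_wage` example, which suggests the example values in this section are mutually consistent.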

### Year-over-Year Change Properties (cew_series)

BLS pre-computes over-the-year (OTY) absolute and percent changes for
each quarterly metric, comparing the current quarter to the same quarter
of the prior year. The atomizer emits the percent-change variants as
their own atoms so downstream consumers can directly query
"Employment growth YoY" without recomputing it from observation history.

* `employment_yoy_pct_chg`
  * Definition: year-over-year percent change in the canonical
    `employment_level` (= third-month / end-of-quarter snapshot).
  * Examples: `1.5` (US total Q1 2024 vs Q1 2023 = +1.5 %)
  * Derivation: `oty_month3_emplvl_pct_chg` field of the CSV row. We
    intentionally do *not* also emit `monthly_employment_m3_yoy_pct_chg`
    -- it would duplicate this number under a redundant property name.

* `monthly_employment_m1_yoy_pct_chg`, `monthly_employment_m2_yoy_pct_chg`
  * Definition: year-over-year percent change in mid-quarter monthly
    employment (the M1 and M2 snapshots that have no canonical alias;
    the M3 snapshot is exposed as `employment_yoy_pct_chg` above).
  * Examples (US Q1 2024 total): `1.4`, `1.4`
  * Derivation: `oty_month1_emplvl_pct_chg` and `oty_month2_emplvl_pct_chg`
    fields of the CSV row.

* `establishments_yoy_pct_chg`
  * Definition: year-over-year percent change in the count of
    establishments.
  * Examples: `1.2` (US Q1 2024 total = +1.2 %)
  * Derivation: `oty_qtrly_estabs_pct_chg` field of the CSV row.

* `total_quarterly_wages_yoy_pct_chg`
  * Definition: year-over-year percent change in total quarterly
    wages.
  * Examples: `5.7` (US Q1 2024 total = +5.7 %)
  * Derivation: `oty_total_qtrly_wages_pct_chg` field of the CSV row.

* `avg_weekly_wage_yoy_pct_chg`
  * Definition: year-over-year percent change in the average weekly
    wage.
  * Examples: `4.2` (US Q1 2024 total = +4.2 %)
  * Derivation: `oty_avg_wkly_wage_pct_chg` field of the CSV row.

* `taxable_quarterly_wages_yoy_pct_chg`
  * Definition: year-over-year percent change in taxable quarterly
    wages.
  * Examples: `3.3`
  * Derivation: `oty_taxable_qtrly_wages_pct_chg` field of the CSV row.

* `quarterly_contributions_yoy_pct_chg`
  * Definition: year-over-year percent change in UI contributions.
  * Examples: `4.3`
  * Derivation: `oty_qtrly_contributions_pct_chg` field of the CSV row.

### Location Properties (location)

* `area_fips`
  * Definition: BLS-assigned area identifier (FIPS-based) used as the
    location's strong id.
  * Examples: `"US000"`, `"06000"`, `"06037"`, `"C1018"`
  * Derivation: `area_fips` field of the CSV row.

* `name`
  * Definition: human-readable name of the geographic area.
  * Examples: `"U.S. TOTAL"`, `"California -- Statewide"`,
    `"Los Angeles County, California"`
  * Derivation: `area_title` field of the CSV row.

* `area_type`
  * Definition: classification of the area as `national`, `statewide`,
    `msa`, or `county`.
  * Examples: `"national"`, `"statewide"`, `"county"`, `"msa"`
  * Derivation: heuristic on `area_fips`; see the `cew_series` entry of
    the same name above.

### Industry Properties (industry)

* `naics_code`
  * Definition: industry code used by BLS (NAICS at sector / 3-digit /
    4-digit / 5-digit / 6-digit detail, plus BLS supersector aggregates
    and `10` = Total, all industries). Strong id for `industry`.
  * Examples: `"10"`, `"31-33"`, `"5111"`, `"541211"`
  * Derivation: `industry_code` field of the CSV row.

* `naics_description`
  * Definition: human-readable industry name.
  * Examples: `"Total, all industries"`, `"Manufacturing"`,
    `"Newspaper, periodical, book and directory publishers"`
  * Derivation: `industry_title` field of the CSV row.

* `naics_level`
  * Definition: granularity of the NAICS code (see the `cew_series`
    entry of the same name above for derivation).
  * Examples: `2.0`, `3.0`, `4.0`, `5.0`, `6.0`

---

## Entity Relationships Summary

The QCEW source emits two relationship types — both pointing from the
real-world contextual entity (a US area or an industry) to the
`cew_series` it appears in. This mirrors the FRED source's
`appears_in_fred_series` pattern.

```
location ──[appears_in_cew_series]──→ cew_series
industry ──[appears_in_cew_series]──→ cew_series
```

* `appears_in_cew_series`
  * Definition: the subject (a US geographic area or an industry)
    appears as the area / industry dimension of a QCEW time series.
  * Domain flavors: `location`, `industry`
  * Target flavor: `cew_series`
  * Derivation: emitted once per series per quarter on the location and
    industry context records.

---

## Attributes

None. All quarterly metrics are timestamped scalar atoms; there are no
per-atom attributes (unit / frequency / etc.) on this source — those are
carried as atoms on the `cew_series` metadata record.
package/skill/blscew/schema.yaml
@@ -0,0 +1,345 @@
# Dataset schema for BLS QCEW (Quarterly Census of Employment and Wages).
#
# Architecture mirrors FRED:
# For each unique QCEW series — the (area_fips, own_code, industry_code,
# agglvl_code) tuple — the atomizer emits:
# 1. A metadata record: subject is the cew_series entity, atoms are
# identifying metadata (cew_series_id, area_fips, area_title,
# ownership_code/title, naics_code/description, aggregation level,
# publisher, etc.). Timestamped at the quarter end date.
# 2. One quarterly observation record per (series, quarter): subject is
# the cew_series, atoms are all quarterly metrics — establishment
# count, employment levels, wages, year-over-year percent changes —
# all timestamped at the quarter end date.
# 3. A location context record: subject is the geographic area
# (location flavor), single appears_in_cew_series atom pointing at
# the cew_series.
# 4. An industry context record: subject is the NAICS industry
# (industry flavor), single appears_in_cew_series atom pointing at
# the cew_series.
#
# All elements are passive — created by the atomizer from QCEW CSV data,
# not by LLM extraction.
#
# Source identifier used on all records: "blscew-source"
name: "blscew"
description: "Quarterly Census of Employment and Wages from the U.S. Bureau of Labor Statistics — quarterly establishment counts, employment levels, total and taxable wages, average weekly wages, and BLS-computed year-over-year changes by US area, ownership sector, and NAICS industry"

extraction:
  flavors: closed
  properties: closed
  relationships: closed
  attributes: closed
  events: closed

flavors:
  - name: "cew_series"
    description: "A QCEW (Quarterly Census of Employment and Wages) time series identified by the unique combination of area, ownership, industry, and aggregation level. The entity carries metadata atoms and per-quarter observations of employment, wages, and establishment counts."
    display_name: "QCEW Series"
    mergeability: not_mergeable
    strong_id_properties: ["cew_series_id"]
    passive: true

  - name: "location"
    description: "A specific named geographic location such as a city, country, region, or landmark"
    display_name: "Location"
    mergeability: not_mergeable
    strong_id_properties: ["area_fips"]
    passive: true

  - name: "industry"
    description: "An industry classification of economic activity (e.g. NAICS or SIC) identifying the line of business associated with an organization or award"
    display_name: "Industry"
    mergeability: not_mergeable
    strong_id_properties: ["naics_code"]
    passive: true

properties:
  # --- Strong-id properties ---

  - name: "cew_series_id"
    type: string
    description: "Synthetic identifier for a QCEW series formed by joining area_fips, own_code, industry_code, and agglvl_code with dots. Strong id for cew_series entities."
    display_name: "QCEW Series ID"
    mergeability: not_mergeable
    domain_flavors: ["cew_series"]
    examples: ["US000.0.10.10", "06000.5.31-33.55", "06037.0.5111.74"]
    passive: true

  - name: "area_fips"
    type: string
    description: "BLS-assigned 5-character QCEW area identifier (FIPS-based). For US states the value is the 2-digit FIPS state code followed by '000' (e.g. '06000' = California). For counties it is the 5-digit county FIPS (e.g. '06037' = Los Angeles County). For MSAs the BLS prefixes a 'C' (e.g. 'C1018' = Albany–Schenectady–Troy MSA). Strong id for the location entity that a cew_series is attached to."
    display_name: "QCEW Area FIPS"
    mergeability: not_mergeable
    domain_flavors: ["cew_series", "location"]
    examples: ["US000", "06000", "06037", "C1018"]
    passive: true

  - name: "naics_code"
    type: string
    description: "North American Industry Classification System code (typically 6 digits) identifying the industry of work performed under a contract (e.g., \"524114\" for Direct Health and Medical Insurance Carriers)"
    display_name: "NAICS code"
    mergeability: not_mergeable
    domain_flavors: ["cew_series", "industry"]
    examples: ["10", "31-33", "5111", "541211"]
    passive: true

  # --- Identity / metadata properties on cew_series ---

  - name: "name"
    type: string
    description: "Display name of the entity"
    display_name: "Name"
    mergeability: not_mergeable
    domain_flavors: ["cew_series", "location"]
    examples: ["California -- Statewide", "U.S. TOTAL · Total Covered · Total, all industries"]
    passive: true

  - name: "area_title"
    type: string
    description: "Human-readable title for a QCEW geographic area (national, state, MSA, or county) as published by BLS."
    display_name: "QCEW Area Title"
    mergeability: not_mergeable
    domain_flavors: ["cew_series"]
    examples: ["U.S. TOTAL", "California -- Statewide", "Los Angeles County, California", "Albany-Schenectady-Troy, NY MSA"]
    passive: true

  - name: "ownership_code"
    type: string
    description: "BLS one-character QCEW ownership-sector code: 0 = Total Covered, 1 = Federal Government, 2 = State Government, 3 = Local Government, 5 = Private."
    display_name: "QCEW Ownership Code"
    mergeability: not_mergeable
    domain_flavors: ["cew_series"]
    examples: ["0", "1", "2", "3", "5"]
    passive: true

  - name: "ownership_title"
    type: string
    description: "Human-readable QCEW ownership-sector title (Total Covered, Federal Government, State Government, Local Government, Private)."
    display_name: "QCEW Ownership Title"
    mergeability: not_mergeable
    domain_flavors: ["cew_series"]
    examples: ["Total Covered", "Federal Government", "State Government", "Local Government", "Private"]
    passive: true

  - name: "naics_description"
    type: string
    description: "Human-readable name of the NAICS industry (e.g., \"DIRECT HEALTH AND MEDICAL INSURANCE CARRIERS\")"
    display_name: "NAICS description"
    mergeability: not_mergeable
    domain_flavors: ["cew_series", "industry"]
    examples: ["Total, all industries", "Manufacturing", "Newspaper, periodical, book and directory publishers"]
    passive: true

  - name: "aggregation_level_code"
    type: string
    description: "BLS QCEW two-character aggregation-level code describing the geographic and industry granularity of a series — for example national vs statewide vs county, and total vs supersector vs detailed NAICS."
    display_name: "QCEW Aggregation Level Code"
    mergeability: not_mergeable
    domain_flavors: ["cew_series"]
    examples: ["10", "55", "74", "78"]
    passive: true

  - name: "aggregation_level_title"
    type: string
    description: "Human-readable description of a QCEW aggregation level (e.g. \"National, by ownership sector\", \"County, by ownership sector and 6-digit NAICS\")."
    display_name: "QCEW Aggregation Level Title"
    mergeability: not_mergeable
    domain_flavors: ["cew_series"]
    examples: ["National, by ownership sector", "Statewide, by ownership sector and supersector", "County, by ownership sector and 6-digit NAICS"]
    passive: true

  - name: "area_type"
    type: string
    description: "Classification of a QCEW area as one of national, statewide, msa, or county, derived from the shape of the area_fips code."
    display_name: "QCEW Area Type"
    mergeability: not_mergeable
    domain_flavors: ["cew_series", "location"]
    examples: ["national", "statewide", "msa", "county"]
    passive: true

  - name: "naics_level"
    type: float
    description: "Granularity of the NAICS code on a QCEW series, expressed as the number of digits. 2.0 = sector or BLS supersector aggregate, 3.0 = subsector, 4.0 = industry group, 5.0 = NAICS industry, 6.0 = national industry."
    display_name: "QCEW NAICS Level"
    mergeability: not_mergeable
    domain_flavors: ["cew_series", "industry"]
    examples: [2.0, 3.0, 4.0, 5.0, 6.0]
    passive: true

  - name: "publisher"
    type: string
    description: "Name of the organization that publishes or maintains the data series, inferred from the release URL domain"
    display_name: "Publisher"
    mergeability: not_mergeable
    domain_flavors: ["cew_series"]
    examples: ["U.S. Bureau of Labor Statistics"]
    passive: true
|
|
178
|
+
|
|
179
|
+
# --- Quarterly observation properties on cew_series ---
|
|
180
|
+
|
|
181
|
+
- name: "establishment_count"
|
|
182
|
+
type: float
|
|
183
|
+
description: "Count of QCEW-covered establishments (physical locations of employers) reported by BLS for the series during the quarter."
|
|
184
|
+
display_name: "QCEW Establishment Count"
|
|
185
|
+
mergeability: not_mergeable
|
|
186
|
+
domain_flavors: ["cew_series"]
|
|
187
|
+
examples: [11907855.0, 61375.0]
|
|
188
|
+
passive: true
|
|
189
|
+
|
|
190
|
+
- name: "monthly_employment_m1"
|
|
191
|
+
type: float
|
|
192
|
+
description: "Number of QCEW-covered employees on payrolls in the first month of the quarter (BLS uses the pay period including the 12th of the month)."
|
|
193
|
+
display_name: "QCEW Employment (Month 1)"
|
|
194
|
+
mergeability: not_mergeable
|
|
195
|
+
domain_flavors: ["cew_series"]
|
|
196
|
+
examples: [152393725.0]
|
|
197
|
+
passive: true
|
|
198
|
+
|
|
199
|
+
- name: "monthly_employment_m2"
|
|
200
|
+
type: float
|
|
201
|
+
description: "Number of QCEW-covered employees on payrolls in the second month of the quarter (pay period including the 12th)."
|
|
202
|
+
display_name: "QCEW Employment (Month 2)"
|
|
203
|
+
mergeability: not_mergeable
|
|
204
|
+
domain_flavors: ["cew_series"]
|
|
205
|
+
examples: [153129544.0]
|
|
206
|
+
passive: true
|
|
207
|
+
|
|
208
|
+
- name: "monthly_employment_m3"
|
|
209
|
+
type: float
|
|
210
|
+
description: "Number of QCEW-covered employees on payrolls in the third month of the quarter (pay period including the 12th)."
|
|
211
|
+
display_name: "QCEW Employment (Month 3)"
|
|
212
|
+
mergeability: not_mergeable
|
|
213
|
+
domain_flavors: ["cew_series"]
|
|
214
|
+
examples: [153848430.0]
|
|
215
|
+
passive: true
|
|
216
|
+
|
|
217
|
+
- name: "employment_level"
|
|
218
|
+
type: float
|
|
219
|
+
description: "Representative quarterly employment level for the series — taken as the third-month employment, matching the convention BLS uses when publishing single-number QCEW employment figures."
|
|
220
|
+
display_name: "QCEW Quarterly Employment Level"
|
|
221
|
+
mergeability: not_mergeable
|
|
222
|
+
domain_flavors: ["cew_series"]
|
|
223
|
+
examples: [153848430.0]
|
|
224
|
+
passive: true
|
|
225
|
+
|
|
226
|
+
- name: "total_quarterly_wages"
|
|
227
|
+
type: float
|
|
228
|
+
description: "Total wages paid (in current US dollars) to all QCEW-covered workers for the series during the quarter."
|
|
229
|
+
display_name: "QCEW Total Quarterly Wages (USD)"
|
|
230
|
+
mergeability: not_mergeable
|
|
231
|
+
domain_flavors: ["cew_series"]
|
|
232
|
+
examples: [3037790324790.0]
|
|
233
|
+
passive: true
|
|
234
|
+
|
|
235
|
+
- name: "taxable_quarterly_wages"
|
|
236
|
+
type: float
|
|
237
|
+
description: "Portion of total quarterly wages (in current US dollars) subject to unemployment insurance (UI) tax contributions. Always 0 for federal-government employment, which is not subject to UI tax."
|
|
238
|
+
display_name: "QCEW Taxable Quarterly Wages (USD)"
|
|
239
|
+
mergeability: not_mergeable
|
|
240
|
+
domain_flavors: ["cew_series"]
|
|
241
|
+
examples: [1151875077520.0]
|
|
242
|
+
passive: true
|
|
243
|
+
|
|
244
|
+
- name: "quarterly_contributions"
|
|
245
|
+
type: float
|
|
246
|
+
description: "Total UI tax contributions (in current US dollars) associated with the series' employment during the quarter."
|
|
247
|
+
display_name: "QCEW Quarterly UI Contributions (USD)"
|
|
248
|
+
mergeability: not_mergeable
|
|
249
|
+
domain_flavors: ["cew_series"]
|
|
250
|
+
examples: [19555346530.0]
|
|
251
|
+
passive: true
|
|
252
|
+
|
|
253
|
+
- name: "avg_weekly_wage"
|
|
254
|
+
type: float
|
|
255
|
+
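description: "Average weekly wage (in current US dollars) per QCEW-covered employee for the series during the quarter, computed by BLS as total_qtrly_wages / (avg_emplvl × 13), where avg_emplvl is the mean of the three monthly employment levels and 13 is the number of weeks in a quarter."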
|
|
256
|
+
display_name: "QCEW Average Weekly Wage (USD)"
|
|
257
|
+
mergeability: not_mergeable
|
|
258
|
+
domain_flavors: ["cew_series"]
|
|
259
|
+
examples: [1526.0]
|
|
260
|
+
passive: true
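# Worked check of the avg_weekly_wage formula. Illustrative only: it reuses the
# example values listed in this schema and assumes they all describe the same
# series and quarter, which the schema does not state.
#   avg_emplvl      = (152393725 + 153129544 + 153848430) / 3 ≈ 153123899.67
#   avg_weekly_wage = 3037790324790 / (153123899.67 × 13) ≈ 1526.06
# The result is consistent with the example value 1526.0 after rounding.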
|
|
261
|
+
|
|
262
|
+
# --- BLS-computed year-over-year percent change properties on cew_series ---
|
|
263
|
+
|
|
264
|
+
- name: "employment_yoy_pct_chg"
|
|
265
|
+
type: float
|
|
266
|
+
description: "Year-over-year percent change in third-month employment for the series, computed by BLS by comparing the current quarter against the same quarter of the prior year."
|
|
267
|
+
display_name: "QCEW Employment YoY %"
|
|
268
|
+
mergeability: not_mergeable
|
|
269
|
+
domain_flavors: ["cew_series"]
|
|
270
|
+
examples: [1.5, -0.4]
|
|
271
|
+
passive: true
|
|
272
|
+
|
|
273
|
+
- name: "monthly_employment_m1_yoy_pct_chg"
|
|
274
|
+
type: float
|
|
275
|
+
description: "Year-over-year percent change in first-month employment for the series."
|
|
276
|
+
display_name: "QCEW Employment (Month 1) YoY %"
|
|
277
|
+
mergeability: not_mergeable
|
|
278
|
+
domain_flavors: ["cew_series"]
|
|
279
|
+
examples: [1.4]
|
|
280
|
+
passive: true
|
|
281
|
+
|
|
282
|
+
- name: "monthly_employment_m2_yoy_pct_chg"
|
|
283
|
+
type: float
|
|
284
|
+
description: "Year-over-year percent change in second-month employment for the series."
|
|
285
|
+
display_name: "QCEW Employment (Month 2) YoY %"
|
|
286
|
+
mergeability: not_mergeable
|
|
287
|
+
domain_flavors: ["cew_series"]
|
|
288
|
+
examples: [1.4]
|
|
289
|
+
passive: true
|
|
290
|
+
|
|
291
|
+
- name: "establishments_yoy_pct_chg"
|
|
292
|
+
type: float
|
|
293
|
+
description: "Year-over-year percent change in establishment count for the series."
|
|
294
|
+
display_name: "QCEW Establishments YoY %"
|
|
295
|
+
mergeability: not_mergeable
|
|
296
|
+
domain_flavors: ["cew_series"]
|
|
297
|
+
examples: [1.2]
|
|
298
|
+
passive: true
|
|
299
|
+
|
|
300
|
+
- name: "total_quarterly_wages_yoy_pct_chg"
|
|
301
|
+
type: float
|
|
302
|
+
description: "Year-over-year percent change in total quarterly wages for the series."
|
|
303
|
+
display_name: "QCEW Total Wages YoY %"
|
|
304
|
+
mergeability: not_mergeable
|
|
305
|
+
domain_flavors: ["cew_series"]
|
|
306
|
+
examples: [5.7]
|
|
307
|
+
passive: true
|
|
308
|
+
|
|
309
|
+
- name: "avg_weekly_wage_yoy_pct_chg"
|
|
310
|
+
type: float
|
|
311
|
+
description: "Year-over-year percent change in average weekly wage for the series."
|
|
312
|
+
display_name: "QCEW Avg Weekly Wage YoY %"
|
|
313
|
+
mergeability: not_mergeable
|
|
314
|
+
domain_flavors: ["cew_series"]
|
|
315
|
+
examples: [4.2]
|
|
316
|
+
passive: true
|
|
317
|
+
|
|
318
|
+
- name: "taxable_quarterly_wages_yoy_pct_chg"
|
|
319
|
+
type: float
|
|
320
|
+
description: "Year-over-year percent change in taxable quarterly wages for the series."
|
|
321
|
+
display_name: "QCEW Taxable Wages YoY %"
|
|
322
|
+
mergeability: not_mergeable
|
|
323
|
+
domain_flavors: ["cew_series"]
|
|
324
|
+
examples: [3.3]
|
|
325
|
+
passive: true
|
|
326
|
+
|
|
327
|
+
- name: "quarterly_contributions_yoy_pct_chg"
|
|
328
|
+
type: float
|
|
329
|
+
description: "Year-over-year percent change in quarterly UI contributions for the series."
|
|
330
|
+
display_name: "QCEW UI Contributions YoY %"
|
|
331
|
+
mergeability: not_mergeable
|
|
332
|
+
domain_flavors: ["cew_series"]
|
|
333
|
+
examples: [4.3]
|
|
334
|
+
passive: true
|
|
335
|
+
|
|
336
|
+
relationships:
|
|
337
|
+
- name: "appears_in_cew_series"
|
|
338
|
+
description: "Links a real-world contextual entity (a US geographic area or a NAICS industry) to a QCEW time series whose area or industry dimension is that entity."
|
|
339
|
+
display_name: "Appears in QCEW Series"
|
|
340
|
+
mergeability: not_mergeable
|
|
341
|
+
domain_flavors: ["location", "industry"]
|
|
342
|
+
target_flavors: ["cew_series"]
|
|
343
|
+
passive: true
|
|
344
|
+
|
|
345
|
+
attributes: []
|
|
@@ -0,0 +1,135 @@
|
|
|
1
|
+
# Data Dictionary: FIPS Codes (FCC mirror)
|
|
2
|
+
|
|
3
|
+
## Source Overview
|
|
4
|
+
|
|
5
|
+
Federal Information Processing Standard (FIPS) codes for U.S. states and
|
|
6
|
+
counties — short numeric identifiers issued by the federal government
|
|
7
|
+
(Census Bureau / NIST historically) that uniquely tag every U.S. state,
|
|
8
|
+
the District of Columbia, and every county-or-equivalent (boroughs,
|
|
9
|
+
parishes, independent cities, census areas).
|
|
10
|
+
|
|
11
|
+
- **Publisher (mirror):** U.S. Federal Communications Commission, Office
|
|
12
|
+
of Engineering and Technology
|
|
13
|
+
- **URL:** https://transition.fcc.gov/oet/info/maps/census/fips/fips.txt
|
|
14
|
+
- **Format:** Plain-text fixed-column ASCII, ~3,200 lines total
|
|
15
|
+
- **Cadence:** Effectively static. The federal codes change only when
|
|
16
|
+
jurisdictions are created, dissolved, or renamed (rare — once every
|
|
17
|
+
several years).
|
|
18
|
+
- **Source name:** `fips`
|
|
19
|
+
|
|
20
|
+
The file contains two sections:
|
|
21
|
+
|
|
22
|
+
1. A 51-row table of **state-level FIPS codes** (50 states + DC), one
|
|
23
|
+
line per state with a 2-digit code.
|
|
24
|
+
2. A 3,140+ row table of **county-level FIPS codes**, one line per county
|
|
25
|
+
(or county-equivalent) with a 5-digit code. The leading 2 digits are
|
|
26
|
+
the parent state's FIPS code; the trailing 3 digits identify the
|
|
27
|
+
county within the state. The county-level table is grouped by state
|
|
28
|
+
and prefaced with a header row of the form `XX000 StateName`.
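The two tables described above can be told apart by code length alone. The following is a minimal sketch, assuming each data line consists of optional leading whitespace, a 2- or 5-digit code, more whitespace, and then the name; the exact column offsets of `fips.txt` are not reproduced here, and `parse_fips_line` is a hypothetical helper rather than the source's actual atomizer.

```python
import re
from typing import Optional, Tuple

# Assumed line shape: optional indent, a 2- or 5-digit code, whitespace, name.
_LINE = re.compile(r"^\s*(\d{5}|\d{2})\s+(\S.*?)\s*$")


def parse_fips_line(line: str) -> Optional[Tuple[str, str, str]]:
    """Return (level, code, name), or None for lines that are not data rows."""
    match = _LINE.match(line)
    if match is None:
        return None  # banner text, blank lines, column headings, etc.
    code, name = match.group(1), match.group(2)
    # 2-digit codes come from the state table, 5-digit codes from the county table.
    level = "state" if len(code) == 2 else "county"
    return level, code, name
```

Note that an `XX000` header row parses as a county-shaped record under this sketch, so the caller still has to filter it out.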
|
|
29
|
+
|
|
30
|
+
**Limitations:**
|
|
31
|
+
- The FCC mirror is a republication of the older Census-published list;
|
|
32
|
+
it does not include FIPS *places* (cities/towns), MSAs, or U.S.
|
|
33
|
+
territories (Puerto Rico, Guam, U.S. Virgin Islands, etc.).
|
|
34
|
+
- A small number of county lines have parenthesized historical
|
|
35
|
+
annotations (`(created after 1990)`, `(1990 Census Area)`,
|
|
36
|
+
`(After 1990, part of Halifax County)`). These are stripped from the
|
|
37
|
+
emitted county name.
|
|
38
|
+
- The `XX000` state header rows in the county section are duplicates of
|
|
39
|
+
the state-level table and are skipped during atomization (one record
|
|
40
|
+
per state, not two).
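The two clean-up rules above (annotation stripping and header-row skipping) can be sketched as follows. This assumes the annotation always sits at the end of the name and contains no nested parentheses; both function names are hypothetical.

```python
import re

# Assumption: the historical note is a single trailing "(...)" group.
_ANNOTATION = re.compile(r"\s*\([^()]*\)\s*$")


def strip_annotation(name: str) -> str:
    """Drop a trailing parenthesized note such as "(created after 1990)"."""
    return _ANNOTATION.sub("", name).strip()


def is_state_header(code: str) -> bool:
    """True for the XX000 rows that repeat a state inside the county table."""
    return len(code) == 5 and code.endswith("000")
```

Names without an annotation pass through `strip_annotation` unchanged, so it is safe to apply to every county row.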
|
|
41
|
+
|
|
42
|
+
---
|
|
43
|
+
|
|
44
|
+
## Entity Types
|
|
45
|
+
|
|
46
|
+
### `location`
|
|
47
|
+
|
|
48
|
+
Used for both U.S. states (and DC) and U.S. counties (and county-
|
|
49
|
+
equivalents like Alaska boroughs, Louisiana parishes, and Virginia
|
|
50
|
+
independent cities). The level of geography is distinguished by which
|
|
51
|
+
strong-ID property is set (`fips_state` for state-level, `fips_county`
|
|
52
|
+
for county-level) and by the `administrative_level` property.
|
|
53
|
+
|
|
54
|
+
- **Primary key (state-level):** the 2-digit FIPS state code, exposed as
|
|
55
|
+
the `fips_state` strong-ID property and as a property atom.
|
|
56
|
+
- **Primary key (county-level):** the 5-digit FIPS county code, exposed
|
|
57
|
+
as the `fips_county` strong-ID property and as a property atom.
|
|
58
|
+
- **Entity resolver:** named entity, **MERGEABLE**. State and county FIPS
|
|
59
|
+
codes are stable, official identifiers and merging across sources
|
|
60
|
+
(Census, FRED, sanctions data, etc.) is desired. Disambiguation
|
|
61
|
+
snippets include the formatted name (e.g. `"Autauga County, Alabama"`).
|
|
62
|
+
- **Name format:**
|
|
63
|
+
- State-level: `Title-cased state name` (e.g. `"Alabama"`,
|
|
64
|
+
`"District of Columbia"`).
|
|
65
|
+
- County-level: `"{County} County, {State}"` for the common case;
|
|
66
|
+
parishes / boroughs / cities / census areas keep their original
|
|
67
|
+
suffix (e.g. `"East Baton Rouge Parish, Louisiana"`,
|
|
68
|
+
`"Aleutians East Borough, Alaska"`,
|
|
69
|
+
`"Baltimore city, Maryland"`).
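A minimal sketch of the name formatting described above, assuming the county's name from the file already carries its suffix (`County`, `Parish`, `Borough`, `city`) and that the state name arrives upper-cased. The small-word list used to keep `of` lower-case is an assumption, and both helpers are hypothetical.

```python
# Assumption: these are the only words that stay lower-case inside a state name.
_SMALL_WORDS = {"of", "and", "the"}


def title_case_state(raw: str) -> str:
    """Title-case an upper-cased state name, e.g. "DISTRICT OF COLUMBIA"."""
    words = raw.strip().lower().split()
    return " ".join(
        word if (i > 0 and word in _SMALL_WORDS) else word.capitalize()
        for i, word in enumerate(words)
    )


def county_display_name(official_name: str, state_raw: str) -> str:
    """Join the county name (suffix kept as-is) with its title-cased state."""
    return f"{official_name.strip()}, {title_case_state(state_raw)}"
```

Python's built-in `str.title()` is avoided here because it would produce `"District Of Columbia"`.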
|
|
70
|
+
|
|
71
|
+
---
|
|
72
|
+
|
|
73
|
+
## Properties
|
|
74
|
+
|
|
75
|
+
The dataset uses the DataSchema `namespace: fips`. Atom property keys
|
|
76
|
+
are `fips::<local_name>` for source-specific properties. Identity
|
|
77
|
+
properties also used for resolver strong IDs are `fips_state` and
|
|
78
|
+
`fips_county`.
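The key convention above can be illustrated with a small sketch. It assumes the two strong-ID properties are emitted without the namespace prefix, which is how they are written in this dictionary; `atom_property_key` is a hypothetical helper, not part of the pipeline.

```python
NAMESPACE = "fips"
# Identity properties shared with the resolver; written without a prefix here.
STRONG_ID_PROPERTIES = {"fips_state", "fips_county"}


def atom_property_key(local_name: str) -> str:
    """Return the atom key for a property of this dataset."""
    if local_name in STRONG_ID_PROPERTIES:
        return local_name
    return f"{NAMESPACE}::{local_name}"
```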
|
|
79
|
+
|
|
80
|
+
### Common Properties (states and counties)
|
|
81
|
+
|
|
82
|
+
* `fips::administrative_level`
|
|
83
|
+
* Definition: Granularity of the geographic entity within the U.S.
|
|
84
|
+
federal hierarchy.
|
|
85
|
+
* Examples: `"state"`, `"county"`
|
|
86
|
+
* Derivation: Set to `"state"` for entries from the state-level table,
|
|
87
|
+
`"county"` for entries from the county-level table.
|
|
88
|
+
|
|
89
|
+
* `fips::official_name`
|
|
90
|
+
* Definition: Verbatim place name as it appears in the FCC mirror,
|
|
91
|
+
upper-cased for states and mixed-case for counties.
|
|
92
|
+
* Examples: `"ALABAMA"`, `"Autauga County"`,
|
|
93
|
+
`"Aleutians East Borough"`, `"East Baton Rouge Parish"`,
|
|
94
|
+
`"Baltimore city"`
|
|
95
|
+
* Derivation: The "place name" column of the source file, with the
|
|
96
|
+
parenthesized historical annotation (when present) stripped.
|
|
97
|
+
|
|
98
|
+
### State Properties
|
|
99
|
+
|
|
100
|
+
* `fips_state`
|
|
101
|
+
* Definition: Two-digit Federal Information Processing Standard code
|
|
102
|
+
that uniquely identifies a U.S. state or the District of Columbia.
|
|
103
|
+
* Examples: `"01"` (Alabama), `"06"` (California), `"11"` (DC)
|
|
104
|
+
* Derivation: Verbatim from the state-level table's first column,
|
|
105
|
+
zero-padded to two digits.
|
|
106
|
+
* Note: Also used as the strong ID on the state's `location` entity.
|
|
107
|
+
|
|
108
|
+
### County Properties
|
|
109
|
+
|
|
110
|
+
* `fips_county`
|
|
111
|
+
* Definition: Five-digit Federal Information Processing Standard code
|
|
112
|
+
that uniquely identifies a U.S. county or county-equivalent. The
|
|
113
|
+
leading two digits are the parent state's `fips_state` code; the
|
|
114
|
+
trailing three digits identify the county within the state.
|
|
115
|
+
* Examples: `"01001"` (Autauga County, Alabama), `"06037"` (Los
|
|
116
|
+
Angeles County, California), `"22033"` (East Baton Rouge Parish,
|
|
117
|
+
Louisiana)
|
|
118
|
+
* Derivation: Verbatim from the county-level table's first column,
|
|
119
|
+
zero-padded to five digits.
|
|
120
|
+
* Note: Also used as the strong ID on the county's `location` entity.
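The padding and the state/county split described for `fips_county` can be sketched as below. Both helpers are hypothetical, and the validation rule (exactly five ASCII digits after padding) is an assumption about what a well-formed code looks like.

```python
from typing import Tuple


def normalize_county_fips(raw: str) -> str:
    """Zero-pad a county code to five digits and check its shape."""
    code = raw.strip().zfill(5)
    if len(code) != 5 or not (code.isascii() and code.isdigit()):
        raise ValueError(f"not a 5-digit county FIPS code: {raw!r}")
    return code


def split_county_fips(raw: str) -> Tuple[str, str]:
    """Return (state prefix, county suffix) for a county code."""
    code = normalize_county_fips(raw)
    return code[:2], code[2:]
```

Keeping the code as a string throughout matters: converting it to an integer would drop the leading zero that states such as Alabama (`"01"`) depend on.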
|
|
121
|
+
|
|
122
|
+
---
|
|
123
|
+
|
|
124
|
+
## Entity Relationships Summary
|
|
125
|
+
|
|
126
|
+
```
|
|
127
|
+
location (county) ──[located_in]──→ location (state)
|
|
128
|
+
```
|
|
129
|
+
|
|
130
|
+
- `located_in`: Each county-level `location` is linked to its parent
|
|
131
|
+
state-level `location` via the leading two digits of the county FIPS
|
|
132
|
+
code. Both sides carry strong IDs (`fips_county` and `fips_state`),
|
|
133
|
+
which guarantees resolver merging into a single state node across all
|
|
134
|
+
county-to-state edges and across other datasets that emit the same
|
|
135
|
+
state codes.
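A minimal sketch of how the `located_in` edges could be derived from the codes alone, assuming county codes are already normalized to five digits and the set of state codes has been read from the state-level table. `located_in_edges` is a hypothetical helper; the real resolver and edge format are not shown in this dictionary.

```python
from typing import Iterable, List, Set, Tuple


def located_in_edges(
    county_codes: Iterable[str], known_states: Set[str]
) -> List[Tuple[str, str]]:
    """Pair each fips_county with the fips_state given by its first two digits."""
    edges: List[Tuple[str, str]] = []
    for county in county_codes:
        parent = county[:2]
        if parent not in known_states:
            # A county whose prefix has no state row points at a data problem.
            raise KeyError(f"no state-level record for prefix {parent!r}")
        edges.append((county, parent))
    return edges
```

Because every edge for a given state carries the same `fips_state` value, all of a state's counties point at one target identifier.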
|
|
@@ -0,0 +1,77 @@
|
|
|
1
|
+
# Dataset schema for the FIPS Codes (Federal Information Processing
|
|
2
|
+
# Standard) source — the FCC mirror of the Census Bureau's two-digit
|
|
3
|
+
# state and five-digit county FIPS code list.
|
|
4
|
+
#
|
|
5
|
+
# Source: https://transition.fcc.gov/oet/info/maps/census/fips/fips.txt
|
|
6
|
+
# Cadence: effectively static (codes change once every several years).
|
|
7
|
+
#
|
|
8
|
+
# This schema describes U.S. states and counties as `location` entities
|
|
9
|
+
# identified by their FIPS strong IDs, along with a containment
|
|
10
|
+
# relationship from each county to its parent state.
|
|
11
|
+
name: "fips"
|
|
12
|
+
description: "U.S. state and county Federal Information Processing Standard (FIPS) codes from the FCC mirror of the Census Bureau list, modelled as `location` entities with FIPS strong IDs"
|
|
13
|
+
|
|
14
|
+
extraction:
|
|
15
|
+
flavors: closed
|
|
16
|
+
properties: closed
|
|
17
|
+
relationships: closed
|
|
18
|
+
attributes: closed
|
|
19
|
+
events: closed
|
|
20
|
+
|
|
21
|
+
flavors:
|
|
22
|
+
- name: "location"
|
|
23
|
+
description: "A specific named geographic location such as a city, country, region, or landmark"
|
|
24
|
+
display_name: "Location"
|
|
25
|
+
mergeability: not_mergeable
|
|
26
|
+
strong_id_properties: ["fips_state", "fips_county"]
|
|
27
|
+
examples: ["New York City", "San Francisco", "North America", "Bakery Square"]
|
|
28
|
+
passive: true
|
|
29
|
+
|
|
30
|
+
properties:
|
|
31
|
+
- name: "fips_state"
|
|
32
|
+
type: string
|
|
33
|
+
description: "Two-digit Federal Information Processing Standard code that uniquely identifies a U.S. state or the District of Columbia"
|
|
34
|
+
display_name: "State FIPS Code"
|
|
35
|
+
mergeability: not_mergeable
|
|
36
|
+
domain_flavors: ["location"]
|
|
37
|
+
examples: ["01", "06", "11", "36"]
|
|
38
|
+
passive: true
|
|
39
|
+
|
|
40
|
+
- name: "fips_county"
|
|
41
|
+
type: string
|
|
42
|
+
description: "Five-digit Federal Information Processing Standard code that uniquely identifies a U.S. county or county-equivalent; the leading two digits are the parent state's FIPS code"
|
|
43
|
+
display_name: "County FIPS Code"
|
|
44
|
+
mergeability: not_mergeable
|
|
45
|
+
domain_flavors: ["location"]
|
|
46
|
+
examples: ["01001", "06037", "22033", "51790"]
|
|
47
|
+
passive: true
|
|
48
|
+
|
|
49
|
+
- name: "administrative_level"
|
|
50
|
+
namespace: "fips"
|
|
51
|
+
type: string
|
|
52
|
+
description: "Granularity of the geographic entity within the U.S. federal hierarchy of states and counties"
|
|
53
|
+
display_name: "Administrative Level"
|
|
54
|
+
mergeability: not_mergeable
|
|
55
|
+
domain_flavors: ["location"]
|
|
56
|
+
examples: ["state", "county"]
|
|
57
|
+
passive: true
|
|
58
|
+
|
|
59
|
+
- name: "official_name"
|
|
60
|
+
namespace: "fips"
|
|
61
|
+
type: string
|
|
62
|
+
description: "Verbatim place name as published in the FCC mirror of the FIPS code list, with parenthesized historical annotations stripped"
|
|
63
|
+
display_name: "Official FIPS Name"
|
|
64
|
+
mergeability: not_mergeable
|
|
65
|
+
domain_flavors: ["location"]
|
|
66
|
+
examples: ["ALABAMA", "Autauga County", "Aleutians East Borough", "East Baton Rouge Parish"]
|
|
67
|
+
passive: true
|
|
68
|
+
|
|
69
|
+
relationships:
|
|
70
|
+
- name: "located_in"
|
|
71
|
+
description: "Administrative territory or location the entity is situated in (Wikidata P131, P276)"
|
|
72
|
+
display_name: "Located In"
|
|
73
|
+
mergeability: not_mergeable
|
|
74
|
+
domain_flavors: ["location"]
|
|
75
|
+
target_flavors: ["location"]
|
|
76
|
+
examples: ["Autauga County is located in Alabama", "Los Angeles County is located in California"]
|
|
77
|
+
passive: true
|