@yottagraph-app/data-model-skill 0.0.35 → 0.0.37

package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@yottagraph-app/data-model-skill",
3
- "version": "0.0.35",
3
+ "version": "0.0.37",
4
4
  "description": "Data model skill documentation for AI agents - entity types, properties, and schemas from Lovelace fetch sources",
5
5
  "repository": {
6
6
  "type": "git",
@@ -0,0 +1,389 @@
1
+ # Data Dictionary: BLS CEW (QCEW)
2
+
3
+ ## Purpose
4
+
5
+ This dictionary documents the entity types, properties, and attributes
6
+ that the BLS Quarterly Census of Employment and Wages (QCEW) source
7
+ contributes to the Lovelace knowledge graph. It is the contract between
8
+ the source and downstream consumers (ingest, query server, UI).
9
+
10
+ QCEW is a quarterly count of employment and wages reported by employers,
11
+ covering more than 95 % of US jobs. It is published by the U.S. Bureau
12
+ of Labor Statistics for every state, MSA, and county, broken out by
13
+ ownership (federal/state/local/private) and by industry (NAICS), at the
14
+ sector / 3-digit / 4-digit / 5-digit / 6-digit detail levels. The
15
+ Lovelace QCEW source ingests the published quarterly data slices —
16
+ specifically the by-area CSV slices for the United States as a whole
17
+ plus all 50 states and DC — and emits one record per area × ownership ×
18
+ industry × quarter combination.
19
+
20
+ **Pipeline:** Download → Extract → Atomize.
21
+ - Download fetches the per-area CSV slice for each (area, year, quarter)
22
+ from `https://data.bls.gov/cew/data/api/{year}/{quarter}/area/{area_fips}.csv`.
23
+ - Extract is a pass-through (the raw CSV is the structured input).
24
+ - Atomize parses each CSV row into KG records.
25
+
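As an illustration of the Download step, the per-area slice URL can be built from the template quoted above. This is a minimal sketch, not the source's actual downloader: the function name `area_slice_url` and the quarter validation are assumptions added for the example; only the URL template comes from the dictionary.

```python
# URL template copied from the Pipeline description above.
QCEW_AREA_URL = "https://data.bls.gov/cew/data/api/{year}/{quarter}/area/{area_fips}.csv"


def area_slice_url(year: int, quarter: int, area_fips: str) -> str:
    """Return the by-area CSV slice URL for one (area, year, quarter).

    Hypothetical helper: the name and the range check are illustrative.
    """
    if quarter not in (1, 2, 3, 4):
        raise ValueError(f"quarter must be 1-4, got {quarter}")
    # area_fips stays a string so leading zeros ("06000") are preserved.
    return QCEW_AREA_URL.format(year=year, quarter=quarter, area_fips=area_fips)


print(area_slice_url(2024, 1, "06000"))
```

Keeping `area_fips` as a string matters here: converting `06000` to an integer would drop the leading zero and produce a different path.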
26
+ **Cadence:** BLS publishes each quarter's QCEW release roughly 5–7
27
+ months after the close of the quarter (Q1 in late Aug/Sep, Q2 in late
28
+ Nov/Dec, Q3 in early Mar and Q4 in early Jun of the next year). The
29
+ streamer polls weekly; new quarters are detected and atomized when they
30
+ appear.
31
+
32
+ **Disclosure suppression.** Rows with `disclosure_code == "N"` are
33
+ withheld by BLS to protect employer confidentiality (typically when an
34
+ industry has very few establishments in an area). All numeric values on
35
+ those rows are zero. The atomizer drops "N"-disclosed rows entirely so
36
+ the KG never contains zero-valued QCEW observations that would be
37
+ mistaken for real data.
38
+
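The suppression rule above can be sketched as a simple row filter. This is an assumption-laden illustration rather than the atomizer's code: `keep_disclosed_rows` is a hypothetical name, and the two sample rows are invented purely to exercise the filter (they are not real QCEW values).

```python
import csv
import io


def keep_disclosed_rows(csv_text: str) -> list[dict[str, str]]:
    """Parse a QCEW area slice and drop rows BLS has suppressed.

    Hypothetical helper: rows whose disclosure_code is "N" are skipped,
    so their all-zero metrics never reach downstream records.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row for row in reader
        if (row.get("disclosure_code") or "").strip() != "N"
    ]


# Invented sample rows, for illustration only.
sample = (
    "area_fips,own_code,industry_code,agglvl_code,disclosure_code,qtrly_estabs\n"
    "06037,0,5111,74,,25\n"
    "06037,0,5112,74,N,0\n"
)
rows = keep_disclosed_rows(sample)
print(len(rows))
```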
39
+ **Series identity.** A QCEW "series" is uniquely identified by the
40
+ 4-tuple (area_fips, own_code, industry_code, agglvl_code). The atomizer
41
+ constructs a synthetic series id of the form
42
+ `{area_fips}.{own_code}.{industry_code}.{agglvl_code}` and emits this
43
+ as the `cew_series_id` strong id on a `cew_series` entity. `size_code`
44
+ is always `0` for the quarterly area slices ingested today (size-class
45
+ slices are an annual-only artifact and are out of scope for v1).
46
+
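The synthetic id described above amounts to a dot-join of four CSV fields. A minimal sketch, assuming each row is available as a dict of strings keyed by the QCEW column names (the function name `cew_series_id_for` is hypothetical):

```python
# The four CSV fields that identify a series, in id order.
SERIES_KEY_FIELDS = ("area_fips", "own_code", "industry_code", "agglvl_code")


def cew_series_id_for(row: dict[str, str]) -> str:
    """Build `{area_fips}.{own_code}.{industry_code}.{agglvl_code}`.

    Hypothetical helper; values are kept as strings so that leading
    zeros and hyphenated industry codes such as "31-33" survive intact.
    """
    return ".".join(row[field].strip() for field in SERIES_KEY_FIELDS)


example_row = {
    "area_fips": "06000",
    "own_code": "5",
    "industry_code": "31-33",
    "agglvl_code": "55",
}
print(cew_series_id_for(example_row))
```

With the California example row this reproduces the `06000.5.31-33.55` id quoted in the dictionary.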
47
+ **Source name:** `blscew-source`
48
+
49
+ ---
50
+
51
+ ## Entity Types
52
+
53
+ ### `cew_series`
54
+
55
+ A single QCEW time series — the unique combination of geographic area,
56
+ ownership, industry, and aggregation level — that carries quarterly
57
+ employment, wages, and establishment metrics over time.
58
+
59
+ - Primary key: `cew_series_id` (synthetic id, see above) used as the
60
+ strong id for resolution.
61
+ - Entity resolver: named entity, NOT_MERGEABLE. The strong id is
62
+ `cew_series_id`. The disambiguation snippet includes area title,
63
+ ownership title, industry title, and aggregation level title.
64
+ - Source: `blscew-source`
65
+ - Examples produced: `US000.0.10.10` (US national, total ownership, all
66
+ industries), `06000.5.31-33.55` (California, private ownership,
67
+ Manufacturing supersector), `06037.0.5111.74` (Los Angeles County,
68
+ total ownership, NAICS 5111 Newspaper / Periodical Publishers).
69
+
70
+ ### `location`
71
+
72
+ A US geographic area (national, state, MSA, or county) for which BLS
73
+ publishes QCEW data, identified by its FIPS-based `area_fips` code.
74
+
75
+ - Primary key: `area_fips` strong id.
76
+ - Entity resolver: named entity, NOT_MERGEABLE. Strong id is
77
+ `area_fips`. Snippet includes the area title and the area type
78
+ (national / statewide / MSA / county).
79
+ - Source: `blscew-source`
80
+ - Examples produced: `US000` (U.S. TOTAL), `06000` (California -- Statewide),
81
+ `06037` (Los Angeles County, California), `C1018` (Albany-Schenectady-Troy,
82
+ NY MSA).
83
+
84
+ ### `industry`
85
+
86
+ An economic activity category from the North American Industry
87
+ Classification System (NAICS), or a BLS-defined supersector that rolls
88
+ up multiple NAICS sectors.
89
+
90
+ - Primary key: `naics_code` strong id (the BLS QCEW `industry_code`,
91
+ which can be a NAICS sector / 3-digit / 4-digit / 5-digit / 6-digit
92
+ code, a 2-digit BLS supersector aggregate (e.g. `31-33` for
93
+ Manufacturing), or the special aggregate `10` meaning "Total, all
94
+ industries").
95
+ - Entity resolver: named entity, NOT_MERGEABLE. Strong id is
96
+ `naics_code`. Snippet includes the industry title.
97
+ - Source: `blscew-source`
98
+ - Examples produced: `10` (Total, all industries), `31-33`
99
+ (Manufacturing), `5111` (Newspaper, Periodical, Book, and Directory
100
+ Publishers), `541` (Professional, Scientific, and Technical Services).
101
+
102
+ ---
103
+
104
+ ## Properties
105
+
106
+ ### Identity & Metadata Properties (cew_series)
107
+
108
+ These atoms appear once per series, timestamped at the first quarter
109
+ the series is observed in the current run.
110
+
111
+ * `cew_series_id`
112
+ * Definition: synthetic identifier for a QCEW series, built by
113
+ joining the area, ownership, industry, and aggregation-level codes
114
+ with dots.
115
+ * Examples: `"US000.0.10.10"`, `"06000.5.31-33.55"`,
116
+ `"06037.0.5111.74"`
117
+ * Derivation: built by the atomizer from the QCEW `area_fips`,
118
+ `own_code`, `industry_code`, and `agglvl_code` fields of each CSV
119
+ row.
120
+
121
+ * `name`
122
+ * Definition: human-readable label for the series, combining area,
123
+ ownership, and industry titles.
124
+ * Examples: `"U.S. TOTAL · Total Covered · Total, all industries"`,
125
+ `"California · Private · Manufacturing"`
126
+ * Derivation: built from the `area_title`, `own_title`, and
127
+ `industry_title` fields on the CSV row.
128
+
129
+ * `area_fips`
130
+ * Definition: BLS-assigned 5-character area identifier (FIPS-based).
131
+ Acts as strong id on the `location` entity that the series points
132
+ to.
133
+ * Examples: `"US000"`, `"06000"`, `"06037"`, `"C1018"`
134
+ * Derivation: `area_fips` field of the CSV row.
135
+
136
+ * `area_title`
137
+ * Definition: human-readable title for the geographic area.
138
+ * Examples: `"U.S. TOTAL"`, `"California -- Statewide"`,
139
+ `"Los Angeles County, California"`,
140
+ `"Albany-Schenectady-Troy, NY MSA"`
141
+ * Derivation: `area_title` field of the CSV row.
142
+
143
+ * `ownership_code`
144
+ * Definition: BLS one-character code identifying the ownership
145
+ sector covered by the series.
146
+ * Examples: `"0"` (Total Covered), `"1"` (Federal Government),
147
+ `"2"` (State Government), `"3"` (Local Government),
148
+ `"5"` (Private)
149
+ * Derivation: `own_code` field of the CSV row.
150
+
151
+ * `ownership_title`
152
+ * Definition: human-readable label for the ownership sector.
153
+ * Examples: `"Total Covered"`, `"Federal Government"`,
154
+ `"State Government"`, `"Local Government"`, `"Private"`
155
+ * Derivation: `own_title` field of the CSV row.
156
+
157
+ * `naics_code`
158
+ * Definition: industry code used by BLS — NAICS at varying levels of
159
+ aggregation, plus BLS supersector aggregates and the special
160
+ "Total, all industries" code `10`. Acts as strong id on the
161
+ `industry` entity that the series points to.
162
+ * Examples: `"10"`, `"31-33"`, `"5111"`, `"541211"`
163
+ * Derivation: `industry_code` field of the CSV row.
164
+
165
+ * `naics_description`
166
+ * Definition: human-readable name of the industry / supersector.
167
+ * Examples: `"Total, all industries"`, `"Manufacturing"`,
168
+ `"Newspaper, periodical, book and directory publishers"`
169
+ * Derivation: `industry_title` field of the CSV row.
170
+
171
+ * `aggregation_level_code`
172
+ * Definition: BLS two-character code describing the geographic and
173
+ industry aggregation level the series represents (e.g. national
174
+ total, statewide ownership × supersector, county × 6-digit NAICS).
175
+ * Examples: `"10"` (national, by ownership × total), `"55"`
176
+ (statewide, by ownership × supersector),
177
+ `"74"` (county, by ownership × 5-digit NAICS),
178
+ `"78"` (county, by ownership × 6-digit NAICS)
179
+ * Derivation: `agglvl_code` field of the CSV row.
180
+
181
+ * `aggregation_level_title`
182
+ * Definition: human-readable description of the aggregation level.
183
+ * Examples: `"National, by ownership sector"`,
184
+ `"Statewide, by ownership sector and supersector"`,
185
+ `"County, by ownership sector and 6-digit NAICS"`
186
+ * Derivation: `agglvl_title` field of the CSV row.
187
+
188
+ * `area_type`
189
+ * Definition: classification of the area as one of `national`,
190
+ `statewide`, `msa`, or `county` derived from the leading characters
191
+ of `area_fips` and the trailing zeros pattern.
192
+ * Examples: `"national"`, `"statewide"`, `"county"`, `"msa"`
193
+ * Derivation: heuristic on `area_fips`: `US000` → national; FIPS
194
+ starting with `C` → MSA; 5-digit ending in `000` → statewide; any
195
+ other 5-digit code → county.
196
+
197
+ * `naics_level`
198
+ * Definition: granularity of the NAICS code on this row, where
199
+ higher values are more specific. Stored as a float to keep with
200
+ the schema's float-for-numeric convention.
201
+ * Examples: `2.0` (sector or supersector), `3.0` (subsector),
202
+ `4.0` (industry group), `5.0` (NAICS industry), `6.0` (national
203
+ industry).
204
+ * Derivation: length of the `industry_code` string, with the special
205
+ case `10` (Total) and any 2-character supersector code mapped to
206
+ `2.0`.
207
+
208
+ * `publisher`
209
+ * Definition: organization that publishes the data (always BLS).
210
+ * Examples: `"U.S. Bureau of Labor Statistics"`
211
+ * Derivation: hard-coded constant; QCEW is exclusively a BLS product.
212
+
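The `area_type` heuristic listed above can be written out as a small classifier. This is a sketch under stated assumptions: the function name `classify_area` is hypothetical, and raising an error for codes that match none of the documented shapes is a choice made for the example, not something the dictionary specifies.

```python
def classify_area(area_fips: str) -> str:
    """Classify a QCEW area code as national, msa, statewide, or county."""
    if area_fips == "US000":
        return "national"
    if area_fips.startswith("C"):
        return "msa"
    if len(area_fips) == 5 and area_fips.isdigit():
        # State FIPS followed by "000" is the statewide row; any other
        # 5-digit code is treated as a county.
        return "statewide" if area_fips.endswith("000") else "county"
    # Assumption: unrecognized shapes are rejected rather than guessed.
    raise ValueError(f"unrecognized area_fips shape: {area_fips!r}")


for code in ("US000", "06000", "06037", "C1018"):
    print(code, classify_area(code))
```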
213
+ ### Quarterly Observation Properties (cew_series)
214
+
215
+ These atoms appear on the per-quarter observation records, timestamped
216
+ at the last day of the quarter (e.g. 2024-03-31 for Q1 2024).
217
+
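The quarter-end timestamp convention described above can be computed with the standard library. A minimal sketch; the helper name `quarter_end` is an assumption, not a name from the source.

```python
import calendar
from datetime import date


def quarter_end(year: int, quarter: int) -> date:
    """Return the last calendar day of the given quarter."""
    if quarter not in (1, 2, 3, 4):
        raise ValueError(f"quarter must be 1-4, got {quarter}")
    month = quarter * 3  # Q1 -> March, Q2 -> June, Q3 -> September, Q4 -> December
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, last_day)


print(quarter_end(2024, 1))  # 2024-03-31
```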
218
+ * `establishment_count`
219
+ * Definition: count of establishments (physical locations of
220
+ employers) covered by the series in the quarter.
221
+ * Examples: `11907855` (US total Q1 2024), `61375` (US Federal
222
+ Government total Q1 2024)
223
+ * Derivation: `qtrly_estabs` field of the CSV row.
224
+
225
+ * `monthly_employment_m1`, `monthly_employment_m2`, `monthly_employment_m3`
226
+ * Definition: number of employees on payrolls covered in the first /
227
+ second / third month of the quarter (BLS uses the pay period
228
+ including the 12th of the month).
229
+ * Examples (US Q1 2024 total): `152393725`, `153129544`, `153848430`
230
+ * Derivation: `month1_emplvl`, `month2_emplvl`, `month3_emplvl`
231
+ fields of the CSV row.
232
+
233
+ * `employment_level`
234
+ * Definition: representative quarterly employment level for the
235
+ series, taken as the third-month employment (final month of the
236
+ quarter), aligned to the way QCEW publications report "QCEW
237
+ employment".
238
+ * Examples: `153848430` (US Q1 2024 total)
239
+ * Derivation: `month3_emplvl` field of the CSV row.
240
+
241
+ * `total_quarterly_wages`
242
+ * Definition: total wages paid (in current US dollars) to all covered
243
+ workers during the quarter.
244
+ * Examples: `3037790324790` (US Q1 2024 total = $3.04 trillion)
245
+ * Derivation: `total_qtrly_wages` field of the CSV row.
246
+
247
+ * `taxable_quarterly_wages`
248
+ * Definition: portion of total quarterly wages subject to UI tax
249
+ contributions (in current US dollars). Always 0 for federal
250
+ government employment, which is not subject to UI tax.
251
+ * Examples: `1151875077520` (US Q1 2024 total)
252
+ * Derivation: `taxable_qtrly_wages` field of the CSV row.
253
+
254
+ * `quarterly_contributions`
255
+ * Definition: total UI tax contributions (in current US dollars)
256
+ associated with this employment in the quarter.
257
+ * Examples: `19555346530` (US Q1 2024 total)
258
+ * Derivation: `qtrly_contributions` field of the CSV row.
259
+
260
+ * `avg_weekly_wage`
261
+ * Definition: average weekly wage (in current US dollars) per
262
+ employee covered in the quarter, computed by BLS as
263
+ `total_qtrly_wages / (avg_emplvl × 13)`.
264
+ * Examples: `1526` (US Q1 2024 total = $1,526/week)
265
+ * Derivation: `avg_wkly_wage` field of the CSV row.
266
+
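As a worked check of the `avg_weekly_wage` formula, the US Q1 2024 example values listed above can be plugged in. One assumption is labeled in the code: `avg_emplvl` is taken here as the simple mean of the three monthly employment levels, which the dictionary does not state explicitly. The atomizer itself reads `avg_wkly_wage` from the CSV and does not recompute it.

```python
def avg_weekly_wage(total_qtrly_wages: float, month1: float, month2: float, month3: float) -> float:
    """Recompute total_qtrly_wages / (avg_emplvl * 13) for one quarter."""
    # Assumption: avg_emplvl is the mean of the three monthly levels.
    avg_emplvl = (month1 + month2 + month3) / 3
    return total_qtrly_wages / (avg_emplvl * 13)


# US total, Q1 2024, using the example values from the property list.
wage = avg_weekly_wage(3_037_790_324_790, 152_393_725, 153_129_544, 153_848_430)
print(round(wage))  # 1526, matching the avg_weekly_wage example
```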
267
+ ### Year-over-Year Change Properties (cew_series)
268
+
269
+ BLS pre-computes over-the-year (OTY) absolute and percent changes for
270
+ each quarterly metric, comparing the current quarter to the same quarter
271
+ of the prior year. The atomizer emits the percent-change variants as
272
+ their own atoms so downstream consumers can directly query
273
+ "Employment growth YoY" without recomputing it from observation history.
274
+
275
+ * `employment_yoy_pct_chg`
276
+ * Definition: year-over-year percent change in the canonical
277
+ `employment_level` (= third-month / end-of-quarter snapshot).
278
+ * Examples: `1.5` (US total Q1 2024 vs Q1 2023 = +1.5 %)
279
+ * Derivation: `oty_month3_emplvl_pct_chg` field of the CSV row. We
280
+ intentionally do *not* also emit `monthly_employment_m3_yoy_pct_chg`
281
+ -- it would duplicate this number under a redundant property name.
282
+
283
+ * `monthly_employment_m1_yoy_pct_chg`, `monthly_employment_m2_yoy_pct_chg`
284
+ * Definition: year-over-year percent change in mid-quarter monthly
285
+ employment (the M1 and M2 snapshots that have no canonical alias;
286
+ the M3 snapshot is exposed as `employment_yoy_pct_chg` above).
287
+ * Examples (US Q1 2024 total): `1.4`, `1.4`
288
+ * Derivation: `oty_month1_emplvl_pct_chg` and
289
+ `oty_month2_emplvl_pct_chg` fields of the CSV row.
290
+
291
+ * `establishments_yoy_pct_chg`
292
+ * Definition: year-over-year percent change in the count of
293
+ establishments.
294
+ * Examples: `1.2` (US Q1 2024 total = +1.2 %)
295
+ * Derivation: `oty_qtrly_estabs_pct_chg` field of the CSV row.
296
+
297
+ * `total_quarterly_wages_yoy_pct_chg`
298
+ * Definition: year-over-year percent change in total quarterly
299
+ wages.
300
+ * Examples: `5.7` (US Q1 2024 total = +5.7 %)
301
+ * Derivation: `oty_total_qtrly_wages_pct_chg` field of the CSV row.
302
+
303
+ * `avg_weekly_wage_yoy_pct_chg`
304
+ * Definition: year-over-year percent change in the average weekly
305
+ wage.
306
+ * Examples: `4.2` (US Q1 2024 total = +4.2 %)
307
+ * Derivation: `oty_avg_wkly_wage_pct_chg` field of the CSV row.
308
+
309
+ * `taxable_quarterly_wages_yoy_pct_chg`
310
+ * Definition: year-over-year percent change in taxable quarterly
311
+ wages.
312
+ * Examples: `3.3`
313
+ * Derivation: `oty_taxable_qtrly_wages_pct_chg` field of the CSV row.
314
+
315
+ * `quarterly_contributions_yoy_pct_chg`
316
+ * Definition: year-over-year percent change in UI contributions.
317
+ * Examples: `4.3`
318
+ * Derivation: `oty_qtrly_contributions_pct_chg` field of the CSV row.
319
+
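For readers who want to cross-check a BLS-supplied percent change against observation history, the over-the-year calculation can be sketched as follows. The atomizer does not compute these values; it copies the `oty_*_pct_chg` fields. The function name `yoy_pct_chg`, the one-decimal rounding, and the input numbers are all assumptions made for illustration.

```python
def yoy_pct_chg(current: float, prior: float) -> float:
    """Percent change from the same quarter of the prior year.

    Hypothetical helper; rounding to one decimal is an assumption that
    mirrors the precision of the examples in the list above.
    """
    if prior == 0:
        raise ValueError("prior-year value is zero; percent change is undefined")
    return round((current - prior) / prior * 100, 1)


# Invented inputs, chosen only to make the arithmetic easy to follow.
print(yoy_pct_chg(1015.0, 1000.0))  # 1.5
print(yoy_pct_chg(996.0, 1000.0))   # -0.4
```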
320
+ ### Location Properties (location)
321
+
322
+ * `area_fips`
323
+ * Definition: BLS-assigned area identifier (FIPS-based) used as the
324
+ location's strong id.
325
+ * Examples: `"US000"`, `"06000"`, `"06037"`, `"C1018"`
326
+ * Derivation: `area_fips` field of the CSV row.
327
+
328
+ * `name`
329
+ * Definition: human-readable name of the geographic area.
330
+ * Examples: `"U.S. TOTAL"`, `"California -- Statewide"`,
331
+ `"Los Angeles County, California"`
332
+ * Derivation: `area_title` field of the CSV row.
333
+
334
+ * `area_type`
335
+ * Definition: classification of the area as `national`, `statewide`,
336
+ `msa`, or `county`.
337
+ * Examples: `"national"`, `"statewide"`, `"county"`, `"msa"`
338
+ * Derivation: heuristic on `area_fips`; see the `cew_series` entry of
339
+ the same name above.
340
+
341
+ ### Industry Properties (industry)
342
+
343
+ * `naics_code`
344
+ * Definition: industry code used by BLS (NAICS at sector / 3-digit /
345
+ 4-digit / 5-digit / 6-digit detail, plus BLS supersector aggregates
346
+ and `10` = Total, all industries). Strong id for `industry`.
347
+ * Examples: `"10"`, `"31-33"`, `"5111"`, `"541211"`
348
+ * Derivation: `industry_code` field of the CSV row.
349
+
350
+ * `naics_description`
351
+ * Definition: human-readable industry name.
352
+ * Examples: `"Total, all industries"`, `"Manufacturing"`,
353
+ `"Newspaper, periodical, book and directory publishers"`
354
+ * Derivation: `industry_title` field of the CSV row.
355
+
356
+ * `naics_level`
357
+ * Definition: granularity of the NAICS code (see the `cew_series`
358
+ entry of the same name above for derivation).
359
+ * Examples: `2.0`, `3.0`, `4.0`, `5.0`, `6.0`
360
+
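The `naics_level` derivation referenced above can be sketched as a length-based mapping. Two points are assumptions rather than documented behavior: hyphenated range codes such as `31-33` are mapped to `2.0` (the dictionary lists them as sector or supersector level, but their string length is 5), and codes of any other unexpected length raise an error. The name `naics_level_for` is hypothetical.

```python
def naics_level_for(industry_code: str) -> float:
    """Return the NAICS granularity of a QCEW industry_code as a float."""
    code = industry_code.strip()
    if "-" in code:
        # Assumption: range codes like "31-33" are sector-level aggregates.
        return 2.0
    if 2 <= len(code) <= 6:
        # "10" (Total, all industries) has length 2, so it maps to 2.0 here.
        return float(len(code))
    raise ValueError(f"unexpected industry_code: {industry_code!r}")


for code in ("10", "31-33", "541", "5111", "541211"):
    print(code, naics_level_for(code))
```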
361
+ ---
362
+
363
+ ## Entity Relationships Summary
364
+
365
+ The QCEW source emits a single relationship type, `appears_in_cew_series`,
366
+ with two subject flavors. Each edge points from a real-world contextual
367
+ entity (a US area or an industry) to the `cew_series` it appears in.
368
+ This mirrors the FRED source's `appears_in_fred_series` pattern.
369
+
370
+ ```
371
+ location ──[appears_in_cew_series]──→ cew_series
372
+ industry ──[appears_in_cew_series]──→ cew_series
373
+ ```
374
+
375
+ * `appears_in_cew_series`
376
+ * Definition: the subject (a US geographic area or an industry)
377
+ appears as the area / industry dimension of a QCEW time series.
378
+ * Domain flavors: `location`, `industry`
379
+ * Target flavor: `cew_series`
380
+ * Derivation: emitted once per series per quarter on the location and
381
+ industry context records.
382
+
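The two context records per series can be pictured with plain dictionaries. The record shape below (keys such as `subject_flavor`, `strong_id`, and `target`) is entirely hypothetical and is not the Lovelace record format; only the flavor names, strong-id property names, and relationship name come from the dictionary.

```python
def context_records(area_fips: str, naics_code: str, series_id: str) -> list[dict]:
    """Build one location and one industry context record for a series.

    Hypothetical record shape, for illustration only.
    """
    def edge(flavor: str, id_property: str, id_value: str) -> dict:
        return {
            "subject_flavor": flavor,
            "strong_id": {id_property: id_value},
            "relationship": "appears_in_cew_series",
            "target_flavor": "cew_series",
            "target": series_id,
        }

    return [
        edge("location", "area_fips", area_fips),
        edge("industry", "naics_code", naics_code),
    ]


records = context_records("06000", "31-33", "06000.5.31-33.55")
for record in records:
    print(record["subject_flavor"], "->", record["target"])
```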
383
+ ---
384
+
385
+ ## Attributes
386
+
387
+ None. All quarterly metrics are timestamped scalar atoms; there are no
388
+ per-atom attributes (unit / frequency / etc.) on this source — those are
389
+ carried as atoms on the `cew_series` metadata record.
@@ -0,0 +1,345 @@
1
+ # Dataset schema for BLS QCEW (Quarterly Census of Employment and Wages).
2
+ #
3
+ # Architecture mirrors FRED:
4
+ # For each unique QCEW series — the (area_fips, own_code, industry_code,
5
+ # agglvl_code) tuple — the atomizer emits:
6
+ # 1. A metadata record: subject is the cew_series entity, atoms are
7
+ # identifying metadata (cew_series_id, area_fips, area_title,
8
+ # ownership_code/title, naics_code/description, aggregation level,
9
+ # publisher, etc.). Timestamped at the quarter end date.
10
+ # 2. One quarterly observation record per (series, quarter): subject is
11
+ # the cew_series, atoms are all quarterly metrics — establishment
12
+ # count, employment levels, wages, year-over-year percent changes —
13
+ # all timestamped at the quarter end date.
14
+ # 3. A location context record: subject is the geographic area
15
+ # (location flavor), single appears_in_cew_series atom pointing at
16
+ # the cew_series.
17
+ # 4. An industry context record: subject is the NAICS industry
18
+ # (industry flavor), single appears_in_cew_series atom pointing at
19
+ # the cew_series.
20
+ #
21
+ # All elements are passive — created by the atomizer from QCEW CSV data,
22
+ # not by LLM extraction.
23
+ #
24
+ # Source identifier used on all records: "blscew-source"
25
+ name: "blscew"
26
+ description: "Quarterly Census of Employment and Wages from the U.S. Bureau of Labor Statistics — quarterly establishment counts, employment levels, total and taxable wages, average weekly wages, and BLS-computed year-over-year changes by US area, ownership sector, and NAICS industry"
27
+
28
+ extraction:
29
+ flavors: closed
30
+ properties: closed
31
+ relationships: closed
32
+ attributes: closed
33
+ events: closed
34
+
35
+ flavors:
36
+ - name: "cew_series"
37
+ description: "A QCEW (Quarterly Census of Employment and Wages) time series identified by the unique combination of area, ownership, industry, and aggregation level. The entity carries metadata atoms and per-quarter observations of employment, wages, and establishment counts."
38
+ display_name: "QCEW Series"
39
+ mergeability: not_mergeable
40
+ strong_id_properties: ["cew_series_id"]
41
+ passive: true
42
+
43
+ - name: "location"
44
+ description: "A specific named geographic location such as a city, country, region, or landmark"
45
+ display_name: "Location"
46
+ mergeability: not_mergeable
47
+ strong_id_properties: ["area_fips"]
48
+ passive: true
49
+
50
+ - name: "industry"
51
+ description: "An industry classification of economic activity (e.g. NAICS or SIC) identifying the line of business associated with an organization or award"
52
+ display_name: "Industry"
53
+ mergeability: not_mergeable
54
+ strong_id_properties: ["naics_code"]
55
+ passive: true
56
+
57
+ properties:
58
+ # --- Strong-id properties ---
59
+
60
+ - name: "cew_series_id"
61
+ type: string
62
+ description: "Synthetic identifier for a QCEW series formed by joining area_fips, own_code, industry_code, and agglvl_code with dots. Strong id for cew_series entities."
63
+ display_name: "QCEW Series ID"
64
+ mergeability: not_mergeable
65
+ domain_flavors: ["cew_series"]
66
+ examples: ["US000.0.10.10", "06000.5.31-33.55", "06037.0.5111.74"]
67
+ passive: true
68
+
69
+ - name: "area_fips"
70
+ type: string
71
+ description: "BLS-assigned 5-character QCEW area identifier (FIPS-based). For US states the value is the 2-digit FIPS state code followed by '000' (e.g. '06000' = California). For counties it is the 5-digit county FIPS (e.g. '06037' = Los Angeles County). For MSAs the BLS prefixes a 'C' (e.g. 'C1018' = Albany–Schenectady–Troy MSA). Strong id for the location entity that a cew_series is attached to."
72
+ display_name: "QCEW Area FIPS"
73
+ mergeability: not_mergeable
74
+ domain_flavors: ["cew_series", "location"]
75
+ examples: ["US000", "06000", "06037", "C1018"]
76
+ passive: true
77
+
78
+ - name: "naics_code"
79
+ type: string
80
+ description: "North American Industry Classification System code (typically 6 digits) identifying the industry of work performed under a contract (e.g., \"524114\" for Direct Health and Medical Insurance Carriers)"
81
+ display_name: "NAICS code"
82
+ mergeability: not_mergeable
83
+ domain_flavors: ["cew_series", "industry"]
84
+ examples: ["10", "31-33", "5111", "541211"]
85
+ passive: true
86
+
87
+ # --- Identity / metadata properties on cew_series ---
88
+
89
+ - name: "name"
90
+ type: string
91
+ description: "Display name of the entity"
92
+ display_name: "Name"
93
+ mergeability: not_mergeable
94
+ domain_flavors: ["cew_series", "location"]
95
+ examples: ["California -- Statewide", "U.S. TOTAL · Total Covered · Total, all industries"]
96
+ passive: true
97
+
98
+ - name: "area_title"
99
+ type: string
100
+ description: "Human-readable title for a QCEW geographic area (national, state, MSA, or county) as published by BLS."
101
+ display_name: "QCEW Area Title"
102
+ mergeability: not_mergeable
103
+ domain_flavors: ["cew_series"]
104
+ examples: ["U.S. TOTAL", "California -- Statewide", "Los Angeles County, California", "Albany-Schenectady-Troy, NY MSA"]
105
+ passive: true
106
+
107
+ - name: "ownership_code"
108
+ type: string
109
+ description: "BLS one-character QCEW ownership-sector code: 0 = Total Covered, 1 = Federal Government, 2 = State Government, 3 = Local Government, 5 = Private."
110
+ display_name: "QCEW Ownership Code"
111
+ mergeability: not_mergeable
112
+ domain_flavors: ["cew_series"]
113
+ examples: ["0", "1", "2", "3", "5"]
114
+ passive: true
115
+
116
+ - name: "ownership_title"
117
+ type: string
118
+ description: "Human-readable QCEW ownership-sector title (Total Covered, Federal Government, State Government, Local Government, Private)."
119
+ display_name: "QCEW Ownership Title"
120
+ mergeability: not_mergeable
121
+ domain_flavors: ["cew_series"]
122
+ examples: ["Total Covered", "Federal Government", "State Government", "Local Government", "Private"]
123
+ passive: true
124
+
125
+ - name: "naics_description"
126
+ type: string
127
+ description: "Human-readable name of the NAICS industry (e.g., \"DIRECT HEALTH AND MEDICAL INSURANCE CARRIERS\")"
128
+ display_name: "NAICS description"
129
+ mergeability: not_mergeable
130
+ domain_flavors: ["cew_series", "industry"]
131
+ examples: ["Total, all industries", "Manufacturing", "Newspaper, periodical, book and directory publishers"]
132
+ passive: true
133
+
134
+ - name: "aggregation_level_code"
135
+ type: string
136
+ description: "BLS QCEW two-character aggregation-level code describing the geographic and industry granularity of a series — for example national vs statewide vs county, and total vs supersector vs detailed NAICS."
137
+ display_name: "QCEW Aggregation Level Code"
138
+ mergeability: not_mergeable
139
+ domain_flavors: ["cew_series"]
140
+ examples: ["10", "55", "74", "78"]
141
+ passive: true
142
+
143
+ - name: "aggregation_level_title"
144
+ type: string
145
+ description: "Human-readable description of a QCEW aggregation level (e.g. \"National, by ownership sector\", \"County, by ownership sector and 6-digit NAICS\")."
146
+ display_name: "QCEW Aggregation Level Title"
147
+ mergeability: not_mergeable
148
+ domain_flavors: ["cew_series"]
149
+ examples: ["National, by ownership sector", "Statewide, by ownership sector and supersector", "County, by ownership sector and 6-digit NAICS"]
150
+ passive: true
151
+
152
+ - name: "area_type"
153
+ type: string
154
+ description: "Classification of a QCEW area as one of national, statewide, msa, or county, derived from the shape of the area_fips code."
155
+ display_name: "QCEW Area Type"
156
+ mergeability: not_mergeable
157
+ domain_flavors: ["cew_series", "location"]
158
+ examples: ["national", "statewide", "msa", "county"]
159
+ passive: true
160
+
161
+ - name: "naics_level"
162
+ type: float
163
+ description: "Granularity of the NAICS code on a QCEW series, expressed as the number of digits. 2.0 = sector or BLS supersector aggregate, 3.0 = subsector, 4.0 = industry group, 5.0 = NAICS industry, 6.0 = national industry."
164
+ display_name: "QCEW NAICS Level"
165
+ mergeability: not_mergeable
166
+ domain_flavors: ["cew_series", "industry"]
167
+ examples: [2.0, 3.0, 4.0, 5.0, 6.0]
168
+ passive: true
169
+
170
+ - name: "publisher"
171
+ type: string
172
+ description: "Name of the organization that publishes or maintains the data series, inferred from the release URL domain"
173
+ display_name: "Publisher"
174
+ mergeability: not_mergeable
175
+ domain_flavors: ["cew_series"]
176
+ examples: ["U.S. Bureau of Labor Statistics"]
177
+ passive: true
178
+
179
+ # --- Quarterly observation properties on cew_series ---
180
+
181
+ - name: "establishment_count"
182
+ type: float
183
+ description: "Count of QCEW-covered establishments (physical locations of employers) reported by BLS for the series during the quarter."
184
+ display_name: "QCEW Establishment Count"
185
+ mergeability: not_mergeable
186
+ domain_flavors: ["cew_series"]
187
+ examples: [11907855.0, 61375.0]
188
+ passive: true
189
+
190
+ - name: "monthly_employment_m1"
191
+ type: float
192
+ description: "Number of QCEW-covered employees on payrolls in the first month of the quarter (BLS uses the pay period including the 12th of the month)."
193
+ display_name: "QCEW Employment (Month 1)"
194
+ mergeability: not_mergeable
195
+ domain_flavors: ["cew_series"]
196
+ examples: [152393725.0]
197
+ passive: true
198
+
199
+ - name: "monthly_employment_m2"
200
+ type: float
201
+ description: "Number of QCEW-covered employees on payrolls in the second month of the quarter (pay period including the 12th)."
202
+ display_name: "QCEW Employment (Month 2)"
203
+ mergeability: not_mergeable
204
+ domain_flavors: ["cew_series"]
205
+ examples: [153129544.0]
206
+ passive: true
207
+
208
+ - name: "monthly_employment_m3"
209
+ type: float
210
+ description: "Number of QCEW-covered employees on payrolls in the third month of the quarter (pay period including the 12th)."
211
+ display_name: "QCEW Employment (Month 3)"
212
+ mergeability: not_mergeable
213
+ domain_flavors: ["cew_series"]
214
+ examples: [153848430.0]
215
+ passive: true
216
+
217
+ - name: "employment_level"
218
+ type: float
219
+ description: "Representative quarterly employment level for the series — taken as the third-month employment, matching the convention BLS uses when publishing single-number QCEW employment figures."
220
+ display_name: "QCEW Quarterly Employment Level"
221
+ mergeability: not_mergeable
222
+ domain_flavors: ["cew_series"]
223
+ examples: [153848430.0]
224
+ passive: true
225
+
226
+ - name: "total_quarterly_wages"
227
+ type: float
228
+ description: "Total wages paid (in current US dollars) to all QCEW-covered workers for the series during the quarter."
229
+ display_name: "QCEW Total Quarterly Wages (USD)"
230
+ mergeability: not_mergeable
231
+ domain_flavors: ["cew_series"]
232
+ examples: [3037790324790.0]
233
+ passive: true
234
+
235
+ - name: "taxable_quarterly_wages"
236
+ type: float
237
+ description: "Portion of total quarterly wages (in current US dollars) subject to UI tax contributions. Always 0 for federal-government employment, which is not subject to UI tax."
238
+ display_name: "QCEW Taxable Quarterly Wages (USD)"
239
+ mergeability: not_mergeable
240
+ domain_flavors: ["cew_series"]
241
+ examples: [1151875077520.0]
242
+ passive: true
243
+
244
+ - name: "quarterly_contributions"
245
+ type: float
246
+ description: "Total UI tax contributions (in current US dollars) associated with the series' employment during the quarter."
247
+ display_name: "QCEW Quarterly UI Contributions (USD)"
248
+ mergeability: not_mergeable
249
+ domain_flavors: ["cew_series"]
250
+ examples: [19555346530.0]
251
+ passive: true
252
+
253
+ - name: "avg_weekly_wage"
254
+ type: float
255
+ description: "Average weekly wage (in current US dollars) per QCEW-covered employee for the series during the quarter, computed by BLS as total_qtrly_wages / (avg_emplvl × 13)."
256
+ display_name: "QCEW Average Weekly Wage (USD)"
257
+ mergeability: not_mergeable
258
+ domain_flavors: ["cew_series"]
259
+ examples: [1526.0]
260
+ passive: true
261
+
262
+ # --- BLS-computed year-over-year percent change properties on cew_series ---
263
+
264
+ - name: "employment_yoy_pct_chg"
265
+ type: float
266
+ description: "Year-over-year percent change in third-month employment for the series, computed by BLS by comparing the current quarter against the same quarter of the prior year."
267
+ display_name: "QCEW Employment YoY %"
268
+ mergeability: not_mergeable
269
+ domain_flavors: ["cew_series"]
270
+ examples: [1.5, -0.4]
271
+ passive: true
272
+
273
+ - name: "monthly_employment_m1_yoy_pct_chg"
274
+ type: float
275
+ description: "Year-over-year percent change in first-month employment for the series."
276
+ display_name: "QCEW Employment (Month 1) YoY %"
277
+ mergeability: not_mergeable
278
+ domain_flavors: ["cew_series"]
279
+ examples: [1.4]
280
+ passive: true
281
+
282
+ - name: "monthly_employment_m2_yoy_pct_chg"
283
+ type: float
284
+ description: "Year-over-year percent change in second-month employment for the series."
285
+ display_name: "QCEW Employment (Month 2) YoY %"
286
+ mergeability: not_mergeable
287
+ domain_flavors: ["cew_series"]
288
+ examples: [1.4]
289
+ passive: true
290
+
291
+ - name: "establishments_yoy_pct_chg"
292
+ type: float
293
+ description: "Year-over-year percent change in establishment count for the series."
294
+ display_name: "QCEW Establishments YoY %"
295
+ mergeability: not_mergeable
296
+ domain_flavors: ["cew_series"]
297
+ examples: [1.2]
298
+ passive: true
299
+
300
+ - name: "total_quarterly_wages_yoy_pct_chg"
301
+ type: float
302
+ description: "Year-over-year percent change in total quarterly wages for the series."
303
+ display_name: "QCEW Total Wages YoY %"
304
+ mergeability: not_mergeable
305
+ domain_flavors: ["cew_series"]
306
+ examples: [5.7]
307
+ passive: true
308
+
309
+ - name: "avg_weekly_wage_yoy_pct_chg"
310
+ type: float
311
+ description: "Year-over-year percent change in average weekly wage for the series."
312
+ display_name: "QCEW Avg Weekly Wage YoY %"
313
+ mergeability: not_mergeable
314
+ domain_flavors: ["cew_series"]
315
+ examples: [4.2]
316
+ passive: true
317
+
318
+ - name: "taxable_quarterly_wages_yoy_pct_chg"
319
+ type: float
320
+ description: "Year-over-year percent change in taxable quarterly wages for the series."
321
+ display_name: "QCEW Taxable Wages YoY %"
322
+ mergeability: not_mergeable
323
+ domain_flavors: ["cew_series"]
324
+ examples: [3.3]
325
+ passive: true
326
+
327
+ - name: "quarterly_contributions_yoy_pct_chg"
328
+ type: float
329
+ description: "Year-over-year percent change in quarterly UI contributions for the series."
330
+ display_name: "QCEW UI Contributions YoY %"
331
+ mergeability: not_mergeable
332
+ domain_flavors: ["cew_series"]
333
+ examples: [4.3]
334
+ passive: true
335
+
336
+ relationships:
337
+ - name: "appears_in_cew_series"
338
+ description: "Links a real-world contextual entity (a US geographic area or a NAICS industry) to a QCEW time series whose area or industry dimension is that entity."
339
+ display_name: "Appears in QCEW Series"
340
+ mergeability: not_mergeable
341
+ domain_flavors: ["location", "industry"]
342
+ target_flavors: ["cew_series"]
343
+ passive: true
344
+
345
+ attributes: []
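The wage and percent-change properties above are published pre-computed by BLS, so the source does not need to derive them. As a sanity check for consumers, the arithmetic behind `avg_weekly_wage` and the `*_yoy_pct_chg` properties can be sketched as follows. This is a minimal illustration, not the pipeline's code: the function names and the `None`-on-zero convention are assumptions, and BLS's own rounding is not reproduced.

```python
from typing import Optional, Sequence


def avg_weekly_wage(total_qtrly_wages: float,
                    monthly_emplvls: Sequence[float]) -> Optional[float]:
    """Total quarterly wages / (average monthly employment * 13 weeks).

    Hypothetical helper; returns None when average employment is zero.
    """
    if len(monthly_emplvls) != 3:
        raise ValueError("expected employment levels for three months")
    avg_emplvl = sum(monthly_emplvls) / 3
    if avg_emplvl == 0:
        return None
    return total_qtrly_wages / (avg_emplvl * 13)


def yoy_pct_chg(current: float, prior_year: float) -> Optional[float]:
    """Percent change of a quarter against the same quarter one year earlier.

    Hypothetical helper; returns None when the prior-year value is zero.
    """
    if prior_year == 0:
        return None
    return (current - prior_year) / prior_year * 100
```

A consumer could compare these values against the published properties to flag rows that look inconsistent; small differences are expected because the published figures are rounded.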
@@ -0,0 +1,135 @@
1
+ # Data Dictionary: FIPS Codes (FCC mirror)
2
+
3
+ ## Source Overview
4
+
5
+ Federal Information Processing Standard (FIPS) codes for U.S. states and
6
+ counties — short numeric identifiers issued by the federal government
7
+ (Census Bureau / NIST historically) that uniquely tag every U.S. state,
8
+ the District of Columbia, and every county-or-equivalent (boroughs,
9
+ parishes, independent cities, census areas).
10
+
11
+ - **Publisher (mirror):** U.S. Federal Communications Commission, Office
12
+ of Engineering and Technology
13
+ - **URL:** https://transition.fcc.gov/oet/info/maps/census/fips/fips.txt
14
+ - **Format:** Plain-text fixed-column ASCII, ~3,200 lines total
15
+ - **Cadence:** Effectively static. The federal codes change only when
16
+ jurisdictions are created, dissolved, or renamed (rare — once every
17
+ several years).
18
+ - **Source name:** `fips`
19
+
20
+ The file contains two sections:
21
+
22
+ 1. A 51-row table of **state-level FIPS codes** (50 states + DC), one
23
+ line per state with a 2-digit code.
24
+ 2. A 3,140+ row table of **county-level FIPS codes**, one line per county
25
+ (or county-equivalent) with a 5-digit code. The leading 2 digits are
26
+ the parent state's FIPS code; the trailing 3 digits identify the
27
+ county within the state. The county-level table is grouped by state,
28
+ with each state's group prefaced by a header row of the form `XX000 StateName`.
29
+
30
+ **Limitations:**
31
+ - The FCC mirror is a republication of the older Census-published list;
32
+ it does not include FIPS *places* (cities/towns), MSAs, or U.S.
33
+ territories (Puerto Rico, Guam, U.S. Virgin Islands, etc.).
34
+ - A small number of county lines have parenthesized historical
35
+ annotations (`(created after 1990)`, `(1990 Census Area)`,
36
+ `(After 1990, part of Halifax County)`). These are stripped from the
37
+ emitted county name.
38
+ - The `XX000` state header rows in the county section are duplicates of
39
+ the state-level table and are skipped during atomization (one record
40
+ per state, not two).
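The two county-section rules above (strip parenthesized annotations, skip `XX000` header rows) can be sketched as below. This is a hedged illustration only: the exact column layout of `fips.txt` is not given here, so the sketch assumes each county line is a 5-digit code, some whitespace, and then the name. `parse_county_line` is a hypothetical name, not the atomizer's actual function.

```python
import re
from typing import Optional, Tuple

# Assumption: "<5-digit code><whitespace><name>"; the real file may use fixed columns.
_COUNTY_LINE = re.compile(r"^\s*([0-9]{5})\s+(.+?)\s*$")
# A single trailing parenthesized note, e.g. "(created after 1990)".
_ANNOTATION = re.compile(r"\s*\([^)]*\)\s*$")


def parse_county_line(line: str) -> Optional[Tuple[str, str]]:
    """Return (fips_county, official_name), or None for rows that are skipped."""
    match = _COUNTY_LINE.match(line)
    if match is None:
        return None  # not a county-section row (blank, prose, state table, ...)
    code, name = match.group(1), match.group(2)
    if code.endswith("000"):
        return None  # XX000 state header row, already covered by the state table
    return code, _ANNOTATION.sub("", name)
```

For an illustrative input such as `"02013        Aleutians East Borough (created after 1990)"`, the function returns the code together with the name minus its annotation.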
41
+
42
+ ---
43
+
44
+ ## Entity Types
45
+
46
+ ### `location`
47
+
48
+ Used for both U.S. states (and DC) and U.S. counties (and county-
49
+ equivalents like Alaska boroughs, Louisiana parishes, and Virginia
50
+ independent cities). The level of geography is distinguished by which
51
+ strong-ID property is set (`fips_state` for state-level, `fips_county`
52
+ for county-level) and by the `administrative_level` property.
53
+
54
+ - **Primary key (state-level):** the 2-digit FIPS state code, exposed as
55
+ the `fips_state` strong-ID property and as a property atom.
56
+ - **Primary key (county-level):** the 5-digit FIPS county code, exposed
57
+ as the `fips_county` strong-ID property and as a property atom.
58
+ - **Entity resolver:** named entity, **MERGEABLE**. State and county FIPS
59
+ codes are stable, official identifiers and merging across sources
60
+ (Census, FRED, sanctions data, etc.) is desired. Disambiguation
61
+ snippets include the formatted name (e.g. `"Autauga County, Alabama"`).
62
+ - **Name format:**
63
+ - State-level: title-cased state name (e.g. `"Alabama"`,
64
+ `"District of Columbia"`).
65
+ - County-level: `"{County} County, {State}"` for the common case;
66
+ parishes / boroughs / cities / census areas keep their original
67
+ suffix (e.g. `"East Baton Rouge Parish, Louisiana"`,
68
+ `"Aleutians East Borough, Alaska"`,
69
+ `"Baltimore city, Maryland"`).
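The name formats above can be sketched as follows. This is a sketch under assumptions: it presumes the county's source name already carries its suffix (`County`, `Parish`, `Borough`, `city`), so the display name is simply that name plus the title-cased state. The helper names and the small-word list are hypothetical. Note that a plain `str.title()` would yield `"District Of Columbia"`, which is why small words are handled separately.

```python
# Assumption: words kept lowercase when they are not the first word.
_SMALL_WORDS = {"of", "and", "the"}


def title_case_state(raw: str) -> str:
    """Convert an upper-cased state name such as "DISTRICT OF COLUMBIA"."""
    words = raw.strip().lower().split()
    return " ".join(
        word if index > 0 and word in _SMALL_WORDS else word.capitalize()
        for index, word in enumerate(words)
    )


def county_display_name(official_name: str, state_raw: str) -> str:
    """Join the county's own name (suffix kept as-is) with its state name."""
    return f"{official_name.strip()}, {title_case_state(state_raw)}"
```

Keeping the county suffix untouched means names like `"Baltimore city"` retain their lower-case `city`, matching the examples in the list above.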
70
+
71
+ ---
72
+
73
+ ## Properties
74
+
75
+ The dataset uses the DataSchema `namespace: fips`. Atom property keys
76
+ are `fips::<local_name>` for source-specific properties. The identity
77
+ properties `fips_state` and `fips_county`, which double as resolver
78
+ strong IDs, are not namespaced.
79
+
80
+ ### Common Properties (states and counties)
81
+
82
+ * `fips::administrative_level`
83
+ * Definition: Granularity of the geographic entity within the U.S.
84
+ federal hierarchy.
85
+ * Examples: `"state"`, `"county"`
86
+ * Derivation: Set to `"state"` for entries from the state-level table,
87
+ `"county"` for entries from the county-level table.
88
+
89
+ * `fips::official_name`
90
+ * Definition: Verbatim place name as it appears in the FCC mirror,
91
+ upper-cased for states and mixed-case for counties.
92
+ * Examples: `"ALABAMA"`, `"Autauga County"`,
93
+ `"Aleutians East Borough"`, `"East Baton Rouge Parish"`,
94
+ `"Baltimore city"`
95
+ * Derivation: The "place name" column of the source file, with the
96
+ parenthesized historical annotation (when present) stripped.
97
+
98
+ ### State Properties
99
+
100
+ * `fips_state`
101
+ * Definition: Two-digit Federal Information Processing Standard code
102
+ that uniquely identifies a U.S. state or the District of Columbia.
103
+ * Examples: `"01"` (Alabama), `"06"` (California), `"11"` (DC)
104
+ * Derivation: Verbatim from the state-level table's first column,
105
+ zero-padded to two digits.
106
+ * Note: Also used as the strong ID on the state's `location` entity.
107
+
108
+ ### County Properties
109
+
110
+ * `fips_county`
111
+ * Definition: Five-digit Federal Information Processing Standard code
112
+ that uniquely identifies a U.S. county or county-equivalent. The
113
+ leading two digits are the parent state's `fips_state` code; the
114
+ trailing three digits identify the county within the state.
115
+ * Examples: `"01001"` (Autauga County, Alabama), `"06037"` (Los
116
+ Angeles County, California), `"22033"` (East Baton Rouge Parish,
117
+ Louisiana)
118
+ * Derivation: Verbatim from the county-level table's first column,
119
+ zero-padded to five digits.
120
+ * Note: Also used as the strong ID on the county's `location` entity.
121
+
122
+ ---
123
+
124
+ ## Entity Relationships Summary
125
+
126
+ ```
127
+ location (county) ──[located_in]──→ location (state)
128
+ ```
129
+
130
+ - `located_in`: Each county-level `location` is linked to its parent
131
+ state-level `location` via the leading two digits of the county FIPS
132
+ code. Both sides carry strong IDs (`fips_county` and `fips_state`),
133
+ which guarantees resolver merging into a single state node across all
134
+ county-to-state edges and across other datasets that emit the same
135
+ state codes.
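The `located_in` derivation described above amounts to slicing the county code. A minimal sketch, assuming county codes arrive as 5-character zero-padded strings; the function names and the triple layout are illustrative, not the source's actual record format:

```python
import re
from typing import Iterable, List, Tuple

_FIPS_COUNTY = re.compile(r"[0-9]{5}")


def parent_state_fips(fips_county: str) -> str:
    """Return the 2-digit state code embedded in a 5-digit county code."""
    if _FIPS_COUNTY.fullmatch(fips_county) is None:
        raise ValueError(f"not a 5-digit FIPS county code: {fips_county!r}")
    return fips_county[:2]


def located_in_edges(county_codes: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Build (fips_county, "located_in", fips_state) triples, one per county."""
    return [(code, "located_in", parent_state_fips(code)) for code in county_codes]
```

Because every county in a state yields the same `fips_state` value, all of its edges point at one strong ID, which is what lets the resolver collapse them onto a single state node.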
@@ -0,0 +1,77 @@
1
+ # Dataset schema for the FIPS Codes (Federal Information Processing
2
+ # Standard) source — the FCC mirror of the Census Bureau's two-digit
3
+ # state and five-digit county FIPS code list.
4
+ #
5
+ # Source: https://transition.fcc.gov/oet/info/maps/census/fips/fips.txt
6
+ # Cadence: effectively static (codes change once every several years).
7
+ #
8
+ # This schema describes U.S. states and counties as `location` entities
9
+ # identified by their FIPS strong IDs, along with a containment
10
+ # relationship from each county to its parent state.
11
+ name: "fips"
12
+ description: "U.S. state and county Federal Information Processing Standard (FIPS) codes from the FCC mirror of the Census Bureau list, modelled as `location` entities with FIPS strong IDs"
13
+
14
+ extraction:
15
+ flavors: closed
16
+ properties: closed
17
+ relationships: closed
18
+ attributes: closed
19
+ events: closed
20
+
21
+ flavors:
22
+ - name: "location"
23
+ description: "A specific named geographic location such as a city, country, region, or landmark"
24
+ display_name: "Location"
25
+ mergeability: not_mergeable
26
+ strong_id_properties: ["fips_state", "fips_county"]
27
+ examples: ["New York City", "San Francisco", "North America", "Bakery Square"]
28
+ passive: true
29
+
30
+ properties:
31
+ - name: "fips_state"
32
+ type: string
33
+ description: "Two-digit Federal Information Processing Standard code that uniquely identifies a U.S. state or the District of Columbia"
34
+ display_name: "State FIPS Code"
35
+ mergeability: not_mergeable
36
+ domain_flavors: ["location"]
37
+ examples: ["01", "06", "11", "36"]
38
+ passive: true
39
+
40
+ - name: "fips_county"
41
+ type: string
42
+ description: "Five-digit Federal Information Processing Standard code that uniquely identifies a U.S. county or county-equivalent; the leading two digits are the parent state's FIPS code"
43
+ display_name: "County FIPS Code"
44
+ mergeability: not_mergeable
45
+ domain_flavors: ["location"]
46
+ examples: ["01001", "06037", "22033", "51790"]
47
+ passive: true
48
+
49
+ - name: "administrative_level"
50
+ namespace: "fips"
51
+ type: string
52
+ description: "Granularity of the geographic entity within the U.S. federal hierarchy of states and counties"
53
+ display_name: "Administrative Level"
54
+ mergeability: not_mergeable
55
+ domain_flavors: ["location"]
56
+ examples: ["state", "county"]
57
+ passive: true
58
+
59
+ - name: "official_name"
60
+ namespace: "fips"
61
+ type: string
62
+ description: "Verbatim place name as published in the FCC mirror of the FIPS code list, with parenthesized historical annotations stripped"
63
+ display_name: "Official FIPS Name"
64
+ mergeability: not_mergeable
65
+ domain_flavors: ["location"]
66
+ examples: ["ALABAMA", "Autauga County", "Aleutians East Borough", "East Baton Rouge Parish"]
67
+ passive: true
68
+
69
+ relationships:
70
+ - name: "located_in"
71
+ description: "Administrative territory or location the entity is situated in (Wikidata P131, P276)"
72
+ display_name: "Located In"
73
+ mergeability: not_mergeable
74
+ domain_flavors: ["location"]
75
+ target_flavors: ["location"]
76
+ examples: ["Autauga County is located in Alabama", "Los Angeles County is located in California"]
77
+ passive: true