@yottagraph-app/data-model-skill 0.0.34 → 0.0.35
- package/package.json +1 -1
- package/skill/fjcidb/DATA_DICTIONARY.md +178 -0
- package/skill/fjcidb/schema.yaml +206 -0
package/package.json
CHANGED

package/skill/fjcidb/DATA_DICTIONARY.md
ADDED
@@ -0,0 +1,178 @@
# Data Dictionary: FJC IDB (Federal Judicial Center — Civil Cases)

Last updated: 2026-05-20

## Source Overview

The **Federal Judicial Center Integrated Database (IDB)** provides administrative statistics on civil cases filed in U.S. district courts. Lovelace ingests the **civil SAS extract** published on the FJC site (annual FY files such as `cv26.sas7bdat`), not PACER dockets. Each SAS row is one civil case with coded fields for court location, docket number, party labels, nature of suit, and disposition.

The streamer uses a **diffing** pipeline: it materializes the SAS file (HTTP download or local path), parses rows in chunks, stores a normalized JSON snapshot per case under the raw store (`fjcidb/download/{case-id}.json`), and republishes atom batches only when a row’s JSON changes.
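
The change-detection step can be illustrated with a minimal Go sketch (Go is assumed because the ingest code cited in this dictionary, such as `party_regex.go`, is Go). Everything except the key layout `fjcidb/download/{case-id}.json` is hypothetical: the `caseSnapshot` fields, the in-memory `snapshotStore` standing in for the raw store, and the `changed` helper are illustrations, not the streamer's API.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// caseSnapshot is a hypothetical normalized view of one IDB row; the real
// snapshot's field set is not documented here.
type caseSnapshot struct {
	CaseID string `json:"case_id"`
	NOS    string `json:"nos,omitempty"`
	PLT    string `json:"plt,omitempty"`
	DEF    string `json:"def,omitempty"`
	Disp   string `json:"disp,omitempty"`
}

// snapshotStore stands in for the raw store: object key -> stored JSON.
type snapshotStore map[string][]byte

// snapshotKey follows the fjcidb/download/{case-id}.json layout.
func snapshotKey(caseID string) string {
	return "fjcidb/download/" + caseID + ".json"
}

// changed serializes the row, compares it with the stored snapshot, and
// stores the new bytes when they differ. New rows count as changed. Only
// rows reported as changed would be re-atomized and republished.
func (s snapshotStore) changed(row caseSnapshot) (bool, error) {
	// encoding/json writes struct fields in declaration order, so an
	// unchanged row always produces identical bytes.
	data, err := json.Marshal(row)
	if err != nil {
		return false, err
	}
	key := snapshotKey(row.CaseID)
	if prev, seen := s[key]; seen && bytes.Equal(prev, data) {
		return false, nil
	}
	s[key] = data
	return true, nil
}

func main() {
	store := snapshotStore{}
	row := caseSnapshot{CaseID: "3-1-12345-2025", NOS: "442", PLT: "SHANKS"}

	first, _ := store.changed(row)  // new row: publish
	second, _ := store.changed(row) // same JSON: skip
	row.Disp = "x"                  // hypothetical disposition update
	third, _ := store.changed(row)  // JSON differs: republish
	fmt.Println(first, second, third) // true false true
}
```

Byte equality is enough in this sketch because the serialization is deterministic; a production store would also need to persist snapshots across runs, which the map here does not.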

| Item | Value |
|------|-------|
| Stream source constant | `fjcidb-source` |
| `Record.Source` | `fjcidb` |
| Default dataset URL | FJC FY civil SAS (see FJC civil cases landing page) |
| Poll cadence | Configurable (`pollTimeMin`); typical dev runs use a large interval or one-shot via fetcheval |

**Data quality notes**

- Party fields **PLT** and **DEF** are short text labels (often truncated), not full party rosters. A numeric value is a count of plaintiffs or defendants rather than a name.
- Person vs organization for textual parties is decided by **regex heuristics** on the label (see `party_regex.go`), not by LLM classification, in the default ingest path.
- Code values (district, disposition, etc.) follow the FJC codebook; this dictionary documents the KG mapping, not codebook semantics.

---

## Entity Types

### `legal_case`

One civil case in the IDB extract, identified by district, office, docket, and filing year.

- **Subject name:** Human-readable label including `fjcidb_case_id`.
- **Strong id:** `fjcidb_case_id` on the case subject.
- **Resolver:** `NOT_MERGEABLE` — passive administrative case node.
- **Timestamp:** Atomization time (microseconds) for the ingest pass.

### `person`

An individual named on the plaintiff or defendant side when the IDB field is **textual** (not digits-only) and classified as a person by regex rules.

- **Subject name:** Normalized party label (raw field trimmed, trailing `, ET AL` removed).
- **Property:** `name` (normalized party label from PLT/DEF).
- **Resolver:** `MERGEABLE` named entity; the record carries a lawsuit-context snippet (who sues whom, including the NOS phrase).
- **Examples:** `SHANKS`, `BECERRA`, `BIDEN` (the raw field may also carry a trailing `, ET AL`).

### `organization`

An institution or collective named on the plaintiff or defendant side when the field is textual and classified as an organization by regex, or when classification is ambiguous and defaults to organization.

- **Subject name:** Same normalization as person parties.
- **Property:** `name` (normalized party label from PLT/DEF).
- **Resolver:** `MERGEABLE` named entity; same lawsuit-context snippet pattern as person parties.
- **Examples:** `DEPARTMENT OF DEFENSE`, `CUMMINS INC.`, `INTERNATIONAL UNION OF` (raw field: `INTERNATIONAL UNION OF , ET AL`).

### `nature_of_suit`

A federal civil **nature of suit (NOS)** code from the U.S. Courts classification, linked from the case’s **NOS** field.

- **Subject name:** Short title when known, e.g. `Employment`; otherwise `Nature of suit {code} (federal civil)`.
- **Strong id:** `nos_code`.
- **Resolver:** `NOT_MERGEABLE`.
- **Reference:** U.S. Courts civil NOS code descriptions PDF (titles/descriptions embedded in ingest).

---

## Properties

### Legal case

* `fjcidb_case_id`
  * Definition: Stable case identifier for this IDB row.
  * Examples: `3-1-12345-2025`, `90-1-2303817-2023`
  * Derivation: `{DISTRICT}-{OFFICE}-{DOCKET}-{year}` where year comes from `FILEDATE`, or from `FDATEUSE` when `FILEDATE` is absent. Row omitted if year cannot be determined.

* `district_code`
  * Definition: FJC district court code (`DISTRICT`).
  * Examples: `3`, `90`
  * Derivation: SAS `DISTRICT`.

* `office_code`
  * Definition: FJC office within the district (`OFFICE`).
  * Examples: `1`
  * Derivation: SAS `OFFICE`.

* `case_docket_number`
  * Definition: Court docket number (`DOCKET`).
  * Examples: `12345`, `2303817`
  * Derivation: SAS `DOCKET`.

* `case_filing_date`
  * Definition: Filing date as stored in the extract.
  * Derivation: SAS `FILEDATE`.

* `termination_date`
  * Definition: Termination date when present.
  * Derivation: SAS `TERMDATE`.

* `origin_code`, `jurisdiction_code`, `disposition_code`, `class_action_code`, `procedural_progress_code`
  * Definition: FJC codebook fields for procedural status.
  * Derivation: SAS `ORIGIN`, `JURIS`, `DISP`, `CLASSACT`, `PROCPROG` when non-empty.
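
The `fjcidb_case_id` derivation and the conditional codebook fields can be sketched as one function. Assumptions, all hypothetical: the `idbRow` struct, the helper names, and the date encoding (dates arrive here as `YYYY-MM-DD` strings; the real SAS extract may encode them differently). As written, the sketch also falls back to `FDATEUSE` when `FILEDATE` is present but unparseable, which is slightly broader than the "absent" rule above.

```go
package main

import (
	"fmt"
	"time"
)

// idbRow holds the SAS columns used for legal_case properties. The struct is
// hypothetical; dates are assumed to be "YYYY-MM-DD" strings.
type idbRow struct {
	District, Office, Docket     string
	FileDate, FDateUse, TermDate string
	Origin, Juris, Disp          string
	ClassAct, ProcProg           string
}

// yearOf extracts the year from an assumed "YYYY-MM-DD" date string.
func yearOf(date string) (int, bool) {
	t, err := time.Parse("2006-01-02", date)
	if err != nil {
		return 0, false
	}
	return t.Year(), true
}

// caseID builds {DISTRICT}-{OFFICE}-{DOCKET}-{year}. The year comes from
// FILEDATE, falling back to FDATEUSE; ok is false when neither yields one.
func caseID(r idbRow) (string, bool) {
	year, ok := yearOf(r.FileDate)
	if !ok {
		year, ok = yearOf(r.FDateUse)
	}
	if !ok {
		return "", false
	}
	return fmt.Sprintf("%s-%s-%s-%d", r.District, r.Office, r.Docket, year), true
}

// caseProperties returns the legal_case properties for one row, or false
// when the row must be omitted. Optional columns are emitted only when
// non-empty.
func caseProperties(r idbRow) (map[string]string, bool) {
	id, ok := caseID(r)
	if !ok {
		return nil, false
	}
	props := map[string]string{
		"fjcidb_case_id":     id,
		"district_code":      r.District,
		"office_code":        r.Office,
		"case_docket_number": r.Docket,
	}
	optional := map[string]string{
		"case_filing_date":         r.FileDate,
		"termination_date":         r.TermDate,
		"origin_code":              r.Origin,
		"jurisdiction_code":        r.Juris,
		"disposition_code":         r.Disp,
		"class_action_code":        r.ClassAct,
		"procedural_progress_code": r.ProcProg,
	}
	for name, value := range optional {
		if value != "" {
			props[name] = value
		}
	}
	return props, true
}

func main() {
	row := idbRow{District: "3", Office: "1", Docket: "12345", FileDate: "2025-03-14"}
	props, ok := caseProperties(row)
	fmt.Println(ok, props["fjcidb_case_id"], len(props)) // true 3-1-12345-2025 5
}
```

Returning `false` instead of a partial map keeps the "row omitted" rule explicit at the call site.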

### Person and organization (shared)

* `name`
  * Definition: Normalized party label from PLT or DEF.
  * Examples: `SHANKS`, `INTERNATIONAL UNION OF`
  * Derivation: Trim and remove trailing `, ET AL` from the raw IDB field; matches the record subject name.
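
The normalization rule is small enough to show directly. A minimal sketch, assuming labels arrive upper case as in the examples (so the suffix match is case-sensitive); `normalizeParty` is an illustrative name, not necessarily the helper used in the ingest code.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeParty trims a raw PLT/DEF label and removes one trailing
// ", ET AL". The match is case-sensitive, assuming upper-case IDB labels.
func normalizeParty(raw string) string {
	s := strings.TrimSpace(raw)
	s = strings.TrimSuffix(s, ", ET AL")
	// Trim again so "INTERNATIONAL UNION OF , ET AL" loses the space
	// that preceded the comma.
	return strings.TrimSpace(s)
}

func main() {
	for _, raw := range []string{"SHANKS", "BECERRA, ET AL", "INTERNATIONAL UNION OF , ET AL"} {
		fmt.Printf("%q\n", normalizeParty(raw))
	}
	// "SHANKS"
	// "BECERRA"
	// "INTERNATIONAL UNION OF"
}
```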

### Nature of suit

* `nos_code`
  * Definition: Numeric NOS code from the case row.
  * Examples: `110`, `442`
  * Derivation: SAS `NOS`.

* `nos_title`
  * Definition: Short title from the U.S. Courts NOS codebook.
  * Examples: `Insurance`, `Employment`
  * Derivation: Lookup table when code is known; omitted if unknown.

* `nos_description`
  * Definition: Long description from the same codebook.
  * Derivation: Lookup table when code is known; omitted if unknown.
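
The NOS lookup and the subject-name fallback from the entity section might look like this. The table is a hypothetical stand-in for the embedded U.S. Courts data: it holds only the two code/title pairs used as examples in this dictionary (`110`/`Insurance`, `442`/`Employment`) and leaves descriptions empty rather than inventing them; `nosProperties` is an illustrative name.

```go
package main

import "fmt"

// nosEntry holds the codebook text for one NOS code.
type nosEntry struct {
	Title       string
	Description string
}

// nosTable is a hypothetical stand-in for the embedded U.S. Courts table,
// limited to the example codes in this dictionary. Descriptions are left
// empty rather than invented.
var nosTable = map[string]nosEntry{
	"110": {Title: "Insurance"},
	"442": {Title: "Employment"},
}

// nosProperties returns the subject name and properties for the
// nature_of_suit record of one code. Codes without a known title get the
// fallback subject name and only nos_code; empty text is never emitted.
func nosProperties(code string) (string, map[string]string) {
	props := map[string]string{"nos_code": code}
	entry, known := nosTable[code]
	if !known || entry.Title == "" {
		return fmt.Sprintf("Nature of suit %s (federal civil)", code), props
	}
	props["nos_title"] = entry.Title
	if entry.Description != "" {
		props["nos_description"] = entry.Description
	}
	return entry.Title, props
}

func main() {
	for _, code := range []string{"442", "999"} {
		subject, props := nosProperties(code)
		fmt.Println(subject, len(props))
	}
	// Employment 2
	// Nature of suit 999 (federal civil) 1
}
```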

---

## Entity Relationships

```
legal_case   ──[has_nature_of_suit]──→ nature_of_suit

person       ──[is_plaintiff_in]────→ legal_case
organization ──[is_plaintiff_in]────→ legal_case

person       ──[is_defendant_in]────→ legal_case
organization ──[is_defendant_in]────→ legal_case
```

- **`is_plaintiff_in` / `is_defendant_in`:** Emitted on the **party** record; target atom points at the case entity (with case strong id on the target for graph linkage). Only for non-empty, non-numeric, non-`SEALED` PLT/DEF values.
- **`has_nature_of_suit`:** Emitted on the **case** record when `NOS` is present; target is the `nature_of_suit` entity for that code.

---

## Records Per Case

Typical atomization for one row with textual PLT and DEF, and a NOS code:

1. One `legal_case` record (case properties + `has_nature_of_suit` target).
2. One `nature_of_suit` record (when NOS is present).
3. Up to two party records (`person` or `organization` per side).

Rows with only numeric PLT/DEF produce a case record and no party entities. `SEALED` parties produce no party entities.
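
Putting the relationship and record-count rules together, one row's records can be sketched as below. The `partyRow`, `record`, and `relation` types are hypothetical, PLT/DEF are assumed to be normalized already, and the classifier is passed in so the example does not depend on the real regex rules; in `main` it is a stub that knows one label and skips everything else.

```go
package main

import "fmt"

// partyRow carries the fields needed for one row's records; PLT and DEF
// are assumed to be normalized already. The type is hypothetical.
type partyRow struct {
	CaseID, NOS, PLT, DEF string
}

// relation is an outgoing relationship; TargetKey is the target's strong
// id (fjcidb_case_id or nos_code).
type relation struct {
	Name, TargetFlavor, TargetKey string
}

// record is one atomized entity; Key is its strong id or normalized name.
type record struct {
	Flavor    string
	Key       string
	Relations []relation
}

// atomizeRow returns the case record first, then the NOS record (if NOS is
// present), then at most one party record per side. classify returns
// "person", "organization", or "" to skip the label.
func atomizeRow(row partyRow, classify func(string) string) []record {
	caseRec := record{Flavor: "legal_case", Key: row.CaseID}
	var rest []record
	if row.NOS != "" {
		// has_nature_of_suit is emitted on the case record.
		caseRec.Relations = append(caseRec.Relations,
			relation{Name: "has_nature_of_suit", TargetFlavor: "nature_of_suit", TargetKey: row.NOS})
		rest = append(rest, record{Flavor: "nature_of_suit", Key: row.NOS})
	}
	sides := []struct{ label, rel string }{
		{row.PLT, "is_plaintiff_in"},
		{row.DEF, "is_defendant_in"},
	}
	for _, side := range sides {
		flavor := classify(side.label)
		if flavor == "" {
			continue // empty, numeric, or SEALED: no party entity
		}
		// Party relationships live on the party record and point at the
		// case through its strong id.
		rest = append(rest, record{
			Flavor:    flavor,
			Key:       side.label,
			Relations: []relation{{Name: side.rel, TargetFlavor: "legal_case", TargetKey: row.CaseID}},
		})
	}
	return append([]record{caseRec}, rest...)
}

func main() {
	// Stub classifier: known labels map to a flavor; anything else
	// (here "3", a party count) is skipped.
	labels := map[string]string{"SHANKS": "person"}
	classify := func(label string) string { return labels[label] }

	row := partyRow{CaseID: "3-1-12345-2025", NOS: "442", PLT: "SHANKS", DEF: "3"}
	for _, rec := range atomizeRow(row, classify) {
		fmt.Println(rec.Flavor, rec.Key, rec.Relations)
	}
}
```

The example row combines identifiers from the examples above purely for illustration; it is not the contents of the test fixture.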

---

## Party labeling (person vs organization)

Textual PLT/DEF labels are classified before flavor assignment:

1. **Skip** — empty, all digits, or withheld (`SEALED`).
2. **Organization** — label matches org indicators (legal suffixes, government words, `OF`/`AND`/`THE`, commas, truncated org stems) or has **three or more** name tokens after normalization.
3. **Person** — one or two tokens that look like personal name parts (letters, hyphen, apostrophe).
4. **Default** — organization.

Consumers should treat regex labels as heuristic, especially on truncated government and corporate strings.
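
A sketch of the four ordered rules follows. The indicator pattern is a short, hypothetical subset (a comma, a few legal suffixes and government words, and `OF`/`AND`/`THE`); the production patterns in `party_regex.go` are not reproduced, and truncated org stems are not covered. Input is assumed to be a label already normalized as described under Properties.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	digitsOnly = regexp.MustCompile(`^[0-9]+$`)
	// Hypothetical, abbreviated indicator list: a comma, a few legal
	// suffixes and government words, and OF/AND/THE as whole words.
	orgIndicator = regexp.MustCompile(`,|\b(INC|LLC|CORP|CO|LTD|DEPARTMENT|UNITED STATES|COUNTY|CITY|STATE|UNION|OF|AND|THE)\b`)
	// A personal name part: letters, hyphens, and apostrophes.
	namePart = regexp.MustCompile(`^[A-Za-z][A-Za-z'-]*$`)
)

// classifyParty applies the ordered rules to a normalized PLT/DEF label
// and returns "person", "organization", or "" (skip).
func classifyParty(label string) string {
	s := strings.TrimSpace(label)

	// 1. Skip: empty, all digits (a party count), or withheld.
	if s == "" || digitsOnly.MatchString(s) || s == "SEALED" {
		return ""
	}

	// 2. Organization: an indicator match or three or more tokens.
	tokens := strings.Fields(s)
	if orgIndicator.MatchString(strings.ToUpper(s)) || len(tokens) >= 3 {
		return "organization"
	}

	// 3. Person: the one or two remaining tokens all look like name parts.
	for _, t := range tokens {
		if !namePart.MatchString(t) {
			// 4. Default: anything else is treated as an organization.
			return "organization"
		}
	}
	return "person"
}

func main() {
	for _, label := range []string{"SHANKS", "CUMMINS INC.", "DEPARTMENT OF DEFENSE", "12", "SEALED"} {
		fmt.Printf("%s -> %q\n", label, classifyParty(label))
	}
}
```

Because rule 2 runs before rule 3, a two-token label containing `OF` or a comma is labeled an organization even when it could be a personal name, which is one reason the labels should be treated as heuristic.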

---

## Citations

Citation text on atoms and entities: `Federal Judicial Center Integrated Database, civil SAS extract` with a link to the FJC IDB research page. NOS entities additionally cite the U.S. Courts NOS descriptions document.

---

## Validation

- **Unit fixture:** `testdata/sample_fjcidb_case.pb.txt` (SHANKS vs International Union, NOS 442) — regenerate via `./run_legal_test.sh` at repo root.
- **recordeval:** The same script runs schema validation on `testdata/*.pb.txt`.

package/skill/fjcidb/schema.yaml
ADDED
@@ -0,0 +1,206 @@
```yaml
# Dataset schema for FJC Integrated Database — U.S. district court civil cases (SAS extracts).
#
# Source: https://www.fjc.gov/research/idb/civil-cases-filed-terminated-and-pending-sy-1988-present
# Codebook: Civil Codebook 1988 Forward (FJC PDF).
#
# Default ingest target is the FY civil SAS file (e.g. cv26.sas7bdat), refreshed by FJC on a schedule.
name: "fjcidb"
description: "Federal Judicial Center Integrated Database civil case records for U.S. district courts, including office and docket identity, party labels (plaintiff/defendant), filing and termination timing, jurisdiction, nature-of-suit entities linked from each case, and disposition codes"

extraction:
  flavors: closed
  properties: closed
  relationships: closed
  attributes: closed
  events: closed

flavors:
  - name: "legal_case"
    description: "A civil case docketed in a U.S. federal district court, identified by FJC administrative codes rather than PACER"
    display_name: "Legal case"
    mergeability: not_mergeable
    strong_id_properties: ["fjcidb_case_id"]
    passive: true

  - name: "organization"
    description: "A particular business, institution, or organization such as a corporation, university, government agency, or non-profit"
    display_name: "Organization"
    mergeability: mergeable
    passive: true

  - name: "person"
    description: "A real person as opposed to a fictional character, such as a CEO, politician, or public figure"
    display_name: "Person"
    mergeability: mergeable
    passive: true

  - name: "nature_of_suit"
    description: "Federal civil nature-of-suit (NOS) code from the U.S. Courts classification"
    display_name: "Nature of suit"
    mergeability: not_mergeable
    strong_id_properties: ["nos_code"]
    passive: true

properties:
  - name: "fjcidb_case_id"
    namespace: "fjcidb"
    type: string
    description: "Stable FJC civil case identifier formed from district, office, docket, and filing year"
    display_name: "FJC civil case ID"
    mergeability: not_mergeable
    domain_flavors: ["legal_case"]
    passive: true

  - name: "name"
    namespace: "fjcidb"
    type: string
    description: "Display name of the entity"
    display_name: "Name"
    mergeability: not_mergeable
    domain_flavors: ["organization", "person"]
    passive: true

  - name: "nos_code"
    namespace: "fjcidb"
    type: string
    description: "Nature of suit numeric code (matches IDB NOS and the U.S. Courts civil NOS codebook)"
    display_name: "NOS code"
    mergeability: not_mergeable
    domain_flavors: ["nature_of_suit"]
    passive: true

  - name: "nos_title"
    namespace: "fjcidb"
    type: string
    description: "Short title for this nature of suit code from the U.S. Courts civil NOS descriptions"
    display_name: "Nature of suit title"
    mergeability: not_mergeable
    domain_flavors: ["nature_of_suit"]
    passive: true

  - name: "nos_description"
    namespace: "fjcidb"
    type: string
    description: "Official description for this nature of suit code from the U.S. Courts civil NOS descriptions"
    display_name: "Nature of suit description"
    mergeability: not_mergeable
    domain_flavors: ["nature_of_suit"]
    passive: true

  - name: "district_code"
    namespace: "fjcidb"
    type: string
    description: "FJC district court code for where the civil action was filed (IDB DISTRICT)"
    display_name: "FJC district code"
    mergeability: not_mergeable
    domain_flavors: ["legal_case"]
    passive: true

  - name: "office_code"
    namespace: "fjcidb"
    type: string
    description: "FJC office code within the district where the case was filed (IDB OFFICE)"
    display_name: "FJC office code"
    mergeability: not_mergeable
    domain_flavors: ["legal_case"]
    passive: true

  - name: "case_docket_number"
    namespace: "fjcidb"
    type: string
    description: "Docket number assigned by the filing office (IDB DOCKET)"
    display_name: "Docket number"
    mergeability: not_mergeable
    domain_flavors: ["legal_case"]
    passive: true

  - name: "case_filing_date"
    namespace: "fjcidb"
    type: string
    description: "Case filing date"
    display_name: "Filing date"
    mergeability: not_mergeable
    domain_flavors: ["legal_case"]
    passive: true

  - name: "termination_date"
    namespace: "fjcidb"
    type: string
    description: "Termination date if the case terminated in the reporting window"
    display_name: "Termination date"
    mergeability: not_mergeable
    domain_flavors: ["legal_case"]
    passive: true

  - name: "origin_code"
    namespace: "fjcidb"
    type: string
    description: "Procedural origin code for the filing"
    display_name: "Origin code"
    mergeability: not_mergeable
    domain_flavors: ["legal_case"]
    passive: true

  - name: "jurisdiction_code"
    namespace: "fjcidb"
    type: string
    description: "Jurisdiction code"
    display_name: "Jurisdiction code"
    mergeability: not_mergeable
    domain_flavors: ["legal_case"]
    passive: true

  - name: "disposition_code"
    namespace: "fjcidb"
    type: string
    description: "Disposition code when the case terminated"
    display_name: "Disposition code"
    mergeability: not_mergeable
    domain_flavors: ["legal_case"]
    passive: true

  - name: "class_action_code"
    namespace: "fjcidb"
    type: string
    description: "Class action designation code"
    display_name: "Class action code"
    mergeability: not_mergeable
    domain_flavors: ["legal_case"]
    passive: true

  - name: "procedural_progress_code"
    namespace: "fjcidb"
    type: string
    description: "Procedural progress code"
    display_name: "Procedural progress code"
    mergeability: not_mergeable
    domain_flavors: ["legal_case"]
    passive: true

relationships:
  - name: "is_plaintiff_in"
    namespace: "fjcidb"
    description: "The party is named as the plaintiff side in this FJC civil case"
    display_name: "Is plaintiff in"
    mergeability: not_mergeable
    domain_flavors: ["organization", "person"]
    target_flavors: ["legal_case"]
    passive: true

  - name: "is_defendant_in"
    namespace: "fjcidb"
    description: "The party is named as the defendant side in this FJC civil case"
    display_name: "Is defendant in"
    mergeability: not_mergeable
    domain_flavors: ["organization", "person"]
    target_flavors: ["legal_case"]
    passive: true

  - name: "has_nature_of_suit"
    namespace: "fjcidb"
    description: "The civil case is classified under this federal nature-of-suit code (IDB NOS field)"
    display_name: "Has nature of suit"
    mergeability: not_mergeable
    domain_flavors: ["legal_case"]
    target_flavors: ["nature_of_suit"]
    passive: true
```
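
Assuming that `closed` under `extraction` means only declared flavors, properties, and relationships are accepted, a record using an undeclared relationship or an out-of-domain flavor should fail validation. The check below is a minimal sketch of that idea for the `relationships` section only; the Go map is hand-copied from the YAML rather than parsed, and it is not how `recordeval` is implemented.

```go
package main

import "fmt"

// relSpec mirrors the domain_flavors/target_flavors of one relationship.
type relSpec struct {
	domain, target []string
}

// fjcidbRelationships is a hand-copied mirror of the relationships section
// of schema.yaml; it is not parsed from the file.
var fjcidbRelationships = map[string]relSpec{
	"is_plaintiff_in":    {domain: []string{"organization", "person"}, target: []string{"legal_case"}},
	"is_defendant_in":    {domain: []string{"organization", "person"}, target: []string{"legal_case"}},
	"has_nature_of_suit": {domain: []string{"legal_case"}, target: []string{"nature_of_suit"}},
}

// contains reports whether v is in list.
func contains(list []string, v string) bool {
	for _, x := range list {
		if x == v {
			return true
		}
	}
	return false
}

// checkEdge rejects relationships that are not declared (the schema is
// closed) or whose source/target flavors fall outside the declared lists.
func checkEdge(rel, fromFlavor, toFlavor string) error {
	spec, ok := fjcidbRelationships[rel]
	if !ok {
		return fmt.Errorf("relationship %q is not declared in the closed schema", rel)
	}
	if !contains(spec.domain, fromFlavor) {
		return fmt.Errorf("%s: flavor %q not in domain_flavors %v", rel, fromFlavor, spec.domain)
	}
	if !contains(spec.target, toFlavor) {
		return fmt.Errorf("%s: flavor %q not in target_flavors %v", rel, toFlavor, spec.target)
	}
	return nil
}

func main() {
	fmt.Println(checkEdge("is_plaintiff_in", "person", "legal_case")) // <nil>
	fmt.Println(checkEdge("has_nature_of_suit", "person", "nature_of_suit"))
	fmt.Println(checkEdge("is_attorney_in", "person", "legal_case")) // hypothetical undeclared relationship
}
```

The same pattern would extend to `properties` by checking each property's `domain_flavors` against the flavor of the record that carries it.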