@yottagraph-app/data-model-skill 0.0.34 → 0.0.35

package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "@yottagraph-app/data-model-skill",
-   "version": "0.0.34",
+   "version": "0.0.35",
    "description": "Data model skill documentation for AI agents - entity types, properties, and schemas from Lovelace fetch sources",
    "repository": {
      "type": "git",
@@ -0,0 +1,178 @@
+ # Data Dictionary: FJC IDB (Federal Judicial Center, Civil Cases)
+
+ Last updated: 2026-05-20
+
+ ## Source Overview
+
+ The **Federal Judicial Center Integrated Database (IDB)** provides administrative statistics on civil cases filed in U.S. district courts. Lovelace ingests the **civil SAS extract** published on the FJC site (annual FY files such as `cv26.sas7bdat`), not PACER dockets. Each SAS row is one civil case with coded fields for court location, docket number, party labels, nature of suit, and disposition.
+
+ The streamer uses a **diffing** pipeline: it materializes the SAS file (HTTP download or local path), parses rows in chunks, stores a normalized JSON snapshot per case under the raw store (`fjcidb/download/{case-id}.json`), and republishes atom batches only when a row’s JSON changes.
+
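The change-detection step described above can be sketched as follows. This is a minimal illustration, not the streamer's code: the function names, the in-memory `store` dict, and the choice to keep a SHA-256 digest instead of the full JSON snapshot are all assumptions made for the example. Only the key pattern `fjcidb/download/{case-id}.json` comes from the text.

```python
import hashlib
import json


def snapshot_bytes(row: dict) -> bytes:
    """Serialize a row deterministically so equal rows give equal bytes."""
    return json.dumps(row, sort_keys=True, separators=(",", ":")).encode("utf-8")


def rows_to_republish(rows: dict, store: dict) -> list:
    """Return case ids whose snapshot changed, updating `store` in place.

    `rows` maps case id -> parsed row (hypothetical shape).
    `store` maps raw-store key -> digest; it stands in for the raw store.
    """
    changed = []
    for case_id, row in rows.items():
        key = f"fjcidb/download/{case_id}.json"
        digest = hashlib.sha256(snapshot_bytes(row)).hexdigest()
        if store.get(key) != digest:
            store[key] = digest
            changed.append(case_id)
    return changed
```

Calling `rows_to_republish` twice with identical rows returns the case ids on the first pass and an empty list on the second, which mirrors the "republish only on change" behaviour.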
+ | Item | Value |
+ |------|--------|
+ | Stream source constant | `fjcidb-source` |
+ | `Record.Source` | `fjcidb` |
+ | Default dataset URL | FJC FY civil SAS (see FJC civil cases landing page) |
+ | Poll cadence | Configurable (`pollTimeMin`); typical dev runs use a large interval or one-shot via fetcheval |
+
+ **Data quality notes**
+
+ - Party fields **PLT** and **DEF** are short text labels (often truncated), not full party rosters. A digits-only value is a count of plaintiffs or defendants rather than a name.
+ - Person vs organization for textual parties uses **regex heuristics** on the label (see `party_regex.go`), not LLM classification in the default ingest path.
+ - Code values (district, disposition, etc.) follow the FJC codebook; this dictionary documents KG mapping, not codebook semantics.
+
+ ---
+
+ ## Entity Types
+
+ ### `legal_case`
+
+ One civil case in the IDB extract, identified by district, office, docket, and filing year.
+
+ - **Subject name:** Human-readable label including `fjcidb_case_id`.
+ - **Strong id:** `fjcidb_case_id` on the case subject.
+ - **Resolver:** `NOT_MERGEABLE`; passive administrative case node.
+ - **Timestamp:** Atomization time (microseconds) for the ingest pass.
+
+ ### `person`
+
+ An individual named on the plaintiff or defendant side when the IDB field is **textual** (not digits-only) and classified as a person by regex rules.
+
+ - **Subject name:** Normalized party display (trailing `, ET AL` removed from the raw field).
+ - **Property:** `name` (normalized party label from PLT/DEF).
+ - **Resolver:** `MERGEABLE` named entity; lawsuit-context snippet on the record (who sues whom; NOS phrase included).
+ - **Examples:** `SHANKS`, `BECERRA`, `BIDEN` (with or without `, ET AL`).
+
+ ### `organization`
+
+ An institution or collective named on the plaintiff or defendant side when the field is textual and classified as organization (regex), or when classification is ambiguous and defaults to organization.
+
+ - **Subject name:** Same normalization as person parties.
+ - **Property:** `name` (normalized party label from PLT/DEF).
+ - **Resolver:** `MERGEABLE` named entity; same lawsuit snippet pattern as person parties.
+ - **Examples:** `DEPARTMENT OF DEFENSE`, `CUMMINS INC.`, `INTERNATIONAL UNION OF , ET AL`.
+
+ ### `nature_of_suit`
+
+ A federal civil **nature of suit (NOS)** code from the U.S. Courts classification, linked from the case’s **NOS** field.
+
+ - **Subject name:** Short title when known, e.g. `Employment`; otherwise `Nature of suit {code} (federal civil)`.
+ - **Strong id:** `nos_code`.
+ - **Resolver:** `NOT_MERGEABLE`.
+ - **Reference:** U.S. Courts civil NOS code descriptions PDF (titles/descriptions embedded in ingest).
+
+ ---
+
+ ## Properties
+
+ ### Legal case
+
+ * `fjcidb_case_id`
+   * Definition: Stable case identifier for this IDB row.
+   * Examples: `3-1-12345-2025`, `90-1-2303817-2023`
+   * Derivation: `{DISTRICT}-{OFFICE}-{DOCKET}-{year}`, where the year comes from `FILEDATE`, or from `FDATEUSE` when `FILEDATE` is absent. The row is omitted if the year cannot be determined.
+
+ * `district_code`
+   * Definition: FJC district court code (`DISTRICT`).
+   * Examples: `3`, `90`
+   * Derivation: SAS `DISTRICT`.
+
+ * `office_code`
+   * Definition: FJC office within the district (`OFFICE`).
+   * Examples: `1`
+   * Derivation: SAS `OFFICE`.
+
+ * `case_docket_number`
+   * Definition: Court docket number (`DOCKET`).
+   * Examples: `12345`, `2303817`
+   * Derivation: SAS `DOCKET`.
+
+ * `case_filing_date`
+   * Definition: Filing date as stored in the extract.
+   * Derivation: SAS `FILEDATE`.
+
+ * `termination_date`
+   * Definition: Termination date when present.
+   * Derivation: SAS `TERMDATE`.
+
+ * `origin_code`, `jurisdiction_code`, `disposition_code`, `class_action_code`, `procedural_progress_code`
+   * Definition: FJC codebook fields for procedural status.
+   * Derivation: SAS `ORIGIN`, `JURIS`, `DISP`, `CLASSACT`, `PROCPROG` when non-empty.
+
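The `fjcidb_case_id` derivation above can be illustrated with a short sketch. The function name `build_case_id` is hypothetical, and the sketch assumes the SAS date columns have already been parsed into `datetime.date` values (the extract's on-disk date representation is not described here).

```python
from datetime import date
from typing import Mapping, Optional


def build_case_id(row: Mapping[str, object]) -> Optional[str]:
    """Build `{DISTRICT}-{OFFICE}-{DOCKET}-{year}` or return None.

    The year comes from FILEDATE, falling back to FDATEUSE when FILEDATE
    is absent. None signals that the row should be omitted.
    Assumption: date fields are already `datetime.date` objects.
    """
    filed = row.get("FILEDATE") or row.get("FDATEUSE")
    if not isinstance(filed, date):
        return None
    return f"{row['DISTRICT']}-{row['OFFICE']}-{row['DOCKET']}-{filed.year}"
```

For example, a row with `DISTRICT` 3, `OFFICE` 1, `DOCKET` 12345, and a 2025 filing date yields `3-1-12345-2025`, matching the first documented example.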
+ ### Person and organization (shared)
+
+ * `name`
+   * Definition: Normalized party label from PLT or DEF.
+   * Examples: `SHANKS`, `INTERNATIONAL UNION OF`
+   * Derivation: Trim and remove trailing `, ET AL` from the raw IDB field; matches the record subject name.
+
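A minimal sketch of the `name` normalization, assuming a regex-based approach. The helper name `normalize_party_label` and the exact pattern are illustrative; the real rules live in the ingest code and may differ in detail (for example in how whitespace around the comma is handled).

```python
import re

# Matches a trailing ", ET AL", tolerating extra whitespace around the comma.
_TRAILING_ET_AL = re.compile(r"\s*,\s*ET\s+AL\s*$")


def normalize_party_label(raw: str) -> str:
    """Trim the raw PLT/DEF value and drop a trailing `, ET AL`."""
    return _TRAILING_ET_AL.sub("", raw.strip()).strip()
```

With this sketch, `INTERNATIONAL UNION OF , ET AL` normalizes to `INTERNATIONAL UNION OF`, and `SHANKS` is returned unchanged.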
+ ### Nature of suit
+
+ * `nos_code`
+   * Definition: Numeric NOS code from the case row.
+   * Examples: `110`, `442`
+   * Derivation: SAS `NOS`.
+
+ * `nos_title`
+   * Definition: Short title from the U.S. Courts NOS codebook.
+   * Examples: `Insurance`, `Employment`
+   * Derivation: Lookup table when code is known; omitted if unknown.
+
+ * `nos_description`
+   * Definition: Long description from the same codebook.
+   * Derivation: Lookup table when code is known; omitted if unknown.
+
+ ---
+
+ ## Entity Relationships
+
+ ```
+ legal_case   ──[has_nature_of_suit]──→ nature_of_suit
+
+ person       ──[is_plaintiff_in]────→ legal_case
+ organization ──[is_plaintiff_in]────→ legal_case
+
+ person       ──[is_defendant_in]────→ legal_case
+ organization ──[is_defendant_in]────→ legal_case
+ ```
+
+ - **`is_plaintiff_in` / `is_defendant_in`:** Emitted on the **party** record; the target atom points at the case entity and carries the case strong id for graph linkage. Emitted only for non-empty, non-numeric, non-`SEALED` PLT/DEF values.
+ - **`has_nature_of_suit`:** Emitted on the **case** record when `NOS` is present; target is the `nature_of_suit` entity for that code.
+
+ ---
+
+ ## Records Per Case
+
+ Typical atomization for one row with textual PLT and DEF, and a NOS code:
+
+ 1. One `legal_case` record (case properties + `has_nature_of_suit` target).
+ 2. One `nature_of_suit` record (when NOS present).
+ 3. Up to two party records (`person` or `organization` per side).
+
+ Rows with only numeric PLT/DEF produce a case record with count properties only (no party entities). `SEALED` parties produce no party entities.
+
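The record counts above can be expressed as a small sketch that lists which records one row would be expected to yield. The function names and the `party:PLT` / `party:DEF` labels are hypothetical, and the sketch assumes row values are strings; it does not decide between `person` and `organization`.

```python
def is_party_label(value) -> bool:
    """True for a non-empty, non-numeric, non-SEALED PLT/DEF value."""
    text = (value or "").strip()
    return bool(text) and not text.isdigit() and text.upper() != "SEALED"


def expected_record_kinds(row: dict) -> list:
    """List the record kinds one IDB row is expected to produce."""
    kinds = ["legal_case"]  # every usable row yields a case record
    if (row.get("NOS") or "").strip():
        kinds.append("nature_of_suit")
    for side in ("PLT", "DEF"):
        if is_party_label(row.get(side)):
            kinds.append(f"party:{side}")
    return kinds
```

A row with textual PLT and DEF plus a NOS code gives four entries; a row with a numeric PLT, a `SEALED` DEF, and no NOS gives only `legal_case`.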
+ ---
+
+ ## Party labeling (person vs organization)
+
+ Textual PLT/DEF labels are classified before flavor assignment:
+
+ 1. **Skip**: empty, all digits, or withheld (`SEALED`).
+ 2. **Organization**: label matches org indicators (legal suffixes, government words, `OF`/`AND`/`THE`, commas, truncated org stems) or has **three or more** name tokens after normalization.
+ 3. **Person**: one or two tokens that look like personal name parts (letters, hyphen, apostrophe).
+ 4. **Default**: organization.
+
+ Consumers should treat regex labels as heuristic, especially on truncated government and corporate strings.
+
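The four rules above can be sketched in order as follows. This is an illustration only and not the contents of `party_regex.go`: the `ORG_WORDS` set is a small made-up sample of indicators, truncated org stems are not modelled, and the label is assumed to be already normalized (trailing `, ET AL` removed).

```python
import re

# Hypothetical sample of org indicators; the real lists are not shown here.
ORG_WORDS = {
    "INC", "LLC", "CORP", "CO", "LTD",           # legal suffixes
    "DEPARTMENT", "CITY", "COUNTY", "STATE",     # government words
    "OF", "AND", "THE",                          # connectives
}
# A personal-name token: letters, hyphen, apostrophe.
NAME_TOKEN = re.compile(r"^[A-Z][A-Z'\-]*$")


def classify_party(label: str) -> str:
    """Return 'skip', 'organization', or 'person' for a normalized label."""
    text = label.strip().upper()
    if not text or text.isdigit() or text == "SEALED":
        return "skip"
    if "," in text:
        return "organization"
    tokens = [t.strip(".") for t in text.split()]
    if any(t in ORG_WORDS for t in tokens) or len(tokens) >= 3:
        return "organization"
    if all(NAME_TOKEN.match(t) for t in tokens):
        return "person"
    return "organization"  # ambiguous labels default to organization
```

Under these sample rules `SHANKS` is classified as a person, while `DEPARTMENT OF DEFENSE` and `CUMMINS INC.` are classified as organizations. A label such as `3M` matches neither pattern and falls through to the organization default.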
+ ---
+
+ ## Citations
+
+ Citation text on atoms and entities: `Federal Judicial Center Integrated Database, civil SAS extract` with link to the FJC IDB research page. NOS entities additionally cite the U.S. Courts NOS descriptions document.
+
+ ---
+
+ ## Validation
+
+ - **Unit fixture:** `testdata/sample_fjcidb_case.pb.txt` (SHANKS vs International Union, NOS 442); regenerate via `./run_legal_test.sh` at repo root.
+ - **recordeval:** Same script runs schema validation on `testdata/*.pb.txt`.
@@ -0,0 +1,206 @@
+ # Dataset schema for FJC Integrated Database: U.S. district court civil cases (SAS extracts).
+ #
+ # Source: https://www.fjc.gov/research/idb/civil-cases-filed-terminated-and-pending-sy-1988-present
+ # Codebook: Civil Codebook 1988 Forward (FJC PDF).
+ #
+ # Default ingest target is the FY civil SAS file (e.g. cv26.sas7bdat), refreshed by FJC on a schedule.
+ name: "fjcidb"
+ description: "Federal Judicial Center Integrated Database civil case records for U.S. district courts, including office and docket identity, party labels (plaintiff/defendant), filing and termination timing, jurisdiction, nature-of-suit entities linked from each case, and disposition codes"
+
+ extraction:
+   flavors: closed
+   properties: closed
+   relationships: closed
+   attributes: closed
+   events: closed
+
+ flavors:
+   - name: "legal_case"
+     description: "A civil case docketed in a U.S. federal district court, identified by FJC administrative codes rather than PACER"
+     display_name: "Legal case"
+     mergeability: not_mergeable
+     strong_id_properties: ["fjcidb_case_id"]
+     passive: true
+
+   - name: "organization"
+     description: "A particular business, institution, or organization such as a corporation, university, government agency, or non-profit"
+     display_name: "Organization"
+     mergeability: mergeable
+     passive: true
+
+   - name: "person"
+     description: "A real person as opposed to a fictional character, such as a CEO, politician, or public figure"
+     display_name: "Person"
+     mergeability: mergeable
+     passive: true
+
+   - name: "nature_of_suit"
+     description: "Federal civil nature-of-suit (NOS) code from the U.S. Courts classification"
+     display_name: "Nature of suit"
+     mergeability: not_mergeable
+     strong_id_properties: ["nos_code"]
+     passive: true
+
+ properties:
+   - name: "fjcidb_case_id"
+     namespace: "fjcidb"
+     type: string
+     description: "Stable FJC civil case identifier formed from district, office, docket, and filing year"
+     display_name: "FJC civil case ID"
+     mergeability: not_mergeable
+     domain_flavors: ["legal_case"]
+     passive: true
+
+   - name: "name"
+     namespace: "fjcidb"
+     type: string
+     description: "Display name of the entity"
+     display_name: "Name"
+     mergeability: not_mergeable
+     domain_flavors: ["organization", "person"]
+     passive: true
+
+   - name: "nos_code"
+     namespace: "fjcidb"
+     type: string
+     description: "Nature of suit numeric code (matches IDB NOS and the U.S. Courts civil NOS codebook)"
+     display_name: "NOS code"
+     mergeability: not_mergeable
+     domain_flavors: ["nature_of_suit"]
+     passive: true
+
+   - name: "nos_title"
+     namespace: "fjcidb"
+     type: string
+     description: "Short title for this nature of suit code from the U.S. Courts civil NOS descriptions"
+     display_name: "Nature of suit title"
+     mergeability: not_mergeable
+     domain_flavors: ["nature_of_suit"]
+     passive: true
+
+   - name: "nos_description"
+     namespace: "fjcidb"
+     type: string
+     description: "Official description for this nature of suit code from the U.S. Courts civil NOS descriptions"
+     display_name: "Nature of suit description"
+     mergeability: not_mergeable
+     domain_flavors: ["nature_of_suit"]
+     passive: true
+
+   - name: "district_code"
+     namespace: "fjcidb"
+     type: string
+     description: "FJC district court code for where the civil action was filed (IDB DISTRICT)"
+     display_name: "FJC district code"
+     mergeability: not_mergeable
+     domain_flavors: ["legal_case"]
+     passive: true
+
+   - name: "office_code"
+     namespace: "fjcidb"
+     type: string
+     description: "FJC office code within the district where the case was filed (IDB OFFICE)"
+     display_name: "FJC office code"
+     mergeability: not_mergeable
+     domain_flavors: ["legal_case"]
+     passive: true
+
+   - name: "case_docket_number"
+     namespace: "fjcidb"
+     type: string
+     description: "Docket number assigned by the filing office (IDB DOCKET)"
+     display_name: "Docket number"
+     mergeability: not_mergeable
+     domain_flavors: ["legal_case"]
+     passive: true
+
+   - name: "case_filing_date"
+     namespace: "fjcidb"
+     type: string
+     description: "Case filing date"
+     display_name: "Filing date"
+     mergeability: not_mergeable
+     domain_flavors: ["legal_case"]
+     passive: true
+
+   - name: "termination_date"
+     namespace: "fjcidb"
+     type: string
+     description: "Termination date if the case terminated in the reporting window"
+     display_name: "Termination date"
+     mergeability: not_mergeable
+     domain_flavors: ["legal_case"]
+     passive: true
+
+   - name: "origin_code"
+     namespace: "fjcidb"
+     type: string
+     description: "Procedural origin code for the filing"
+     display_name: "Origin code"
+     mergeability: not_mergeable
+     domain_flavors: ["legal_case"]
+     passive: true
+
+   - name: "jurisdiction_code"
+     namespace: "fjcidb"
+     type: string
+     description: "Jurisdiction code"
+     display_name: "Jurisdiction code"
+     mergeability: not_mergeable
+     domain_flavors: ["legal_case"]
+     passive: true
+
+   - name: "disposition_code"
+     namespace: "fjcidb"
+     type: string
+     description: "Disposition code when the case terminated"
+     display_name: "Disposition code"
+     mergeability: not_mergeable
+     domain_flavors: ["legal_case"]
+     passive: true
+
+   - name: "class_action_code"
+     namespace: "fjcidb"
+     type: string
+     description: "Class action designation code"
+     display_name: "Class action code"
+     mergeability: not_mergeable
+     domain_flavors: ["legal_case"]
+     passive: true
+
+   - name: "procedural_progress_code"
+     namespace: "fjcidb"
+     type: string
+     description: "Procedural progress code"
+     display_name: "Procedural progress code"
+     mergeability: not_mergeable
+     domain_flavors: ["legal_case"]
+     passive: true
+
+ relationships:
+   - name: "is_plaintiff_in"
+     namespace: "fjcidb"
+     description: "The party is named as the plaintiff side in this FJC civil case"
+     display_name: "Is plaintiff in"
+     mergeability: not_mergeable
+     domain_flavors: ["organization", "person"]
+     target_flavors: ["legal_case"]
+     passive: true
+
+   - name: "is_defendant_in"
+     namespace: "fjcidb"
+     description: "The party is named as the defendant side in this FJC civil case"
+     display_name: "Is defendant in"
+     mergeability: not_mergeable
+     domain_flavors: ["organization", "person"]
+     target_flavors: ["legal_case"]
+     passive: true
+
+   - name: "has_nature_of_suit"
+     namespace: "fjcidb"
+     description: "The civil case is classified under this federal nature-of-suit code (IDB NOS field)"
+     display_name: "Has nature of suit"
+     mergeability: not_mergeable
+     domain_flavors: ["legal_case"]
+     target_flavors: ["nature_of_suit"]
+     passive: true