eyecite-ts 0.10.0 → 0.10.2
- package/README.md +154 -432
- package/dist/index.cjs +1 -1
- package/dist/index.cjs.map +1 -1
- package/dist/index.d.cts.map +1 -1
- package/dist/index.d.mts.map +1 -1
- package/dist/index.mjs +1 -1
- package/dist/index.mjs.map +1 -1
- package/package.json +1 -1
package/README.md
CHANGED
|
@@ -9,22 +9,9 @@
|
|
|
9
9
|
[](https://www.typescriptlang.org/)
|
|
10
10
|
[](https://www.npmjs.com/package/eyecite-ts)
|
|
11
11
|
|
|
12
|
-
TypeScript legal citation extraction
|
|
12
|
+
TypeScript legal citation extraction — a port of Python [eyecite](https://github.com/freelawproject/eyecite) with extended capabilities.
|
|
13
13
|
|
|
14
|
-
Extract
|
|
15
|
-
|
|
16
|
-
## Features
|
|
17
|
-
|
|
18
|
-
- **Full citation extraction**: Case citations, statutes (20 jurisdictions), constitutional citations (U.S. + 50 states), journal articles, neutral citations, public laws, federal register
|
|
19
|
-
- **Case name & full span**: Backward search extracts case names ("Smith v. Jones", "In re Smith"), `fullSpan` covers case name through closing parenthetical
|
|
20
|
-
- **Parallel citation linking**: Automatic detection and grouping of comma-separated citations sharing a parenthetical (e.g., "410 U.S. 113, 93 S. Ct. 705 (1973)")
|
|
21
|
-
- **Complex parentheticals**: Unified parser handles court+year, full dates (Jan. 15, 2020 / January 15, 2020 / 1/15/2020), disposition (en banc, per curiam), and chained parentheticals
|
|
22
|
-
- **Short-form resolution**: Id./Ibid., supra, and short-form case citations resolved to their full antecedents
|
|
23
|
-
- **Reporter database**: 1,200+ reporters with variant matching and confidence scoring
|
|
24
|
-
- **Citation annotation**: HTML markup with auto-escape XSS protection and position tracking
|
|
25
|
-
- **Bundle optimization**: Tree-shakeable exports, lazy-loaded reporter data, separate entry points
|
|
26
|
-
- **TypeScript native**: Discriminated unions, conditional types, type guards, full IntelliSense
|
|
27
|
-
- **Zero dependencies**: No runtime dependencies, ~10KB gzipped core bundle
|
|
14
|
+
Extract structured data from legal citations in court opinions, briefs, and legal documents. A citation like `500 F.2d 123 (9th Cir. 2020)` encodes a volume (500), reporter (Federal Reporter, 2nd Series), page (123), court (Ninth Circuit), and year (2020). This library parses all of that into typed objects, resolves short-form references like "Id." back to their antecedents, and can annotate the original text with HTML markup. Zero runtime dependencies, browser-compatible, ~20 KB brotli.
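
To make the anatomy of a case citation concrete, here is a minimal standalone sketch that splits one citation string into its parts. It does not use eyecite-ts: `ParsedCaseCite`, `CASE_CITE`, and `parseCaseCite` are hypothetical names, and the single regular expression is a simplification. The library itself validates reporters against a reporter database rather than relying on one pattern.

```typescript
// Hypothetical shape for illustration only; not the library's actual types.
interface ParsedCaseCite {
  volume: number
  reporter: string
  page: number
  court?: string
  year?: number
}

// Simplified pattern: volume, reporter, page, then an optional "(court year)" parenthetical.
const CASE_CITE =
  /(\d+)\s+([A-Z][A-Za-z0-9.]*(?:\s[A-Za-z0-9.]+)*?)\s+(\d+)(?:\s+\(([^()]*?)\s*(\d{4})\))?/

function parseCaseCite(input: string): ParsedCaseCite | undefined {
  const m = CASE_CITE.exec(input)
  if (!m) return undefined
  const cite: ParsedCaseCite = {
    volume: Number(m[1]),
    reporter: m[2],
    page: Number(m[3]),
  }
  // The parenthetical is optional, so these groups may be unmatched.
  if (m[4]) cite.court = m[4].trim()
  if (m[5]) cite.year = Number(m[5])
  return cite
}
```

A pattern this loose will also match ordinary text that looks like "number, word, number", which is one reason a real extractor checks the reporter against known abbreviations.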
|
|
28
15
|
|
|
29
16
|
## Installation
|
|
30
17
|
|
|
@@ -34,511 +21,246 @@ npm install eyecite-ts
|
|
|
34
21
|
|
|
35
22
|
## Quick Start
|
|
36
23
|
|
|
37
|
-
|
|
38
|
-
import { extractCitations } from 'eyecite-ts'
|
|
39
|
-
|
|
40
|
-
const text = 'See Smith v. Jones, 500 F.2d 123 (9th Cir. Jan. 15, 2020)'
|
|
41
|
-
const citations = extractCitations(text)
|
|
42
|
-
|
|
43
|
-
console.log(citations[0])
|
|
44
|
-
// {
|
|
45
|
-
// type: 'case',
|
|
46
|
-
// volume: 500,
|
|
47
|
-
// reporter: 'F.2d',
|
|
48
|
-
// page: 123,
|
|
49
|
-
// court: '9th Cir.',
|
|
50
|
-
// year: 2020,
|
|
51
|
-
// caseName: 'Smith v. Jones',
|
|
52
|
-
// date: { iso: '2020-01-15', parsed: { year: 2020, month: 1, day: 15 } },
|
|
53
|
-
// confidence: 0.85,
|
|
54
|
-
// span: { originalStart: 20, originalEnd: 33, ... },
|
|
55
|
-
// fullSpan: { originalStart: 4, originalEnd: 57, ... }
|
|
56
|
-
// }
|
|
57
|
-
```
|
|
58
|
-
|
|
59
|
-
## Citation Extraction
|
|
60
|
-
|
|
61
|
-
### Multiple Citation Types
|
|
62
|
-
|
|
63
|
-
```typescript
|
|
64
|
-
import { extractCitations } from 'eyecite-ts'
|
|
65
|
-
|
|
66
|
-
const text = `
|
|
67
|
-
See Smith v. Jones, 500 F.2d 123 (9th Cir. 2020).
|
|
68
|
-
Also 42 U.S.C. § 1983.
|
|
69
|
-
Compare 123 Harv. L. Rev. 456.
|
|
70
|
-
`
|
|
71
|
-
const citations = extractCitations(text)
|
|
72
|
-
|
|
73
|
-
citations.forEach(citation => {
|
|
74
|
-
console.log(citation.type) // 'case', 'statute', 'journal', etc.
|
|
75
|
-
})
|
|
76
|
-
```
|
|
77
|
-
|
|
78
|
-
### Statute Citations
|
|
79
|
-
|
|
80
|
-
Extract citations from 20 state and federal jurisdictions with subsection, et seq., and jurisdiction identification:
|
|
24
|
+
A complete extract → resolve → annotate workflow:
|
|
81
25
|
|
|
82
26
|
```typescript
|
|
83
|
-
import { extractCitations } from
|
|
84
|
-
|
|
85
|
-
const text = `
|
|
86
|
-
See 42 U.S.C. § 1983(a)(1) et seq.
|
|
87
|
-
Also Cal. Penal Code § 187.
|
|
88
|
-
And N.Y. Penal Law § 125.25(1)(a).
|
|
89
|
-
Compare 735 ILCS 5/2-1001.
|
|
90
|
-
`
|
|
91
|
-
const citations = extractCitations(text)
|
|
92
|
-
|
|
93
|
-
// Federal with subsections + et seq.
|
|
94
|
-
// { type: 'statute', title: 42, code: 'U.S.C.', section: '1983',
|
|
95
|
-
// subsection: '(a)(1)', jurisdiction: 'US', hasEtSeq: true, confidence: 1.0 }
|
|
96
|
-
|
|
97
|
-
// California named-code
|
|
98
|
-
// { type: 'statute', code: 'Penal', section: '187', jurisdiction: 'CA', confidence: 0.95 }
|
|
99
|
-
|
|
100
|
-
// New York named-code with subsections
|
|
101
|
-
// { type: 'statute', code: 'Penal Law', section: '125.25',
|
|
102
|
-
// subsection: '(1)(a)', jurisdiction: 'NY', confidence: 1.0 }
|
|
103
|
-
|
|
104
|
-
// Illinois chapter-act format
|
|
105
|
-
// { type: 'statute', title: 735, code: '5', section: '2-1001',
|
|
106
|
-
// jurisdiction: 'IL', confidence: 0.95 }
|
|
107
|
-
```
|
|
108
|
-
|
|
109
|
-
**Supported jurisdictions:**
|
|
110
|
-
|
|
111
|
-
| Family | Jurisdictions |
|
|
112
|
-
|--------|--------------|
|
|
113
|
-
| Federal | USC, CFR, prose ("section X of title Y") |
|
|
114
|
-
| Named-code | NY (21 laws), CA (29 codes), TX (29 codes), MD (36 articles), VA, AL, MA |
|
|
115
|
-
| Abbreviated-code | FL, OH, MI, UT, CO, WA, NC, GA, PA, IN, NJ, DE |
|
|
116
|
-
| Chapter-act | IL (ILCS) |
|
|
117
|
-
|
|
118
|
-
### Constitutional Citations
|
|
119
|
-
|
|
120
|
-
Extract U.S. and state constitutional citations with article, amendment, section, and clause parsing:
|
|
121
|
-
|
|
122
|
-
```typescript
|
|
123
|
-
import { extractCitations } from 'eyecite-ts'
|
|
124
|
-
|
|
125
|
-
const text = `
|
|
126
|
-
Under U.S. Const. amend. XIV, § 1, equal protection is guaranteed.
|
|
127
|
-
See also Cal. Const. art. I, § 7.
|
|
128
|
-
And U.S. Const. art. I, § 8, cl. 3.
|
|
129
|
-
`
|
|
130
|
-
const citations = extractCitations(text)
|
|
131
|
-
|
|
132
|
-
// U.S. amendment with section
|
|
133
|
-
// { type: 'constitutional', jurisdiction: 'US', amendment: 14,
|
|
134
|
-
// section: '1', confidence: 0.95 }
|
|
135
|
-
|
|
136
|
-
// California article with section
|
|
137
|
-
// { type: 'constitutional', jurisdiction: 'CA', article: 1,
|
|
138
|
-
// section: '7', confidence: 0.9 }
|
|
139
|
-
|
|
140
|
-
// Commerce Clause (article + section + clause)
|
|
141
|
-
// { type: 'constitutional', jurisdiction: 'US', article: 1,
|
|
142
|
-
// section: '8', clause: 3, confidence: 0.95 }
|
|
143
|
-
```
|
|
27
|
+
import { extractCitations } from "eyecite-ts"
|
|
28
|
+
import { annotate } from "eyecite-ts/annotate"
|
|
144
29
|
|
|
145
|
-
|
|
30
|
+
const text = `In Smith v. Jones, 500 F.2d 123 (9th Cir. 2020), the court
|
|
31
|
+
applied 42 U.S.C. § 1983. Id. at 130. See also 123 Harv. L. Rev. 456 (2019).`
|
|
146
32
|
|
|
147
|
-
|
|
148
|
-
|
|
149
|
-
```typescript
|
|
150
|
-
import { extractCitationsAsync } from 'eyecite-ts'
|
|
151
|
-
|
|
152
|
-
const citations = await extractCitationsAsync(text)
|
|
153
|
-
```
|
|
154
|
-
|
|
155
|
-
### Custom Patterns
|
|
156
|
-
|
|
157
|
-
```typescript
|
|
158
|
-
import { extractCitations } from 'eyecite-ts'
|
|
159
|
-
import { casePatterns } from 'eyecite-ts'
|
|
160
|
-
|
|
161
|
-
// Extract only case citations
|
|
162
|
-
const citations = extractCitations(text, {
|
|
163
|
-
patterns: casePatterns
|
|
164
|
-
})
|
|
165
|
-
```
|
|
166
|
-
|
|
167
|
-
### Custom Cleaners
|
|
168
|
-
|
|
169
|
-
```typescript
|
|
170
|
-
import { extractCitations, cleanText } from 'eyecite-ts'
|
|
171
|
-
|
|
172
|
-
// Use only HTML stripping
|
|
173
|
-
const citations = extractCitations(html, {
|
|
174
|
-
cleaners: [(text) => text.replace(/<[^>]+>/g, '')]
|
|
175
|
-
})
|
|
176
|
-
```
|
|
177
|
-
|
|
178
|
-
## Case Names & Full Spans
|
|
179
|
-
|
|
180
|
-
Case citations can include the case name and full citation boundaries:
|
|
181
|
-
|
|
182
|
-
```typescript
|
|
183
|
-
const text = 'In Smith v. Jones, 500 F.2d 123 (9th Cir. 2020) (en banc), the court held...'
|
|
184
|
-
const citations = extractCitations(text)
|
|
33
|
+
// Step 1: Extract and resolve in one call
|
|
34
|
+
const citations = extractCitations(text, { resolve: true })
|
|
185
35
|
|
|
186
|
-
|
|
187
|
-
|
|
188
|
-
|
|
189
|
-
|
|
190
|
-
|
|
36
|
+
// Step 2: Inspect results
|
|
37
|
+
for (const cite of citations) {
|
|
38
|
+
switch (cite.type) {
|
|
39
|
+
case "case":
|
|
40
|
+
console.log(cite.caseName, cite.reporter, cite.year)
|
|
41
|
+
// "Smith v. Jones" "F.2d" 2020
|
|
42
|
+
break
|
|
43
|
+
case "statute":
|
|
44
|
+
console.log(cite.title, cite.code, cite.section)
|
|
45
|
+
// 42 "U.S.C." "1983"
|
|
46
|
+
break
|
|
47
|
+
case "id":
|
|
48
|
+
console.log("Id. resolves to index", cite.resolution?.resolvedTo)
|
|
49
|
+
// Id. resolves to index 0
|
|
50
|
+
break
|
|
51
|
+
case "journal":
|
|
52
|
+
console.log(cite.journal, cite.volume, cite.page)
|
|
53
|
+
// "Harv. L. Rev." 123 456
|
|
54
|
+
break
|
|
55
|
+
}
|
|
191
56
|
}
|
|
192
|
-
```
|
|
193
|
-
|
|
194
|
-
Procedural prefixes are recognized automatically:
|
|
195
|
-
|
|
196
|
-
```typescript
|
|
197
|
-
const text = 'In re Smith, 410 U.S. 113 (1973)'
|
|
198
|
-
// caseName: 'In re Smith'
|
|
199
57
|
|
|
200
|
-
|
|
201
|
-
|
|
58
|
+
// Step 3: Annotate the original text
|
|
59
|
+
const result = annotate(text, citations, {
|
|
60
|
+
template: { before: '<cite>', after: '</cite>' },
|
|
61
|
+
})
|
|
62
|
+
console.log(result.text)
|
|
202
63
|
```
|
|
203
64
|
|
|
204
|
-
|
|
205
|
-
|
|
206
|
-
Parentheticals with full dates return structured date objects:
|
|
65
|
+
## What It Extracts
|
|
207
66
|
|
|
208
|
-
|
|
209
|
-
|
|
210
|
-
|
|
67
|
+
| Type | Example | Key Fields |
|
|
68
|
+
|------|---------|------------|
|
|
69
|
+
| `case` | `500 F.2d 123 (9th Cir. 2020)` | volume, reporter, page, court, year, caseName |
|
|
70
|
+
| `statute` | `42 U.S.C. § 1983(a)(1)` | title, code, section, subsection, jurisdiction |
|
|
71
|
+
| `constitutional` | `U.S. Const. amend. XIV, § 1` | jurisdiction, amendment, section, clause |
|
|
72
|
+
| `journal` | `123 Harv. L. Rev. 456` | volume, journal, page, year |
|
|
73
|
+
| `neutral` | `2020 WL 123456` | year, court, documentNumber |
|
|
74
|
+
| `publicLaw` | `Pub. L. No. 117-263` | congress, lawNumber |
|
|
75
|
+
| `federalRegister` | `87 Fed. Reg. 1234` | volume, page, year |
|
|
76
|
+
| `statutesAtLarge` | `136 Stat. 4459` | volume, page, year |
|
|
77
|
+
| `id` | `Id. at 125` | pincite |
|
|
78
|
+
| `supra` | `Smith, supra, at 130` | partyName, pincite |
|
|
79
|
+
| `shortFormCase` | `500 F.2d at 140` | volume, reporter, pincite |
|
|
211
80
|
|
|
212
|
-
|
|
213
|
-
// date: { iso: '1973', parsed: { year: 1973 } }
|
|
214
|
-
```
|
|
81
|
+
Statute coverage spans 52 jurisdictions (50 states + DC + federal). See the [Advanced Extraction Guide](docs/guides/advanced-extraction.md) for jurisdiction details.
|
|
215
82
|
|
|
216
|
-
|
|
83
|
+
## Key Features
|
|
217
84
|
|
|
218
|
-
###
|
|
85
|
+
### Case Names & Full Spans
|
|
219
86
|
|
|
220
|
-
|
|
87
|
+
The library backward-searches for party names and tracks full citation boundaries:
|
|
221
88
|
|
|
222
89
|
```typescript
|
|
223
|
-
const text =
|
|
224
|
-
const
|
|
225
|
-
|
|
226
|
-
if (
|
|
227
|
-
|
|
228
|
-
|
|
90
|
+
const text = "In Smith v. Jones, 500 F.2d 123 (9th Cir. 2020) (en banc), the court held..."
|
|
91
|
+
const [cite] = extractCitations(text)
|
|
92
|
+
|
|
93
|
+
if (cite.type === "case") {
|
|
94
|
+
cite.caseName // "Smith v. Jones"
|
|
95
|
+
cite.plaintiff // "Smith"
|
|
96
|
+
cite.defendant // "Jones"
|
|
97
|
+
cite.disposition // "en banc"
|
|
98
|
+
cite.span // covers "500 F.2d 123" (citation core)
|
|
99
|
+
cite.fullSpan // covers "Smith v. Jones, 500 F.2d 123 (9th Cir. 2020) (en banc)"
|
|
229
100
|
}
|
|
230
101
|
```
|
|
231
102
|
|
|
232
|
-
|
|
103
|
+
Procedural prefixes like `In re`, `Ex parte`, and `Matter of` are recognized automatically.
|
|
233
104
|
|
|
234
|
-
|
|
105
|
+
### Parallel Citations
|
|
235
106
|
|
|
236
|
-
When multiple
|
|
107
|
+
When the same case is cited to multiple reporters (common in older Supreme Court opinions), the library groups the citations automatically:
|
|
237
108
|
|
|
238
109
|
```typescript
|
|
239
|
-
const text =
|
|
110
|
+
const text = "See 410 U.S. 113, 93 S. Ct. 705, 35 L. Ed. 2d 147 (1973)."
|
|
240
111
|
const citations = extractCitations(text)
|
|
241
112
|
|
|
242
|
-
|
|
243
|
-
|
|
244
|
-
|
|
245
|
-
console.log(citations[2].groupId) // "410-U.S.-113" (same group)
|
|
246
|
-
|
|
247
|
-
// Primary citation (first in group) has parallelCitations array
|
|
248
|
-
if (citations[0].type === 'case') {
|
|
249
|
-
console.log(citations[0].parallelCitations)
|
|
250
|
-
// [
|
|
251
|
-
// { volume: 93, reporter: 'S. Ct.', page: 705 },
|
|
252
|
-
// { volume: 35, reporter: 'L. Ed. 2d', page: 147 }
|
|
253
|
-
// ]
|
|
254
|
-
}
|
|
113
|
+
citations[0].groupId // "410-U.S.-113"
|
|
114
|
+
citations[1].groupId // "410-U.S.-113" (same group)
|
|
115
|
+
citations[2].groupId // "410-U.S.-113" (same group)
|
|
255
116
|
|
|
256
|
-
//
|
|
257
|
-
|
|
258
|
-
|
|
117
|
+
// Primary citation carries the linked array
|
|
118
|
+
if (citations[0].type === "case") {
|
|
119
|
+
citations[0].parallelCitations
|
|
120
|
+
// [{ volume: 93, reporter: 'S. Ct.', page: 705 },
|
|
121
|
+
// { volume: 35, reporter: 'L. Ed. 2d', page: 147 }]
|
|
122
|
+
}
|
|
259
123
|
```
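
The grouping idea can be sketched without the library. The block below assumes the group ID is built as volume, reporter, and page of the primary citation joined by hyphens, which matches the `"410-U.S.-113"` value in the example above. `ReporterCite`, `groupIdFor`, and `tagParallelRun` are hypothetical names, and detecting which citations form a parallel run is left out entirely.

```typescript
// Hypothetical minimal shape; real citation objects carry many more fields.
interface ReporterCite {
  volume: number
  reporter: string
  page: number
}

// Assumed ID format: volume-reporter-page of the primary citation.
function groupIdFor(primary: ReporterCite): string {
  return `${primary.volume}-${primary.reporter}-${primary.page}`
}

// Tag every member of one parallel run with the primary's group ID.
function tagParallelRun(run: ReporterCite[]): Array<ReporterCite & { groupId: string }> {
  const [primary] = run
  if (!primary) return []
  const id = groupIdFor(primary)
  return run.map((c) => ({ ...c, groupId: id }))
}
```

Keying the group on the first citation means every member of the run can be matched to its siblings with a plain string comparison.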
|
|
260
124
|
|
|
261
|
-
|
|
262
|
-
- All citations in a parallel group share the same `groupId`
|
|
263
|
-
- Only the **first citation** (primary) has the `parallelCitations` array
|
|
264
|
-
- Secondary citations remain in the results array for individual processing
|
|
265
|
-
- Group ID format: `${volume}-${reporter}-${page}` (e.g., "410-U.S.-113")
|
|
266
|
-
|
|
267
|
-
Use `groupId` to identify which citations refer to the same case, or access `parallelCitations` on the primary to get all reporters at once.
|
|
268
|
-
|
|
269
|
-
## Resolving Short-Form Citations
|
|
270
|
-
|
|
271
|
-
Short-form citations (Id., supra, short-form case) refer to earlier citations in the document. The resolution engine links them to their full antecedents.
|
|
125
|
+
### Short-Form Resolution
|
|
272
126
|
|
|
273
|
-
|
|
127
|
+
Pass `{ resolve: true }` to link Id., supra, and short-form case citations to their full antecedents:
|
|
274
128
|
|
|
275
129
|
```typescript
|
|
276
|
-
|
|
277
|
-
|
|
278
|
-
const text = `
|
|
279
|
-
Smith v. Jones, 500 F.2d 123 (2020).
|
|
280
|
-
Id. at 125.
|
|
281
|
-
Smith, supra, at 130.
|
|
282
|
-
500 F.2d at 140.
|
|
283
|
-
`
|
|
284
|
-
|
|
130
|
+
const text = `Smith v. Jones, 500 F.2d 123 (2020). Id. at 125.`
|
|
285
131
|
const citations = extractCitations(text, { resolve: true })
|
|
286
132
|
|
|
287
|
-
|
|
288
|
-
console.log(citations[1].resolution)
|
|
133
|
+
citations[1].resolution
|
|
289
134
|
// { resolvedTo: 0, confidence: 1.0 }
|
|
290
135
|
```
|
|
291
136
|
|
|
292
|
-
|
|
293
|
-
|
|
294
|
-
```typescript
|
|
295
|
-
import { extractCitations, resolveCitations } from 'eyecite-ts'
|
|
296
|
-
|
|
297
|
-
const citations = extractCitations(text)
|
|
298
|
-
|
|
299
|
-
const resolved = resolveCitations(citations, text, {
|
|
300
|
-
scopeStrategy: 'paragraph',
|
|
301
|
-
fuzzyPartyMatching: true,
|
|
302
|
-
partyMatchThreshold: 0.8,
|
|
303
|
-
reportUnresolved: true
|
|
304
|
-
})
|
|
305
|
-
```
|
|
306
|
-
|
|
307
|
-
### Resolution Options
|
|
308
|
-
|
|
309
|
-
| Option | Type | Default | Description |
|
|
310
|
-
|--------|------|---------|-------------|
|
|
311
|
-
| `scopeStrategy` | `'paragraph'` \| `'section'` \| `'footnote'` \| `'none'` | `'paragraph'` | How far back to search for antecedents |
|
|
312
|
-
| `autoDetectParagraphs` | `boolean` | `true` | Auto-detect paragraph boundaries from text |
|
|
313
|
-
| `paragraphBoundaryPattern` | `RegExp` | `/\n\n+/` | Pattern to detect paragraphs |
|
|
314
|
-
| `fuzzyPartyMatching` | `boolean` | `true` | Enable fuzzy party name matching for supra |
|
|
315
|
-
| `partyMatchThreshold` | `number` | `0.8` | Similarity threshold (0-1) for fuzzy matching |
|
|
316
|
-
| `reportUnresolved` | `boolean` | `true` | Report failure reasons for unresolved citations |
|
|
317
|
-
|
|
318
|
-
### Resolution Examples
|
|
319
|
-
|
|
320
|
-
**Id. citations:**
|
|
321
|
-
|
|
322
|
-
```typescript
|
|
323
|
-
const text = 'Smith v. Jones, 500 F.2d 123. Id. at 125.'
|
|
324
|
-
const citations = extractCitations(text, { resolve: true })
|
|
325
|
-
// citations[1].resolution.resolvedTo === 0
|
|
326
|
-
```
|
|
327
|
-
|
|
328
|
-
**Supra citations:**
|
|
329
|
-
|
|
330
|
-
```typescript
|
|
331
|
-
const text = 'Smith v. Jones, 500 F.2d 123. Smith, supra, at 130.'
|
|
332
|
-
const citations = extractCitations(text, { resolve: true })
|
|
333
|
-
// citations[1].resolution.resolvedTo === 0 (party name matches "Smith")
|
|
334
|
-
```
|
|
335
|
-
|
|
336
|
-
**Short-form case citations:**
|
|
137
|
+
The resolver supports paragraph/section/footnote scope boundaries, fuzzy party name matching, and configurable thresholds. See the [Resolution Guide](docs/guides/resolution.md) for the power-user API.
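
As a rough illustration of what a paragraph scope boundary means for `Id.` resolution, the sketch below links each `Id.` to the nearest preceding full citation only when both sit in the same paragraph. It is not the library's resolver: `SpanCite` and `resolveIds` are hypothetical, paragraphs are assumed to be separated by blank lines, citations are assumed to be sorted by position, and supra and fuzzy party matching are ignored.

```typescript
// Hypothetical, simplified model of extracted citations.
interface SpanCite {
  type: "case" | "id"
  start: number // offset of the citation in the text
}

// For each "id" citation, return the index of the nearest preceding full
// citation in the same paragraph, or undefined when the paragraph has none.
function resolveIds(text: string, cites: SpanCite[]): Array<number | undefined> {
  // Paragraph index = number of blank-line boundaries before the offset.
  const paragraphOf = (offset: number): number =>
    (text.slice(0, offset).match(/\n\n+/g) ?? []).length

  let lastFull: number | undefined
  let lastFullParagraph = -1
  return cites.map((cite, i) => {
    const para = paragraphOf(cite.start)
    if (cite.type === "case") {
      lastFull = i
      lastFullParagraph = para
      return undefined // full citations have nothing to resolve
    }
    return lastFullParagraph === para ? lastFull : undefined
  })
}
```

With a wider scope the paragraph check would simply be dropped, so an `Id.` in a later paragraph would still reach back to the last full citation.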
|
|
337
138
|
|
|
338
|
-
|
|
339
|
-
const text = 'Brown v. Board, 347 U.S. 483. See 347 U.S. at 495.'
|
|
340
|
-
const citations = extractCitations(text, { resolve: true })
|
|
341
|
-
// citations[1].resolution.resolvedTo === 0 (volume/reporter matches)
|
|
342
|
-
```
|
|
139
|
+
### Citation Annotation
|
|
343
140
|
|
|
344
|
-
|
|
141
|
+
Mark up citations with HTML using template or callback modes:
|
|
345
142
|
|
|
346
143
|
```typescript
|
|
347
|
-
|
|
348
|
-
const citations = extractCitations(text, { resolve: true })
|
|
349
|
-
// citations[0].resolution.failureReason === 'No preceding citation found'
|
|
350
|
-
```
|
|
351
|
-
|
|
352
|
-
## Citation Annotation
|
|
144
|
+
import { annotate } from "eyecite-ts/annotate"
|
|
353
145
|
|
|
354
|
-
Add HTML markup to citations in text:
|
|
355
|
-
|
|
356
|
-
```typescript
|
|
357
|
-
import { annotate } from 'eyecite-ts/annotate'
|
|
358
|
-
import { extractCitations } from 'eyecite-ts'
|
|
359
|
-
|
|
360
|
-
const text = 'See Smith v. Jones, 500 F.2d 123 (2020).'
|
|
361
|
-
const citations = extractCitations(text)
|
|
362
|
-
|
|
363
|
-
// Template mode
|
|
364
|
-
const result = annotate(text, citations, {
|
|
365
|
-
template: { before: '<cite>', after: '</cite>' }
|
|
366
|
-
})
|
|
367
|
-
// result.text === 'See Smith v. Jones, <cite>500 F.2d 123</cite> (2020).'
|
|
368
|
-
|
|
369
|
-
// Callback mode (full control)
|
|
370
|
-
const result2 = annotate(text, citations, {
|
|
371
|
-
callback: (citation, surrounding) => {
|
|
372
|
-
if (citation.type === 'case') {
|
|
373
|
-
return `<a href="/cases/${citation.volume}">${citation.matchedText}</a>`
|
|
374
|
-
}
|
|
375
|
-
return citation.matchedText
|
|
376
|
-
}
|
|
377
|
-
})
|
|
378
|
-
```
|
|
379
|
-
|
|
380
|
-
Auto-escape is enabled by default for XSS protection:
|
|
381
|
-
|
|
382
|
-
```typescript
|
|
383
146
|
const result = annotate(text, citations, {
|
|
384
147
|
template: { before: '<cite>', after: '</cite>' },
|
|
385
|
-
autoEscape: true // default — escapes &, <, >, ", ', /
|
|
386
148
|
})
|
|
149
|
+
// "See Smith v. Jones, <cite>500 F.2d 123</cite> (2020)."
|
|
387
150
|
```
|
|
388
151
|
|
|
389
|
-
|
|
152
|
+
XSS auto-escape is enabled by default. Use `useFullSpan: true` to annotate from case name through closing parenthetical. See the [Annotation Guide](docs/guides/annotation.md) for callback mode and full options.
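
The sketch below shows one way template-style annotation with escaping can work. It is an assumption-laden illustration, not the package's implementation: `CiteSpan`, `escapeHtml`, and `annotateSketch` are hypothetical names, spans are assumed not to overlap, and escaping is applied to all text outside the template strings, which may differ from what `autoEscape` actually covers.

```typescript
// Hypothetical span shape: half-open [start, end) offsets into the text.
interface CiteSpan {
  start: number
  end: number
}

// Escape the characters that matter when placing text inside HTML.
function escapeHtml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;")
}

// Wrap each span in before/after markup, escaping everything that came from the input.
function annotateSketch(text: string, spans: CiteSpan[], before: string, after: string): string {
  const ordered = [...spans].sort((a, b) => a.start - b.start)
  let out = ""
  let cursor = 0
  for (const { start, end } of ordered) {
    out += escapeHtml(text.slice(cursor, start))
    out += before + escapeHtml(text.slice(start, end)) + after
    cursor = end
  }
  return out + escapeHtml(text.slice(cursor))
}
```

Only the template strings are emitted verbatim, so markup that arrives inside the source text is rendered inert.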
|
|
390
153
|
|
|
391
|
-
|
|
154
|
+
### Confidence & Signals
|
|
392
155
|
|
|
393
|
-
|
|
394
|
-
const text = 'In Smith v. Jones, 500 F.2d 123 (9th Cir. 2020) (en banc), the court held...'
|
|
395
|
-
const citations = extractCitations(text)
|
|
156
|
+
Each citation carries a `confidence` score (0-1) based on pattern match quality and reporter validation. Citations preceded by legal signals are tagged:
|
|
396
157
|
|
|
397
|
-
|
|
398
|
-
const
|
|
399
|
-
|
|
400
|
-
})
|
|
401
|
-
// Result: "In Smith v. Jones, <cite>500 F.2d 123</cite> (9th Cir. 2020) (en banc), the court held..."
|
|
158
|
+
```typescript
|
|
159
|
+
const text = "See also Smith v. Jones, 500 F.2d 123 (2020)."
|
|
160
|
+
const [cite] = extractCitations(text)
|
|
402
161
|
|
|
403
|
-
|
|
404
|
-
|
|
405
|
-
template: { before: '<cite>', after: '</cite>' },
|
|
406
|
-
useFullSpan: true
|
|
407
|
-
})
|
|
408
|
-
// Result: "In <cite>Smith v. Jones, 500 F.2d 123 (9th Cir. 2020) (en banc)</cite>, the court held..."
|
|
162
|
+
cite.confidence // 0.85
|
|
163
|
+
cite.signal // "see also"
|
|
409
164
|
```
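
For readers curious how a signal could be picked up, here is a small standalone sketch that looks at the text immediately before a citation. The signal list is a short hypothetical sample, `signalBefore` is not a library function, and the offset passed in is assumed to be where the citation (including any case name) begins.

```typescript
// Hypothetical sample of introductory signals; longer phrases come first
// so that "see also" wins over "see".
const SIGNALS = ["see also", "but see", "see", "cf.", "accord"]

// Return the signal that directly precedes the citation, if any.
function signalBefore(text: string, citeStart: number): string | undefined {
  const prefix = text.slice(0, citeStart).replace(/\s+$/, "").toLowerCase()
  return SIGNALS.find((s) => prefix.endsWith(s))
}
```

Lower-casing the prefix keeps the match case-insensitive, which matters because signals usually open a sentence.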
|
|
410
165
|
|
|
411
|
-
|
|
412
|
-
- Case name (if present)
|
|
413
|
-
- Volume-reporter-page
|
|
414
|
-
- Court and date parenthetical
|
|
415
|
-
- Disposition parenthetical (en banc, per curiam)
|
|
416
|
-
- Chained parentheticals
|
|
417
|
-
- Subsequent history
|
|
418
|
-
|
|
419
|
-
Use `useFullSpan: true` when you want to highlight the entire citation as a unit, or `useFullSpan: false` (default) to annotate only the citation core for minimal markup.
|
|
166
|
+
### Footnote Detection
|
|
420
167
|
|
|
421
|
-
|
|
422
|
-
|
|
423
|
-
Validate case citations against the reporters database:
|
|
168
|
+
Footnote detection is opt-in. It tags citations with their footnote context and enables zone-scoped resolution:
|
|
424
169
|
|
|
425
170
|
```typescript
|
|
426
|
-
|
|
171
|
+
const citations = extractCitations(text, { detectFootnotes: true })
|
|
427
172
|
|
|
428
|
-
const
|
|
429
|
-
|
|
430
|
-
|
|
431
|
-
|
|
432
|
-
|
|
173
|
+
for (const cite of citations) {
|
|
174
|
+
if (cite.inFootnote) {
|
|
175
|
+
console.log(`Footnote ${cite.footnoteNumber}: ${cite.matchedText}`)
|
|
176
|
+
}
|
|
177
|
+
}
|
|
433
178
|
```
|
|
434
179
|
|
|
435
|
-
|
|
436
|
-
|
|
437
|
-
All citation types use a discriminated union on the `type` field:
|
|
180
|
+
Supports HTML footnote tags and plaintext footnote sections (separator + numbered markers). See the [Footnote Detection Guide](docs/guides/footnote-detection.md).
|
|
438
181
|
|
|
439
|
-
|
|
440
|
-
import type {
|
|
441
|
-
Citation, // Union of all 11 types
|
|
442
|
-
FullCaseCitation,
|
|
443
|
-
StatuteCitation,
|
|
444
|
-
ConstitutionalCitation,
|
|
445
|
-
JournalCitation,
|
|
446
|
-
NeutralCitation,
|
|
447
|
-
PublicLawCitation,
|
|
448
|
-
FederalRegisterCitation,
|
|
449
|
-
IdCitation,
|
|
450
|
-
SupraCitation,
|
|
451
|
-
ShortFormCaseCitation,
|
|
452
|
-
CitationOfType, // Extract subtype: CitationOfType<'case'> = FullCaseCitation
|
|
453
|
-
ExtractorMap, // Maps FullCitationType keys to citation subtypes
|
|
454
|
-
FullCitation, // Union of full citation types
|
|
455
|
-
ShortFormCitation, // Union of short-form types
|
|
456
|
-
} from 'eyecite-ts'
|
|
457
|
-
```
|
|
182
|
+
## Type System
|
|
458
183
|
|
|
459
|
-
|
|
184
|
+
All citation types use a [discriminated union](https://www.typescriptlang.org/docs/handbook/2/narrowing.html#discriminated-unions) on the `type` field:
|
|
460
185
|
|
|
461
186
|
```typescript
|
|
462
|
-
import {
|
|
463
|
-
|
|
464
|
-
isShortFormCitation,
|
|
465
|
-
isCaseCitation,
|
|
466
|
-
isCitationType,
|
|
467
|
-
assertUnreachable
|
|
468
|
-
} from 'eyecite-ts'
|
|
469
|
-
|
|
470
|
-
// Specific guards
|
|
471
|
-
if (isFullCitation(citation)) {
|
|
472
|
-
// citation: FullCitation
|
|
473
|
-
}
|
|
187
|
+
import type { Citation, FullCaseCitation, StatuteCitation } from "eyecite-ts"
|
|
188
|
+
import { isFullCitation, isCaseCitation, assertUnreachable } from "eyecite-ts"
|
|
474
189
|
|
|
475
|
-
//
|
|
476
|
-
if (
|
|
477
|
-
//
|
|
190
|
+
// Type guards
|
|
191
|
+
if (isCaseCitation(citation)) {
|
|
192
|
+
citation.reporter // typed as string
|
|
478
193
|
}
|
|
479
194
|
|
|
480
|
-
//
|
|
195
|
+
// Exhaustive switch
|
|
481
196
|
switch (citation.type) {
|
|
482
|
-
case
|
|
483
|
-
case
|
|
484
|
-
// ... all 11 types
|
|
197
|
+
case "case": /* ... */ break
|
|
198
|
+
case "statute": /* ... */ break
|
|
199
|
+
// ... all 11 types
|
|
485
200
|
default: assertUnreachable(citation.type)
|
|
486
201
|
}
|
|
487
202
|
```
|
|
488
203
|
|
|
489
|
-
|
|
204
|
+
`CitationOfType<T>` extracts the subtype for a given `type` value; for example, `CitationOfType<'case'>` is `FullCaseCitation`. See the [Type Reference](docs/api/types.md) for the full catalog.
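
The pattern behind this can be reproduced in a few lines of plain TypeScript. The sketch below uses a deliberately tiny hypothetical union (`AnyCite` with three members) rather than the package's real types, and `CiteOfType` and `unreachable` are stand-ins written here for illustration, built on the standard `Extract` utility type and a `never` parameter.

```typescript
// Hypothetical three-member union; the real library has more types and fields.
interface CaseCite { type: "case"; volume: number; reporter: string; page: number }
interface StatuteCite { type: "statute"; code: string; section: string }
interface IdCite { type: "id"; pincite?: number }
type AnyCite = CaseCite | StatuteCite | IdCite

// Pick the union member whose `type` field matches T.
type CiteOfType<T extends AnyCite["type"]> = Extract<AnyCite, { type: T }>

// Accepting `never` makes the compiler reject any switch that misses a member.
function unreachable(value: never): never {
  throw new Error(`Unhandled citation type: ${String(value)}`)
}

function describeCite(cite: AnyCite): string {
  switch (cite.type) {
    case "case":
      return `${cite.volume} ${cite.reporter} ${cite.page}`
    case "statute":
      return `${cite.code} § ${cite.section}`
    case "id":
      return cite.pincite === undefined ? "Id." : `Id. at ${cite.pincite}`
    default:
      return unreachable(cite)
  }
}

// CiteOfType<"case"> resolves to CaseCite, so this literal is checked against it.
const sampleCase: CiteOfType<"case"> = { type: "case", volume: 500, reporter: "F.2d", page: 123 }
```

If a fourth member were added to `AnyCite` without a matching `case`, the call to `unreachable` would stop compiling, which is the point of the exhaustive switch.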
|
|
490
205
|
|
|
491
|
-
|
|
206
|
+
## Bundle Size
|
|
492
207
|
|
|
493
|
-
|
|
494
|
-
import type { ResolvedCitation } from 'eyecite-ts'
|
|
208
|
+
The package exposes three entry points for tree-shaking:
|
|
495
209
|
|
|
496
|
-
|
|
497
|
-
|
|
498
|
-
|
|
210
|
+
| Entry Point | Import | Size (brotli) |
|
|
211
|
+
|-------------|--------|---------------|
|
|
212
|
+
| Core extraction | `eyecite-ts` | ~20 KB |
|
|
213
|
+
| Annotation | `eyecite-ts/annotate` | ~1.3 KB |
|
|
214
|
+
| Reporter data | `eyecite-ts/data` | lazy-loaded |
|
|
499
215
|
|
|
500
|
-
|
|
216
|
+
Import only what you need — the reporter database is loaded on first use, not at import time.
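
Loading on first use is commonly done with a memoized loader. The sketch below is a generic illustration of that technique, not the package's code: `ReporterTable`, `loadReporterTable`, `getReporterTable`, and the two sample entries are hypothetical, and a real bundle would more likely pull the data in with a dynamic `import()`.

```typescript
// Hypothetical reporter table; the real data set is far larger.
type ReporterTable = Map<string, string>

let loadCount = 0
let cachedTable: Promise<ReporterTable> | undefined

// Stand-in for the expensive step (for example, a dynamic import of a data module).
async function loadReporterTable(): Promise<ReporterTable> {
  loadCount += 1
  return new Map([
    ["F.2d", "Federal Reporter, Second Series"],
    ["U.S.", "United States Reports"],
  ])
}

// The first call starts the load; later calls reuse the same promise.
function getReporterTable(): Promise<ReporterTable> {
  if (!cachedTable) cachedTable = loadReporterTable()
  return cachedTable
}
```

Caching the promise rather than the resolved value means concurrent callers share a single load instead of each starting their own.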
|
|
501
217
|
|
|
502
|
-
|
|
218
|
+
## Comparison with Python eyecite
|
|
503
219
|
|
|
504
|
-
|
|
505
|
-
|------------|--------|---------|
|
|
506
|
-
| Core extraction | `eyecite-ts` | ~10 KB |
|
|
507
|
-
| Annotation | `eyecite-ts/annotate` | 0.7 KB |
|
|
508
|
-
| Reporter data | `eyecite-ts/data` | 86.5 KB (lazy-loaded) |
|
|
220
|
+
Every claim was verified against the [Python eyecite](https://github.com/freelawproject/eyecite) source code (April 2026).
|
|
509
221
|
|
|
510
|
-
|
|
511
|
-
|
|
512
|
-
|
|
513
|
-
|
|
514
|
-
|
|
222
|
+
| Capability | Python eyecite | eyecite-ts | Notes |
|
|
223
|
+
|---|---|---|---|
|
|
224
|
+
| Case citations | Yes | Yes | Both extract volume/reporter/page/court/year |
|
|
225
|
+
| Statute citations | Yes (all 50 states + DC + territories) | Yes (50 states + DC + federal) | Python uses `reporters-db`; TS uses built-in patterns |
|
|
226
|
+
| Constitutional citations | No | Yes (U.S. + 50 states) | Dedicated type with article/amendment/section/clause |
|
|
227
|
+
| Journal / law review | Yes | Yes | |
|
|
228
|
+
| Neutral (WL/LEXIS) | Yes (as case citations) | Yes (dedicated type) | |
|
|
229
|
+
| Short-form resolution | Yes | Yes | |
|
|
230
|
+
| Case name extraction | Yes | Yes | Both use backward scanning |
|
|
231
|
+
| Parallel citation linking | Partial (detection + metadata copy) | Yes (`groupId` + `parallelCitations`) | |
|
|
232
|
+
| Full span tracking | Yes | Yes | TS carries dual clean/original positions |
|
|
233
|
+
| Component spans | Minimal (pin cite only) | Yes (all fields) | |
|
|
234
|
+
| Footnote detection | No | Yes | HTML + plaintext strategies |
|
|
235
|
+
| Citation signals | No (stop words only) | Yes (extracted as metadata) | |
|
|
236
|
+
| Annotation | Yes (HTML modes) | Yes (template/callback + XSS auto-escape) | |
|
|
237
|
+
| Position mapping | Yes (diff-based) | Yes (incremental TransformationMap) | |
|
|
238
|
+
| Type system | Class inheritance | Discriminated union | TS enables exhaustive switch |
|
|
515
239
|
|
|
516
|
-
|
|
240
|
+
eyecite-ts started as a port and has diverged. Both are capable citation extractors — eyecite-ts adds constitutional citations, footnote detection, citation signals, rich component spans, and a TypeScript-native type system, while Python eyecite has broader statute coverage via `reporters-db` and a mature ecosystem.
|
|
517
241
|
|
|
518
|
-
|
|
242
|
+
Coming from Python eyecite? See the [Migration Guide](docs/guides/migration-from-python.md).
|
|
519
243
|
|
|
520
|
-
|
|
521
|
-
2. **Tokenize**: Apply regex patterns to find citation candidates
|
|
522
|
-
3. **Extract**: Parse metadata (volume, reporter, page, etc.)
|
|
523
|
-
4. **Resolve** (optional): Link short-form citations to antecedents
|
|
244
|
+
## Architecture
|
|
524
245
|
|
|
525
|
-
|
|
246
|
+
Citations flow through a 4-stage pipeline: **clean → tokenize → extract → resolve**. Text cleaning builds a `TransformationMap` that tracks position shifts, so every citation carries dual coordinates (cleaned and original text). Resolution is optional and runs as a final pass.
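
The dual-coordinate idea can be shown with a toy cleaner. This is a simplified sketch, not the library's `TransformationMap`: it strips HTML tags naively and stores one original offset per cleaned character, whereas the text above describes an incremental map. `CleanResult` and `stripTagsWithMap` are hypothetical names.

```typescript
interface CleanResult {
  cleaned: string
  // toOriginal[i] is the offset in the original text of cleaned character i.
  toOriginal: number[]
}

// Strip HTML tags while recording where each surviving character came from.
function stripTagsWithMap(original: string): CleanResult {
  let cleaned = ""
  const toOriginal: number[] = []
  let inTag = false
  for (let i = 0; i < original.length; i++) {
    const ch = original.charAt(i)
    if (ch === "<") {
      inTag = true
      continue
    }
    if (ch === ">" && inTag) {
      inTag = false
      continue
    }
    if (inTag) continue
    cleaned += ch
    toOriginal.push(i)
  }
  return { cleaned, toOriginal }
}
```

A citation found at offset `k` in `cleaned` sits at `toOriginal[k]` in the original text, which is what lets annotation write markup back into the untouched input.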
|
|
526
247
|
|
|
527
248
|
See [ARCHITECTURE.md](ARCHITECTURE.md) for details.
|
|
528
249
|
|
|
529
250
|
## Development
|
|
530
251
|
|
|
531
252
|
```bash
|
|
532
|
-
pnpm install
|
|
533
|
-
pnpm test
|
|
534
|
-
pnpm exec vitest run
|
|
535
|
-
pnpm typecheck
|
|
536
|
-
pnpm build
|
|
537
|
-
pnpm lint
|
|
538
|
-
pnpm format
|
|
253
|
+
pnpm install # Install dependencies (corepack, pnpm 10)
|
|
254
|
+
pnpm test # Run tests (vitest, watch mode)
|
|
255
|
+
pnpm exec vitest run # Run tests once (1,748 tests, 72 files)
|
|
256
|
+
pnpm typecheck # Type-check with tsc
|
|
257
|
+
pnpm build # Build (ESM + CJS + DTS)
|
|
258
|
+
pnpm lint # Lint with Biome
|
|
259
|
+
pnpm format # Format with Biome
|
|
260
|
+
pnpm size # Check bundle size limits
|
|
539
261
|
```
|
|
540
262
|
|
|
541
|
-
|
|
263
|
+
Requires Node.js >= 18.0.0. See [ARCHITECTURE.md](ARCHITECTURE.md) for contributor orientation.
|
|
542
264
|
|
|
543
265
|
## License
|
|
544
266
|
|
|
@@ -546,4 +268,4 @@ MIT
|
|
|
546
268
|
|
|
547
269
|
## Credits
|
|
548
270
|
|
|
549
|
-
Inspired by [eyecite](https://github.com/freelawproject/eyecite) (Python) by Free Law Project. This TypeScript implementation
|
|
271
|
+
Inspired by and ported from [eyecite](https://github.com/freelawproject/eyecite) (Python) by [Free Law Project](https://free.law/). This TypeScript implementation extends the original with constitutional citations, footnote detection, citation signals, parallel citation grouping, component spans, and a discriminated-union type system.
|