eyecite-ts 0.10.0 → 0.10.2

package/README.md CHANGED
@@ -9,22 +9,9 @@
9
9
  [![TypeScript](https://img.shields.io/badge/TypeScript-5.9-blue.svg)](https://www.typescriptlang.org/)
10
10
  [![Zero Dependencies](https://img.shields.io/badge/dependencies-0-brightgreen.svg)](https://www.npmjs.com/package/eyecite-ts)
11
11
 
12
- TypeScript legal citation extraction library inspired by and extending Python [eyecite](https://github.com/freelawproject/eyecite).
12
+ TypeScript legal citation extraction — a port of Python [eyecite](https://github.com/freelawproject/eyecite) with extended capabilities.
13
13
 
14
- Extract, resolve, and annotate legal citations from court opinions and legal documents with zero runtime dependencies.
15
-
16
- ## Features
17
-
18
- - **Full citation extraction**: Case citations, statutes (20 jurisdictions), constitutional citations (U.S. + 50 states), journal articles, neutral citations, public laws, federal register
19
- - **Case name & full span**: Backward search extracts case names ("Smith v. Jones", "In re Smith"), `fullSpan` covers case name through closing parenthetical
20
- - **Parallel citation linking**: Automatic detection and grouping of comma-separated citations sharing a parenthetical (e.g., "410 U.S. 113, 93 S. Ct. 705 (1973)")
21
- - **Complex parentheticals**: Unified parser handles court+year, full dates (Jan. 15, 2020 / January 15, 2020 / 1/15/2020), disposition (en banc, per curiam), and chained parentheticals
22
- - **Short-form resolution**: Id./Ibid., supra, and short-form case citations resolved to their full antecedents
23
- - **Reporter database**: 1,200+ reporters with variant matching and confidence scoring
24
- - **Citation annotation**: HTML markup with auto-escape XSS protection and position tracking
25
- - **Bundle optimization**: Tree-shakeable exports, lazy-loaded reporter data, separate entry points
26
- - **TypeScript native**: Discriminated unions, conditional types, type guards, full IntelliSense
27
- - **Zero dependencies**: No runtime dependencies, ~10KB gzipped core bundle
14
+ Extract structured data from legal citations in court opinions, briefs, and legal documents. A citation like `500 F.2d 123 (9th Cir. 2020)` encodes a volume (500), reporter (Federal Reporter, 2nd Series), page (123), court (Ninth Circuit), and year. This library parses all of that into typed objects, resolves short-form references like "Id." back to their antecedents, and can annotate the original text with HTML markup. Zero runtime dependencies, browser-compatible, ~20 KB brotli.
28
15
 
29
16
  ## Installation
30
17
 
@@ -34,511 +21,246 @@ npm install eyecite-ts
34
21
 
35
22
  ## Quick Start
36
23
 
37
- ```typescript
38
- import { extractCitations } from 'eyecite-ts'
39
-
40
- const text = 'See Smith v. Jones, 500 F.2d 123 (9th Cir. Jan. 15, 2020)'
41
- const citations = extractCitations(text)
42
-
43
- console.log(citations[0])
44
- // {
45
- // type: 'case',
46
- // volume: 500,
47
- // reporter: 'F.2d',
48
- // page: 123,
49
- // court: '9th Cir.',
50
- // year: 2020,
51
- // caseName: 'Smith v. Jones',
52
- // date: { iso: '2020-01-15', parsed: { year: 2020, month: 1, day: 15 } },
53
- // confidence: 0.85,
54
- // span: { originalStart: 20, originalEnd: 33, ... },
55
- // fullSpan: { originalStart: 4, originalEnd: 57, ... }
56
- // }
57
- ```
58
-
59
- ## Citation Extraction
60
-
61
- ### Multiple Citation Types
62
-
63
- ```typescript
64
- import { extractCitations } from 'eyecite-ts'
65
-
66
- const text = `
67
- See Smith v. Jones, 500 F.2d 123 (9th Cir. 2020).
68
- Also 42 U.S.C. § 1983.
69
- Compare 123 Harv. L. Rev. 456.
70
- `
71
- const citations = extractCitations(text)
72
-
73
- citations.forEach(citation => {
74
- console.log(citation.type) // 'case', 'statute', 'journal', etc.
75
- })
76
- ```
77
-
78
- ### Statute Citations
79
-
80
- Extract citations from 20 state and federal jurisdictions with subsection, et seq., and jurisdiction identification:
24
+ A complete extract → resolve → annotate workflow:
81
25
 
82
26
  ```typescript
83
- import { extractCitations } from 'eyecite-ts'
84
-
85
- const text = `
86
- See 42 U.S.C. § 1983(a)(1) et seq.
87
- Also Cal. Penal Code § 187.
88
- And N.Y. Penal Law § 125.25(1)(a).
89
- Compare 735 ILCS 5/2-1001.
90
- `
91
- const citations = extractCitations(text)
92
-
93
- // Federal with subsections + et seq.
94
- // { type: 'statute', title: 42, code: 'U.S.C.', section: '1983',
95
- // subsection: '(a)(1)', jurisdiction: 'US', hasEtSeq: true, confidence: 1.0 }
96
-
97
- // California named-code
98
- // { type: 'statute', code: 'Penal', section: '187', jurisdiction: 'CA', confidence: 0.95 }
99
-
100
- // New York named-code with subsections
101
- // { type: 'statute', code: 'Penal Law', section: '125.25',
102
- // subsection: '(1)(a)', jurisdiction: 'NY', confidence: 1.0 }
103
-
104
- // Illinois chapter-act format
105
- // { type: 'statute', title: 735, code: '5', section: '2-1001',
106
- // jurisdiction: 'IL', confidence: 0.95 }
107
- ```
108
-
109
- **Supported jurisdictions:**
110
-
111
- | Family | Jurisdictions |
112
- |--------|--------------|
113
- | Federal | USC, CFR, prose ("section X of title Y") |
114
- | Named-code | NY (21 laws), CA (29 codes), TX (29 codes), MD (36 articles), VA, AL, MA |
115
- | Abbreviated-code | FL, OH, MI, UT, CO, WA, NC, GA, PA, IN, NJ, DE |
116
- | Chapter-act | IL (ILCS) |
117
-
118
- ### Constitutional Citations
119
-
120
- Extract U.S. and state constitutional citations with article, amendment, section, and clause parsing:
121
-
122
- ```typescript
123
- import { extractCitations } from 'eyecite-ts'
124
-
125
- const text = `
126
- Under U.S. Const. amend. XIV, § 1, equal protection is guaranteed.
127
- See also Cal. Const. art. I, § 7.
128
- And U.S. Const. art. I, § 8, cl. 3.
129
- `
130
- const citations = extractCitations(text)
131
-
132
- // U.S. amendment with section
133
- // { type: 'constitutional', jurisdiction: 'US', amendment: 14,
134
- // section: '1', confidence: 0.95 }
135
-
136
- // California article with section
137
- // { type: 'constitutional', jurisdiction: 'CA', article: 1,
138
- // section: '7', confidence: 0.9 }
139
-
140
- // Commerce Clause (article + section + clause)
141
- // { type: 'constitutional', jurisdiction: 'US', article: 1,
142
- // section: '8', clause: 3, confidence: 0.95 }
143
- ```
27
+ import { extractCitations } from "eyecite-ts"
28
+ import { annotate } from "eyecite-ts/annotate"
144
29
 
145
- Roman numerals (I–XXVII) are automatically parsed to integers. All 50 state abbreviations are supported.
30
+ const text = `In Smith v. Jones, 500 F.2d 123 (9th Cir. 2020), the court
31
+ applied 42 U.S.C. § 1983. Id. at 130. See also 123 Harv. L. Rev. 456 (2019).`
146
32
 
147
- ### Async API
148
-
149
- ```typescript
150
- import { extractCitationsAsync } from 'eyecite-ts'
151
-
152
- const citations = await extractCitationsAsync(text)
153
- ```
154
-
155
- ### Custom Patterns
156
-
157
- ```typescript
158
- import { extractCitations } from 'eyecite-ts'
159
- import { casePatterns } from 'eyecite-ts'
160
-
161
- // Extract only case citations
162
- const citations = extractCitations(text, {
163
- patterns: casePatterns
164
- })
165
- ```
166
-
167
- ### Custom Cleaners
168
-
169
- ```typescript
170
- import { extractCitations, cleanText } from 'eyecite-ts'
171
-
172
- // Use only HTML stripping
173
- const citations = extractCitations(html, {
174
- cleaners: [(text) => text.replace(/<[^>]+>/g, '')]
175
- })
176
- ```
177
-
178
- ## Case Names & Full Spans
179
-
180
- Case citations can include the case name and full citation boundaries:
181
-
182
- ```typescript
183
- const text = 'In Smith v. Jones, 500 F.2d 123 (9th Cir. 2020) (en banc), the court held...'
184
- const citations = extractCitations(text)
33
+ // Step 1: Extract and resolve in one call
34
+ const citations = extractCitations(text, { resolve: true })
185
35
 
186
- if (citations[0].type === 'case') {
187
- console.log(citations[0].caseName) // 'Smith v. Jones'
188
- console.log(citations[0].disposition) // 'en banc'
189
- console.log(citations[0].fullSpan) // covers "Smith v. Jones, 500 F.2d 123 (9th Cir. 2020) (en banc)"
190
- console.log(citations[0].span) // covers "500 F.2d 123" only (citation core)
36
+ // Step 2: Inspect results
37
+ for (const cite of citations) {
38
+ switch (cite.type) {
39
+ case "case":
40
+ console.log(cite.caseName, cite.reporter, cite.year)
41
+ // "Smith v. Jones" "F.2d" 2020
42
+ break
43
+ case "statute":
44
+ console.log(cite.title, cite.code, cite.section)
45
+ // 42 "U.S.C." "1983"
46
+ break
47
+ case "id":
48
+ console.log("Id. resolves to index", cite.resolution?.resolvedTo)
49
+ // Id. resolves to index 0
50
+ break
51
+ case "journal":
52
+ console.log(cite.journal, cite.volume, cite.page)
53
+ // "Harv. L. Rev." 123 456
54
+ break
55
+ }
191
56
  }
192
- ```
193
-
194
- Procedural prefixes are recognized automatically:
195
-
196
- ```typescript
197
- const text = 'In re Smith, 410 U.S. 113 (1973)'
198
- // caseName: 'In re Smith'
199
57
 
200
- const text2 = 'Ex parte Young, 209 U.S. 123 (1908)'
201
- // caseName: 'Ex parte Young'
58
+ // Step 3: Annotate the original text
59
+ const result = annotate(text, citations, {
60
+ template: { before: '<cite>', after: '</cite>' },
61
+ })
62
+ console.log(result.text)
202
63
  ```
203
64
 
204
- ### Structured Dates
205
-
206
- Parentheticals with full dates return structured date objects:
65
+ ## What It Extracts
207
66
 
208
- ```typescript
209
- const text = '500 F.3d 100 (2d Cir. Jan. 15, 2020)'
210
- // date: { iso: '2020-01-15', parsed: { year: 2020, month: 1, day: 15 } }
67
+ | Type | Example | Key Fields |
68
+ |------|---------|------------|
69
+ | `case` | `500 F.2d 123 (9th Cir. 2020)` | volume, reporter, page, court, year, caseName |
70
+ | `statute` | `42 U.S.C. § 1983(a)(1)` | title, code, section, subsection, jurisdiction |
71
+ | `constitutional` | `U.S. Const. amend. XIV, § 1` | jurisdiction, amendment, section, clause |
72
+ | `journal` | `123 Harv. L. Rev. 456` | volume, journal, page, year |
73
+ | `neutral` | `2020 WL 123456` | year, court, documentNumber |
74
+ | `publicLaw` | `Pub. L. No. 117-263` | congress, lawNumber |
75
+ | `federalRegister` | `87 Fed. Reg. 1234` | volume, page, year |
76
+ | `statutesAtLarge` | `136 Stat. 4459` | volume, page, year |
77
+ | `id` | `Id. at 125` | pincite |
78
+ | `supra` | `Smith, supra, at 130` | partyName, pincite |
79
+ | `shortFormCase` | `500 F.2d at 140` | volume, reporter, pincite |
211
80
 
212
- const text2 = '410 U.S. 113 (1973)'
213
- // date: { iso: '1973', parsed: { year: 1973 } }
214
- ```
81
+ Statute coverage spans 52 jurisdictions (50 states + DC + federal). See the [Advanced Extraction Guide](docs/guides/advanced-extraction.md) for jurisdiction details.
215
82
 
216
- Three date formats are supported: `Jan. 15, 2020`, `January 15, 2020`, and `1/15/2020`.
83
+ ## Key Features
217
84
 
218
- ### Blank Page Citations
85
+ ### Case Names & Full Spans
219
86
 
220
- Citations can reference blank pages using placeholder notation:
87
+ The library backward-searches for party names and tracks full citation boundaries:
221
88
 
222
89
  ```typescript
223
- const text = '500 F.2d ___ (2020)'
224
- const citations = extractCitations(text)
225
-
226
- if (citations[0].type === 'case') {
227
- console.log(citations[0].hasBlankPage) // true
228
- console.log(citations[0].page) // undefined
90
+ const text = "In Smith v. Jones, 500 F.2d 123 (9th Cir. 2020) (en banc), the court held..."
91
+ const [cite] = extractCitations(text)
92
+
93
+ if (cite.type === "case") {
94
+ cite.caseName // "Smith v. Jones"
95
+ cite.plaintiff // "Smith"
96
+ cite.defendant // "Jones"
97
+ cite.disposition // "en banc"
98
+ cite.span // covers "500 F.2d 123" (citation core)
99
+ cite.fullSpan // covers "Smith v. Jones, 500 F.2d 123 (9th Cir. 2020) (en banc)"
229
100
  }
230
101
  ```
231
102
 
232
- Both `___` (triple underscore) and `---` (triple dash) are recognized as blank page placeholders. These appear in slip opinions or unpublished decisions where the final reporter page number is not yet available.
103
+ Procedural prefixes like `In re`, `Ex parte`, and `Matter of` are recognized automatically.
233
104
 
234
- ## Parallel Citations
105
+ ### Parallel Citations
235
106
 
236
- When multiple case citations share the same parenthetical, they represent parallel citations for the same case in different reporters. The library automatically detects and groups them:
107
+ When multiple reporters cite the same case (common in older Supreme Court opinions), the library groups them automatically:
237
108
 
238
109
  ```typescript
239
- const text = 'See 410 U.S. 113, 93 S. Ct. 705, 35 L. Ed. 2d 147 (1973).'
110
+ const text = "See 410 U.S. 113, 93 S. Ct. 705, 35 L. Ed. 2d 147 (1973)."
240
111
  const citations = extractCitations(text)
241
112
 
242
- // Returns 3 citations, all linked by groupId
243
- console.log(citations[0].groupId) // "410-U.S.-113"
244
- console.log(citations[1].groupId) // "410-U.S.-113" (same group)
245
- console.log(citations[2].groupId) // "410-U.S.-113" (same group)
246
-
247
- // Primary citation (first in group) has parallelCitations array
248
- if (citations[0].type === 'case') {
249
- console.log(citations[0].parallelCitations)
250
- // [
251
- // { volume: 93, reporter: 'S. Ct.', page: 705 },
252
- // { volume: 35, reporter: 'L. Ed. 2d', page: 147 }
253
- // ]
254
- }
113
+ citations[0].groupId // "410-U.S.-113"
114
+ citations[1].groupId // "410-U.S.-113" (same group)
115
+ citations[2].groupId // "410-U.S.-113" (same group)
255
116
 
256
- // Secondary citations don't duplicate the array
257
- console.log(citations[1].parallelCitations) // undefined
258
- console.log(citations[2].parallelCitations) // undefined
117
+ // Primary citation carries the linked array
118
+ if (citations[0].type === "case") {
119
+ citations[0].parallelCitations
120
+ // [{ volume: 93, reporter: 'S. Ct.', page: 705 },
121
+ // { volume: 35, reporter: 'L. Ed. 2d', page: 147 }]
122
+ }
259
123
  ```
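The grouping shown in this hunk can be sketched in isolation. This is a minimal illustration, not the library's implementation: it assumes the citations have already been identified as sharing one parenthetical, and it builds the group ID as `volume-reporter-page` of the first citation, which matches the `"410-U.S.-113"` value in the example. The names `CoreCite`, `GroupedCite`, and `linkParallel` are hypothetical.

```typescript
// Hypothetical types for illustration; not exported by eyecite-ts.
interface CoreCite {
  volume: number
  reporter: string
  page: number
}

interface GroupedCite extends CoreCite {
  groupId: string
  parallelCitations?: CoreCite[]
}

// Assumes every citation in `cites` belongs to the same parallel group.
function linkParallel(cites: CoreCite[]): GroupedCite[] {
  if (cites.length === 0) return []
  const [primary, ...rest] = cites
  // Group ID is derived from the primary (first) citation.
  const groupId = `${primary.volume}-${primary.reporter}-${primary.page}`
  return cites.map((cite, index) => {
    const grouped: GroupedCite = { ...cite, groupId }
    // Only the primary carries the array of the other reporters.
    if (index === 0 && rest.length > 0) grouped.parallelCitations = rest
    return grouped
  })
}

const grouped = linkParallel([
  { volume: 410, reporter: "U.S.", page: 113 },
  { volume: 93, reporter: "S. Ct.", page: 705 },
  { volume: 35, reporter: "L. Ed. 2d", page: 147 },
])
console.log(grouped.map((c) => c.groupId))
```

Keeping the secondary citations in the result while attaching the array only to the primary lets callers either iterate every reporter or read the whole group from one object.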
260
124
 
261
- **Key points:**
262
- - All citations in a parallel group share the same `groupId`
263
- - Only the **first citation** (primary) has the `parallelCitations` array
264
- - Secondary citations remain in the results array for individual processing
265
- - Group ID format: `${volume}-${reporter}-${page}` (e.g., "410-U.S.-113")
266
-
267
- Use `groupId` to identify which citations refer to the same case, or access `parallelCitations` on the primary to get all reporters at once.
268
-
269
- ## Resolving Short-Form Citations
270
-
271
- Short-form citations (Id., supra, short-form case) refer to earlier citations in the document. The resolution engine links them to their full antecedents.
125
+ ### Short-Form Resolution
272
126
 
273
- ### Convenience API
127
+ Pass `{ resolve: true }` to link Id., supra, and short-form case citations to their full antecedents:
274
128
 
275
129
  ```typescript
276
- import { extractCitations } from 'eyecite-ts'
277
-
278
- const text = `
279
- Smith v. Jones, 500 F.2d 123 (2020).
280
- Id. at 125.
281
- Smith, supra, at 130.
282
- 500 F.2d at 140.
283
- `
284
-
130
+ const text = `Smith v. Jones, 500 F.2d 123 (2020). Id. at 125.`
285
131
  const citations = extractCitations(text, { resolve: true })
286
132
 
287
- // citations[1] is Id. citation
288
- console.log(citations[1].resolution)
133
+ citations[1].resolution
289
134
  // { resolvedTo: 0, confidence: 1.0 }
290
135
  ```
291
136
 
292
- ### Power-User API
293
-
294
- ```typescript
295
- import { extractCitations, resolveCitations } from 'eyecite-ts'
296
-
297
- const citations = extractCitations(text)
298
-
299
- const resolved = resolveCitations(citations, text, {
300
- scopeStrategy: 'paragraph',
301
- fuzzyPartyMatching: true,
302
- partyMatchThreshold: 0.8,
303
- reportUnresolved: true
304
- })
305
- ```
306
-
307
- ### Resolution Options
308
-
309
- | Option | Type | Default | Description |
310
- |--------|------|---------|-------------|
311
- | `scopeStrategy` | `'paragraph'` \| `'section'` \| `'footnote'` \| `'none'` | `'paragraph'` | How far back to search for antecedents |
312
- | `autoDetectParagraphs` | `boolean` | `true` | Auto-detect paragraph boundaries from text |
313
- | `paragraphBoundaryPattern` | `RegExp` | `/\n\n+/` | Pattern to detect paragraphs |
314
- | `fuzzyPartyMatching` | `boolean` | `true` | Enable fuzzy party name matching for supra |
315
- | `partyMatchThreshold` | `number` | `0.8` | Similarity threshold (0-1) for fuzzy matching |
316
- | `reportUnresolved` | `boolean` | `true` | Report failure reasons for unresolved citations |
317
-
318
- ### Resolution Examples
319
-
320
- **Id. citations:**
321
-
322
- ```typescript
323
- const text = 'Smith v. Jones, 500 F.2d 123. Id. at 125.'
324
- const citations = extractCitations(text, { resolve: true })
325
- // citations[1].resolution.resolvedTo === 0
326
- ```
327
-
328
- **Supra citations:**
329
-
330
- ```typescript
331
- const text = 'Smith v. Jones, 500 F.2d 123. Smith, supra, at 130.'
332
- const citations = extractCitations(text, { resolve: true })
333
- // citations[1].resolution.resolvedTo === 0 (party name matches "Smith")
334
- ```
335
-
336
- **Short-form case citations:**
137
+ The resolver supports paragraph/section/footnote scope boundaries, fuzzy party name matching, and configurable thresholds. See the [Resolution Guide](docs/guides/resolution.md) for the power-user API.
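As a rough mental model of `Id.` resolution, the sketch below walks the citation list in order and points each `Id.` at the nearest preceding case citation. That rule is an assumption inferred from the Quick Start output (where `Id.` resolves to index 0 even though a statute sits between them); the real resolver also applies scope boundaries and party matching, which are ignored here. `CiteKind`, `Resolution`, and `resolveIds` are hypothetical names, and the failure message is taken from the removed README example.

```typescript
// Hypothetical, simplified model; not the library's resolver.
type CiteKind = "case" | "statute" | "journal" | "id"

interface Resolution {
  resolvedTo?: number
  failureReason?: string
}

// Returns one entry per citation: a Resolution for "id" entries,
// undefined for everything else.
function resolveIds(kinds: CiteKind[]): (Resolution | undefined)[] {
  let lastCase: number | undefined = undefined
  return kinds.map((kind, index) => {
    if (kind === "case") {
      lastCase = index // remember the most recent case citation
      return undefined
    }
    if (kind !== "id") return undefined
    return lastCase === undefined
      ? { failureReason: "No preceding citation found" }
      : { resolvedTo: lastCase }
  })
}

const resolved = resolveIds(["case", "statute", "id", "journal"])
console.log(resolved[2])
```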
337
138
 
338
- ```typescript
339
- const text = 'Brown v. Board, 347 U.S. 483. See 347 U.S. at 495.'
340
- const citations = extractCitations(text, { resolve: true })
341
- // citations[1].resolution.resolvedTo === 0 (volume/reporter matches)
342
- ```
139
+ ### Citation Annotation
343
140
 
344
- **Unresolved citations:**
141
+ Mark up citations with HTML using template or callback modes:
345
142
 
346
143
  ```typescript
347
- const text = 'Id. at 100.' // Orphan Id. with no preceding citation
348
- const citations = extractCitations(text, { resolve: true })
349
- // citations[0].resolution.failureReason === 'No preceding citation found'
350
- ```
351
-
352
- ## Citation Annotation
144
+ import { annotate } from "eyecite-ts/annotate"
353
145
 
354
- Add HTML markup to citations in text:
355
-
356
- ```typescript
357
- import { annotate } from 'eyecite-ts/annotate'
358
- import { extractCitations } from 'eyecite-ts'
359
-
360
- const text = 'See Smith v. Jones, 500 F.2d 123 (2020).'
361
- const citations = extractCitations(text)
362
-
363
- // Template mode
364
- const result = annotate(text, citations, {
365
- template: { before: '<cite>', after: '</cite>' }
366
- })
367
- // result.text === 'See Smith v. Jones, <cite>500 F.2d 123</cite> (2020).'
368
-
369
- // Callback mode (full control)
370
- const result2 = annotate(text, citations, {
371
- callback: (citation, surrounding) => {
372
- if (citation.type === 'case') {
373
- return `<a href="/cases/${citation.volume}">${citation.matchedText}</a>`
374
- }
375
- return citation.matchedText
376
- }
377
- })
378
- ```
379
-
380
- Auto-escape is enabled by default for XSS protection:
381
-
382
- ```typescript
383
146
  const result = annotate(text, citations, {
384
147
  template: { before: '<cite>', after: '</cite>' },
385
- autoEscape: true // default — escapes &, <, >, ", ', /
386
148
  })
149
+ // "See Smith v. Jones, <cite>500 F.2d 123</cite> (2020)."
387
150
  ```
388
151
 
389
- ### Annotating Full Spans
152
+ XSS auto-escape is enabled by default. Use `useFullSpan: true` to annotate from case name through closing parenthetical. See the [Annotation Guide](docs/guides/annotation.md) for callback mode and full options.
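To make the auto-escape behavior concrete, here is a minimal sketch of escaping matched text before wrapping it in a template. The character set (`&`, `<`, `>`, `"`, `'`, `/`) comes from the removed README comment; the specific entity encodings and the helper names `escapeHtml` and `wrapEscaped` are assumptions for illustration and may differ from what the library emits.

```typescript
// Assumed entity encodings; the library's exact output may differ.
const HTML_ESCAPES: Record<string, string> = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#x27;",
  "/": "&#x2F;",
}

// Replace each risky character with its entity.
function escapeHtml(raw: string): string {
  return raw.replace(/[&<>"'\/]/g, (ch) => HTML_ESCAPES[ch] ?? ch)
}

// Escape only the citation text; the template markup is trusted input.
function wrapEscaped(matchedText: string, before: string, after: string): string {
  return `${before}${escapeHtml(matchedText)}${after}`
}

console.log(wrapEscaped("500 F.2d 123", "<cite>", "</cite>"))
console.log(escapeHtml('<script>alert("x")</script>'))
```

Escaping the matched text but not the template is what allows `<cite>` to render as markup while hostile input inside the citation stays inert.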
390
153
 
391
- By default, annotation wraps only the citation core (volume-reporter-page). Use `useFullSpan` to annotate from the case name through the closing parenthetical:
154
+ ### Confidence & Signals
392
155
 
393
- ```typescript
394
- const text = 'In Smith v. Jones, 500 F.2d 123 (9th Cir. 2020) (en banc), the court held...'
395
- const citations = extractCitations(text)
156
+ Each citation carries a `confidence` score (0-1) based on pattern match quality and reporter validation. Citations preceded by legal signals are tagged:
396
157
 
397
- // Default: annotates only "500 F.2d 123"
398
- const coreOnly = annotate(text, citations, {
399
- template: { before: '<cite>', after: '</cite>' }
400
- })
401
- // Result: "In Smith v. Jones, <cite>500 F.2d 123</cite> (9th Cir. 2020) (en banc), the court held..."
158
+ ```typescript
159
+ const text = "See also Smith v. Jones, 500 F.2d 123 (2020)."
160
+ const [cite] = extractCitations(text)
402
161
 
403
- // With useFullSpan: annotates "Smith v. Jones, 500 F.2d 123 (9th Cir. 2020) (en banc)"
404
- const fullSpan = annotate(text, citations, {
405
- template: { before: '<cite>', after: '</cite>' },
406
- useFullSpan: true
407
- })
408
- // Result: "In <cite>Smith v. Jones, 500 F.2d 123 (9th Cir. 2020) (en banc)</cite>, the court held..."
162
+ cite.confidence // 0.85
163
+ cite.signal // "see also"
409
164
  ```
410
165
 
411
- Full span annotation covers:
412
- - Case name (if present)
413
- - Volume-reporter-page
414
- - Court and date parenthetical
415
- - Disposition parenthetical (en banc, per curiam)
416
- - Chained parentheticals
417
- - Subsequent history
418
-
419
- Use `useFullSpan: true` when you want to highlight the entire citation as a unit, or `useFullSpan: false` (default) to annotate only the citation core for minimal markup.
166
+ ### Footnote Detection
420
167
 
421
- ## Reporter Validation
422
-
423
- Validate case citations against the reporters database:
168
+ Opt-in feature that tags citations with their footnote context and enables zone-scoped resolution:
424
169
 
425
170
  ```typescript
426
- import { extractWithValidation } from 'eyecite-ts'
171
+ const citations = extractCitations(text, { detectFootnotes: true })
427
172
 
428
- const validated = await extractWithValidation(text, { validate: true })
429
- // Confidence adjustments:
430
- // +0.2 boost for reporter match
431
- // -0.3 penalty for unknown reporter
432
- // -0.1 per extra match for ambiguous reporter
173
+ for (const cite of citations) {
174
+ if (cite.inFootnote) {
175
+ console.log(`Footnote ${cite.footnoteNumber}: ${cite.matchedText}`)
176
+ }
177
+ }
433
178
  ```
434
179
 
435
- ## Type System
436
-
437
- All citation types use a discriminated union on the `type` field:
180
+ Supports HTML footnote tags and plaintext footnote sections (separator + numbered markers). See the [Footnote Detection Guide](docs/guides/footnote-detection.md).
438
181
 
439
- ```typescript
440
- import type {
441
- Citation, // Union of all 11 types
442
- FullCaseCitation,
443
- StatuteCitation,
444
- ConstitutionalCitation,
445
- JournalCitation,
446
- NeutralCitation,
447
- PublicLawCitation,
448
- FederalRegisterCitation,
449
- IdCitation,
450
- SupraCitation,
451
- ShortFormCaseCitation,
452
- CitationOfType, // Extract subtype: CitationOfType<'case'> = FullCaseCitation
453
- ExtractorMap, // Maps FullCitationType keys to citation subtypes
454
- FullCitation, // Union of full citation types
455
- ShortFormCitation, // Union of short-form types
456
- } from 'eyecite-ts'
457
- ```
182
+ ## Type System
458
183
 
459
- ### Type Guards
184
+ All citation types use a [discriminated union](https://www.typescriptlang.org/docs/handbook/2/narrowing.html#discriminated-unions) on the `type` field:
460
185
 
461
186
  ```typescript
462
- import {
463
- isFullCitation,
464
- isShortFormCitation,
465
- isCaseCitation,
466
- isCitationType,
467
- assertUnreachable
468
- } from 'eyecite-ts'
469
-
470
- // Specific guards
471
- if (isFullCitation(citation)) {
472
- // citation: FullCitation
473
- }
187
+ import type { Citation, FullCaseCitation, StatuteCitation } from "eyecite-ts"
188
+ import { isFullCitation, isCaseCitation, assertUnreachable } from "eyecite-ts"
474
189
 
475
- // Generic guard — narrows to any specific type
476
- if (isCitationType(citation, 'statute')) {
477
- // citation: StatuteCitation
190
+ // Type guards
191
+ if (isCaseCitation(citation)) {
192
+ citation.reporter // typed as string
478
193
  }
479
194
 
480
- // Exhaustiveness check in switch statements
195
+ // Exhaustive switch
481
196
  switch (citation.type) {
482
- case 'case': /* ... */ break
483
- case 'statute': /* ... */ break
484
- // ... all 11 types ...
197
+ case "case": /* ... */ break
198
+ case "statute": /* ... */ break
199
+ // ... all 11 types
485
200
  default: assertUnreachable(citation.type)
486
201
  }
487
202
  ```
488
203
 
489
- ### Resolved Citation Types
204
+ `CitationOfType<'case'>` extracts the subtype: `CitationOfType<'case'>` = `FullCaseCitation`. See the [Type Reference](docs/api/types.md) for the full catalog.
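The pattern behind `CitationOfType` and `assertUnreachable` can be reproduced with a small, self-contained union. The types below are deliberately reduced stand-ins (three members rather than eleven), and `Cite`, `CiteOfType`, `assertNever`, and `describeCite` are hypothetical names; the sketch assumes the library's helper is built on something like TypeScript's built-in `Extract`, which the README does not confirm.

```typescript
// Reduced stand-in types for illustration only.
type CaseCite = { type: "case"; volume: number; reporter: string; page: number }
type StatuteCite = { type: "statute"; code: string; section: string }
type IdCite = { type: "id"; pincite?: number }
type Cite = CaseCite | StatuteCite | IdCite

// Pick the union member whose `type` field matches T.
type CiteOfType<T extends Cite["type"]> = Extract<Cite, { type: T }>

// Compile-time exhaustiveness check: a new union member that is not
// handled in the switch makes the `default` branch fail to type-check.
function assertNever(value: never): never {
  throw new Error(`Unhandled citation type: ${String(value)}`)
}

function describeCite(cite: Cite): string {
  switch (cite.type) {
    case "case":
      return `${cite.volume} ${cite.reporter} ${cite.page}`
    case "statute":
      return `${cite.code} § ${cite.section}`
    case "id":
      return cite.pincite === undefined ? "Id." : `Id. at ${cite.pincite}`
    default:
      return assertNever(cite)
  }
}

// CiteOfType<"case"> is CaseCite, so this assignment type-checks.
const example: CiteOfType<"case"> = { type: "case", volume: 500, reporter: "F.2d", page: 123 }
console.log(describeCite(example))
```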
490
205
 
491
- `ResolvedCitation` uses a conditional type — `resolution` is only meaningfully present on short-form citations:
206
+ ## Bundle Size
492
207
 
493
- ```typescript
494
- import type { ResolvedCitation } from 'eyecite-ts'
208
+ Three entry points for tree-shaking:
495
209
 
496
- // On short-form citations: resolution: ResolutionResult | undefined
497
- // On full citations: resolution?: undefined
498
- ```
210
+ | Entry Point | Import | Size (brotli) |
211
+ |-------------|--------|---------------|
212
+ | Core extraction | `eyecite-ts` | ~20 KB |
213
+ | Annotation | `eyecite-ts/annotate` | ~1.3 KB |
214
+ | Reporter data | `eyecite-ts/data` | lazy-loaded |
499
215
 
500
- ## Bundle Size
216
+ Import only what you need — the reporter database is loaded on first use, not at import time.
501
217
 
502
- Three entry points for optimal tree-shaking:
218
+ ## Comparison with Python eyecite
503
219
 
504
- | Entry Point | Import | Gzipped |
505
- |------------|--------|---------|
506
- | Core extraction | `eyecite-ts` | ~10 KB |
507
- | Annotation | `eyecite-ts/annotate` | 0.7 KB |
508
- | Reporter data | `eyecite-ts/data` | 86.5 KB (lazy-loaded) |
220
+ Every claim verified against [Python eyecite](https://github.com/freelawproject/eyecite) source code (April 2026).
509
221
 
510
- ```typescript
511
- import { extractCitations } from 'eyecite-ts' // Core only
512
- import { annotate } from 'eyecite-ts/annotate' // Annotation
513
- import { loadReporters } from 'eyecite-ts/data' // Reporter database
514
- ```
222
+ | Capability | Python eyecite | eyecite-ts | Notes |
223
+ |---|---|---|---|
224
+ | Case citations | Yes | Yes | Both extract volume/reporter/page/court/year |
225
+ | Statute citations | Yes (all 50 states + DC + territories) | Yes (50 states + DC + federal) | Python uses `reporters-db`; TS uses built-in patterns |
226
+ | Constitutional citations | No | Yes (U.S. + 50 states) | Dedicated type with article/amendment/section/clause |
227
+ | Journal / law review | Yes | Yes | |
228
+ | Neutral (WL/LEXIS) | Yes (as case citations) | Yes (dedicated type) | |
229
+ | Short-form resolution | Yes | Yes | |
230
+ | Case name extraction | Yes | Yes | Both use backward scanning |
231
+ | Parallel citation linking | Partial (detection + metadata copy) | Yes (`groupId` + `parallelCitations`) | |
232
+ | Full span tracking | Yes | Yes | TS carries dual clean/original positions |
233
+ | Component spans | Minimal (pin cite only) | Yes (all fields) | |
234
+ | Footnote detection | No | Yes | HTML + plaintext strategies |
235
+ | Citation signals | No (stop words only) | Yes (extracted as metadata) | |
236
+ | Annotation | Yes (HTML modes) | Yes (template/callback + XSS auto-escape) | |
237
+ | Position mapping | Yes (diff-based) | Yes (incremental TransformationMap) | |
238
+ | Type system | Class inheritance | Discriminated union | TS enables exhaustive switch |
515
239
 
516
- ## Architecture
240
+ eyecite-ts started as a port and has diverged. Both are capable citation extractors — eyecite-ts adds constitutional citations, footnote detection, citation signals, rich component spans, and a TypeScript-native type system, while Python eyecite has broader statute coverage via `reporters-db` and a mature ecosystem.
517
241
 
518
- Citation extraction follows a 4-stage pipeline:
242
+ Coming from Python eyecite? See the [Migration Guide](docs/guides/migration-from-python.md).
519
243
 
520
- 1. **Clean**: Remove HTML, normalize Unicode, fix smart quotes
521
- 2. **Tokenize**: Apply regex patterns to find citation candidates
522
- 3. **Extract**: Parse metadata (volume, reporter, page, etc.)
523
- 4. **Resolve** (optional): Link short-form citations to antecedents
244
+ ## Architecture
524
245
 
525
- All positions (spans) track both cleaned and original text offsets via `TransformationMap`.
246
+ Citations flow through a 4-stage pipeline: **clean → tokenize → extract → resolve**. Text cleaning builds a `TransformationMap` that tracks position shifts, so every citation carries dual coordinates (cleaned and original text). Resolution is optional and runs as a final pass.
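The dual-coordinate idea can be illustrated with a toy cleaner that strips tags and records, for each cleaned character, where it came from in the original. This is only a sketch under simplifying assumptions (every `<` opens a tag, spans are non-empty); it is not how the library's `TransformationMap` is implemented, and `CleanResult`, `stripTags`, and `toOriginalSpan` are hypothetical names.

```typescript
// Hypothetical sketch of cleaned-to-original position tracking.
interface CleanResult {
  cleaned: string
  // cleanToOriginal[i] is the index in the original text of cleaned[i].
  cleanToOriginal: number[]
}

// Naive tag stripper: treats every "<" as the start of a tag.
function stripTags(original: string): CleanResult {
  let cleaned = ""
  const cleanToOriginal: number[] = []
  let inTag = false
  for (let i = 0; i < original.length; i++) {
    const ch = original[i]
    if (ch === "<") {
      inTag = true
      continue
    }
    if (ch === ">" && inTag) {
      inTag = false
      continue
    }
    if (inTag) continue
    cleanToOriginal.push(i)
    cleaned += ch
  }
  return { cleaned, cleanToOriginal }
}

// Map a non-empty [cleanStart, cleanEnd) span back to original offsets.
function toOriginalSpan(result: CleanResult, cleanStart: number, cleanEnd: number) {
  return {
    originalStart: result.cleanToOriginal[cleanStart],
    originalEnd: result.cleanToOriginal[cleanEnd - 1] + 1,
  }
}

const original = "See <b>500 F.2d 123</b> (2020)."
const result = stripTags(original)
const cleanStart = result.cleaned.indexOf("500 F.2d 123")
const span = toOriginalSpan(result, cleanStart, cleanStart + "500 F.2d 123".length)
console.log(original.slice(span.originalStart, span.originalEnd))
```

Matching runs on the cleaned text, while annotation needs offsets into the original; carrying both is what lets markup be inserted without disturbing the source HTML.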
526
247
 
527
248
  See [ARCHITECTURE.md](ARCHITECTURE.md) for details.
528
249
 
529
250
  ## Development
530
251
 
531
252
  ```bash
532
- pnpm install # Install dependencies
533
- pnpm test # Run tests (vitest, watch mode)
534
- pnpm exec vitest run # Run tests once
535
- pnpm typecheck # Type-check with tsc
536
- pnpm build # Build (ESM + CJS + DTS)
537
- pnpm lint # Lint with Biome
538
- pnpm format # Format with Biome
253
+ pnpm install # Install dependencies (corepack, pnpm 10)
254
+ pnpm test # Run tests (vitest, watch mode)
255
+ pnpm exec vitest run # Run tests once (1,748 tests, 72 files)
256
+ pnpm typecheck # Type-check with tsc
257
+ pnpm build # Build (ESM + CJS + DTS)
258
+ pnpm lint # Lint with Biome
259
+ pnpm format # Format with Biome
260
+ pnpm size # Check bundle size limits
539
261
  ```
540
262
 
541
- 1030+ tests across 34 test files.
263
+ Requires Node.js >= 18.0.0. See [ARCHITECTURE.md](ARCHITECTURE.md) for contributor orientation.
542
264
 
543
265
  ## License
544
266
 
@@ -546,4 +268,4 @@ MIT
546
268
 
547
269
  ## Credits
548
270
 
549
- Inspired by [eyecite](https://github.com/freelawproject/eyecite) (Python) by Free Law Project. This TypeScript implementation adds parallel citation linking, party name extraction, full span tracking, and performance optimizations while maintaining compatibility with the original API design.
271
+ Inspired by and ported from [eyecite](https://github.com/freelawproject/eyecite) (Python) by [Free Law Project](https://free.law/). This TypeScript implementation extends the original with constitutional citations, footnote detection, citation signals, parallel citation grouping, component spans, and a discriminated-union type system.