@lexbuild/core 1.9.4 → 1.10.0

This diff compares the contents of two publicly released versions of the package as they appear in the public registry. It is provided for informational purposes only.
Files changed (2)
  1. package/README.md +103 -54
  2. package/package.json +1 -1
package/README.md CHANGED
@@ -1,93 +1,142 @@
  # @lexbuild/core
 
  [![npm](https://img.shields.io/npm/v/%40lexbuild%2Fcore?style=for-the-badge)](https://www.npmjs.com/package/@lexbuild/core)
- [![license](https://img.shields.io/github/license/chris-c-thomas/LexBuild?style=for-the-badge)](https://github.com/chris-c-thomas/LexBuild)
+ [![license](https://img.shields.io/github/license/chris-c-thomas/LexBuild?style=for-the-badge)](https://github.com/chris-c-thomas/LexBuild/blob/main/LICENSE)
 
- This package is part of the [LexBuild](https://github.com/chris-c-thomas/LexBuild) monorepo, a tool that converts U.S. legal XML into structured Markdown optimized for AI, RAG pipelines, and semantic search. See the monorepo for full documentation, architecture details, and contribution guidelines.
+ Shared infrastructure for the [LexBuild](https://github.com/chris-c-thomas/LexBuild) legal-XML-to-Markdown pipeline. Provides streaming XML parsing, AST definitions, Markdown rendering, YAML frontmatter generation, and cross-reference link resolution used by all source packages.
 
- It provides the foundational building blocks for XML parsing infrastructure, AST definitions, and Markdown rendering for use by all source packages ([`@lexbuild/usc`](https://www.npmjs.com/package/@lexbuild/usc), [`@lexbuild/ecfr`](https://www.npmjs.com/package/@lexbuild/ecfr)) and [`@lexbuild/cli`](https://www.npmjs.com/package/@lexbuild/cli).
+ > **Note:** This is a foundational library. Most users should install [`@lexbuild/cli`](https://www.npmjs.com/package/@lexbuild/cli) for the command-line tool, or a source package ([`@lexbuild/usc`](https://www.npmjs.com/package/@lexbuild/usc), [`@lexbuild/ecfr`](https://www.npmjs.com/package/@lexbuild/ecfr)) for programmatic access.
 
  ## Install
 
  ```bash
  npm install @lexbuild/core
+ # or
+ pnpm add @lexbuild/core
  ```
 
- ## What's Included
-
- ### XML Parser
-
- Streaming SAX parser with namespace normalization. Supports USLM (U.S. Code) and namespace-free XML (eCFR) via the `defaultNamespace` option.
+ ## Quick Start
 
  ```ts
- import { XMLParser } from "@lexbuild/core";
+ import { XMLParser, ASTBuilder, renderDocument, generateFrontmatter, createLinkResolver } from "@lexbuild/core";
+ import { createReadStream } from "node:fs";
 
+ // 1. Parse XML via streaming SAX
  const parser = new XMLParser();
- parser.on("openElement", (name, attrs) => { /* ... */ });
- parser.on("closeElement", (name) => { /* ... */ });
- parser.on("text", (text) => { /* ... */ });
-
- await parser.parseStream(readableStream);
- ```
-
- ### USLM AST Builder
-
- Stack-based XML-to-AST construction with a section-emit pattern for bounded memory usage. This builder handles USLM 1.0 XML (U.S. Code). Source packages for other formats (e.g., `@lexbuild/ecfr`) provide their own builders.
-
- ```ts
- import { ASTBuilder } from "@lexbuild/core";
-
  const builder = new ASTBuilder({
    emitAt: "section",
    onEmit: (node, context) => {
-     // Called with each completed section subtree
+     // 2. Each completed section is emitted here
+     const frontmatter = generateFrontmatter(/* ... */);
+     const resolver = createLinkResolver("relative");
+     const markdown = renderDocument(node, frontmatter, {
+       linkStyle: "relative",
+       resolveLink: resolver.resolve,
+     });
+     // 3. Write markdown to file
    },
  });
- ```
 
- ### Markdown Renderer
+ parser.on("openElement", (name, attrs) => builder.onOpenElement(name, attrs));
+ parser.on("closeElement", (name) => builder.onCloseElement(name));
+ parser.on("text", (text) => builder.onText(text));
 
- Stateless AST-to-Markdown conversion with YAML frontmatter, cross-reference link resolution, and notes filtering.
+ await parser.parseStream(createReadStream("usc01.xml"));
+ ```
 
- ```ts
- import { renderDocument, generateFrontmatter, createLinkResolver } from "@lexbuild/core";
+ ## API Reference
 
- const markdown = renderDocument(sectionNode, frontmatterData, {
-   linkStyle: "relative",
-   resolveLink: resolver.resolve,
- });
- ```
+ ### XML Parsing
 
- ### AST Node Types
+ | Export | Description |
+ |--------|-------------|
+ | `XMLParser` | Streaming SAX parser wrapping `saxes` with namespace normalization. Supports USLM (namespaced) and namespace-free XML (eCFR) via the `defaultNamespace` option. |
 
- Full type definitions for the legal document AST: `LevelNode`, `ContentNode`, `InlineNode`, `NoteNode`, `TableNode`, `SourceType`, `LegalStatus`, and more.
+ ### AST Builder
 
- ```ts
- import type { ASTNode, LevelNode, FrontmatterData, SourceType, LegalStatus } from "@lexbuild/core";
- ```
+ | Export | Description |
+ |--------|-------------|
+ | `ASTBuilder` | Stack-based USLM XML-to-AST builder with configurable emit-at-level streaming. Handles the full USLM 1.0 element vocabulary. Source packages for other formats provide their own builders. |
 
- ### Namespace Constants
+ ### Rendering
 
- USLM, XHTML, Dublin Core namespace URIs and element classification sets.
+ | Export | Description |
+ |--------|-------------|
+ | `renderDocument()` | Render a section node with frontmatter to a complete Markdown file |
+ | `renderSection()` | Render a section-level node to Markdown body text |
+ | `renderNode()` | Render any AST node to Markdown |
+ | `generateFrontmatter()` | Generate a YAML frontmatter block from `FrontmatterData` |
+ | `createLinkResolver()` | Create a cross-reference link resolver supporting USC, CFR, and fallback URLs |
 
- ```ts
- import { USLM_NS, XHTML_NS, LEVEL_ELEMENTS, CONTENT_ELEMENTS } from "@lexbuild/core";
- ```
+ ### Types
 
- ## API Reference
+ | Export | Description |
+ |--------|-------------|
+ | `ASTNode` | Union type for all AST nodes |
+ | `LevelNode` | Hierarchical structural node (title, chapter, section, etc.) |
+ | `ContentNode` | Text content block (content, chapeau, continuation, proviso) |
+ | `InlineNode` | Inline text formatting (bold, italic, ref, footnoteRef, etc.) |
+ | `NoteNode` | Note block (editorial, statutory, amendment, etc.) |
+ | `TableNode` | Table with headers and rows |
+ | `SourceCreditNode` | Enactment source citation |
+ | `FrontmatterData` | Full frontmatter field definitions |
+ | `EmitContext` | Context passed with emitted nodes (ancestors, document metadata) |
+ | `SourceType` | `"usc" \| "ecfr"` |
+ | `LegalStatus` | `"official_legal_evidence" \| "official_prima_facie" \| "authoritative_unofficial"` |
+
+ ### Constants
 
  | Export | Description |
  |--------|-------------|
- | `XMLParser` | Streaming SAX parser with namespace normalization |
- | `ASTBuilder` / `UslmASTBuilder` | USLM XML events to AST with section-emit pattern |
- | `renderDocument()` | Render a section node with frontmatter to Markdown |
- | `renderSection()` | Render a section-level node to Markdown |
- | `renderNode()` | Render any AST node to Markdown |
- | `generateFrontmatter()` | Generate YAML frontmatter block |
- | `createLinkResolver()` | Create a cross-reference link resolver |
- | `parseIdentifier()` | Parse a USC or CFR identifier into components |
  | `FORMAT_VERSION` | Output format version (`"1.1.0"`) |
- | `GENERATOR` | Generator string for frontmatter |
+ | `GENERATOR` | Generator string for frontmatter metadata |
+ | `LEVEL_TYPES` | Ordered array of level types (title → subsubitem) |
+ | `BIG_LEVELS` | Set of structural levels above section |
+ | `USLM_NS` | USLM namespace URI |
+ | `XHTML_NS` | XHTML namespace URI |
+
+ ### File System Utilities
+
+ | Export | Description |
+ |--------|-------------|
+ | `writeFile()` | Write with ENFILE/EMFILE retry and exponential backoff |
+ | `mkdir()` | Recursive mkdir with retry |
+
+ ## Compatibility
+
+ - **Node.js** >= 22
+ - **ESM only** — no CommonJS build
+ - **TypeScript** — ships `.d.ts` type declarations
+ - **Zero browser dependencies** — Node.js runtime only
+
+ ## Monorepo Context
+
+ This package is part of the [LexBuild](https://github.com/chris-c-thomas/LexBuild) monorepo, managed with [Turborepo](https://turbo.build/) and [pnpm workspaces](https://pnpm.io/workspaces). All packages use [changesets](https://github.com/changesets/changesets) for lockstep versioning.
+
+ ```
+ packages/
+ ├── core/ ← you are here
+ ├── usc/ # depends on core
+ ├── ecfr/ # depends on core
+ └── cli/ # depends on core, usc, ecfr
+ ```
+
+ ### Development
+
+ ```bash
+ pnpm turbo build --filter=@lexbuild/core # Build
+ pnpm turbo test --filter=@lexbuild/core # Run tests
+ pnpm turbo typecheck --filter=@lexbuild/core
+ pnpm turbo lint --filter=@lexbuild/core
+ ```
+
+ ## Related Packages
+
+ | Package | Description |
+ |---------|-------------|
+ | [`@lexbuild/cli`](https://www.npmjs.com/package/@lexbuild/cli) | CLI tool for downloading and converting legal XML |
+ | [`@lexbuild/usc`](https://www.npmjs.com/package/@lexbuild/usc) | U.S. Code (USLM XML) converter and downloader |
+ | [`@lexbuild/ecfr`](https://www.npmjs.com/package/@lexbuild/ecfr) | eCFR (Code of Federal Regulations) converter and downloader |
 
  ## License
 
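Reviewer note: the new File System Utilities table describes `writeFile()` as retrying on ENFILE/EMFILE with exponential backoff, but the diff does not show how. The following is only a minimal TypeScript sketch of that general technique. `withRetry`, `backoffDelayMs`, the 50 ms base delay, the 2000 ms cap, and the retry count are all hypothetical names and values chosen for illustration, not the package's API.

```typescript
// Hypothetical sketch of retry with exponential backoff; NOT taken from @lexbuild/core.

/** Error codes reported when the process or system runs out of file descriptors. */
const RETRYABLE_CODES = new Set(["ENFILE", "EMFILE"]);

/** Delay before retry `attempt` (0-based): baseMs * 2^attempt, capped at maxMs. */
function backoffDelayMs(attempt: number, baseMs = 50, maxMs = 2000): number {
  return Math.min(baseMs * 2 ** attempt, maxMs);
}

/** Run `op`, retrying only on retryable error codes, at most `maxRetries` extra times. */
async function withRetry<T>(op: () => Promise<T>, maxRetries = 5): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await op();
    } catch (err) {
      const code = (err as { code?: string } | null)?.code;
      const retryable = code !== undefined && RETRYABLE_CODES.has(code);
      // Give up on unrelated errors or once the retry budget is spent.
      if (!retryable || attempt >= maxRetries) throw err;
      await new Promise<void>((resolve) => setTimeout(resolve, backoffDelayMs(attempt)));
    }
  }
}
```

Under these assumptions, a caller could wrap any promise-returning file operation, for example `withRetry(() => fs.promises.writeFile(path, data))` using Node's built-in `fs` module. Capping the delay keeps a long run of failures from stalling the pipeline indefinitely.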
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@lexbuild/core",
- "version": "1.9.4",
+ "version": "1.10.0",
  "description": "Core AST definitions, parsing infrastructure, and format-agnostic renderers for the LexBuild ecosystem.",
  "author": "Chris Thomas",
  "license": "MIT",
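
Reviewer note: the change from 1.9.4 to 1.10.0 is a minor bump, and it is a case where comparing version strings as plain text gives the wrong order (`"1.10.0"` sorts before `"1.9.4"`). A minimal sketch of a numeric comparison follows, assuming plain `major.minor.patch` versions with no pre-release or build tags. `parseVersion`, `compareVersions`, and `bumpKind` are hypothetical helpers written for this note, not part of any LexBuild package.

```typescript
// Hypothetical helpers for illustration; they handle only plain "major.minor.patch".
type Version = [number, number, number];

function parseVersion(text: string): Version {
  if (!/^\d+\.\d+\.\d+$/.test(text)) {
    throw new Error(`Not a plain major.minor.patch version: ${text}`);
  }
  return text.split(".").map(Number) as Version;
}

/** Negative if a < b, zero if equal, positive if a > b (numeric, not textual). */
function compareVersions(a: string, b: string): number {
  const [aMajor, aMinor, aPatch] = parseVersion(a);
  const [bMajor, bMinor, bPatch] = parseVersion(b);
  return aMajor - bMajor || aMinor - bMinor || aPatch - bPatch;
}

/** Name the most significant component that differs between two versions. */
function bumpKind(from: string, to: string): "major" | "minor" | "patch" | "none" {
  const [fMajor, fMinor, fPatch] = parseVersion(from);
  const [tMajor, tMinor, tPatch] = parseVersion(to);
  if (tMajor !== fMajor) return "major";
  if (tMinor !== fMinor) return "minor";
  if (tPatch !== fPatch) return "patch";
  return "none";
}

console.log(compareVersions("1.9.4", "1.10.0") < 0); // true
console.log(bumpKind("1.9.4", "1.10.0")); // minor
```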