@leadertechie/md2html 0.1.0-alpha.2 → 0.1.0-alpha.20

package/README.md CHANGED
@@ -10,13 +10,21 @@ A configuration-driven markdown to HTML pipeline that parses markdown to an AST
10
10
  - **Configuration-driven** - No hardcoded paths or content structure
11
11
  - **SSR-ready** - Works in both Node.js and browser environments
12
12
  - **Image path handling** - Configurable prefix and base URL for images
13
+ - **Strategy pattern token handlers** - Extensible handler registry with per-token-type strategies
14
+ - **Catch-all fallback** - Unhandled token types are wrapped in container nodes with `data-unhandled` attributes
15
+ - **CSS `@scope` anchors** - Emit `data-md-scope` attributes for CSS `@scope` targeting
16
+ - **Raw HTML passthrough** - Preserve allowed HTML tags (div, span, img, etc.) with script stripping by default
17
+ - **Slot hooks** - Resolve `[[SLOT_NAME]]` placeholders via callback for personalization
18
+ - **Graceful error recovery** - Configurable `'throw' | 'warn' | 'silent'` error handling modes
13
19
 
14
20
  ## Installation
15
21
 
16
22
  ```bash
17
- npm install @leadertechie/md2html
23
+ npm install @leadertechie/md2html lit
18
24
  ```
19
25
 
26
+ > Note: `lit` is a peer dependency and required for rendering Lit templates.
27
+
20
28
  ## Usage
21
29
 
22
30
  ### Basic Usage
@@ -55,10 +63,119 @@ const pipeline = new MarkdownPipeline({
55
63
  gfm: true,
56
64
  breaks: false,
57
65
  pedantic: false
66
+ },
67
+ styleOptions: {
68
+ classPrefix: 'md-',
69
+ customCSS: 'body { font-family: system-ui; }',
70
+ addHeadingIds: true,
71
+ emitScopeAnchors: true // v2: emit data-md-scope attributes
72
+ },
73
+ preserveRawHTML: true, // v2: pass through allowed HTML tags
74
+ errorRecovery: 'warn', // v2: graceful error handling
75
+ onSlot: (name) => `[${name}]` // v2: resolve [[SLOT_NAME]] placeholders
76
+ });
77
+ ```
78
+
79
+ ### Style Configuration Options
80
+
81
+ | Option | Type | Default | Description |
82
+ |--------|------|---------|-------------|
83
+ | `classPrefix` | string | `''` | Prefix for CSS classes on elements |
84
+ | `customCSS` | string | `''` | Custom CSS string to inject (use `pipeline.getCustomCSS()` to retrieve) |
85
+ | `addHeadingIds` | boolean | `false` | Add ID attributes to headings based on their content for anchor links |
86
+ | `emitScopeAnchors` | boolean | `false` | Emit `data-md-scope` attributes for CSS `@scope` targeting (v2) |
87
+
88
+ When `classPrefix` or `addHeadingIds` is set, CSS classes will be added to elements:
89
+ - Headings get level-specific classes: `md-h1`, `md-h2`, `md-h3`, etc.
90
+ - Other elements get a class named after their node type, with the prefix applied (for example, `md-paragraph`): `paragraph`, `list`, `list-item`, `image`, `code`, `container`, `blockquote`
91
+
92
+ Example output with `classPrefix: 'md-'` and `addHeadingIds: true`:
93
+ ```html
94
+ <h1 id="hello-world" class="md-h1">Hello World</h1>
95
+ <h2 id="subheading" class="md-h2">Subheading</h2>
96
+ <p class="md-paragraph">This is a paragraph.</p>
97
+ <ul class="md-list">
98
+ <li class="md-list-item">Item 1</li>
99
+ </ul>
100
+ ```
101
+
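The README does not show how heading IDs are derived from heading text. The sketch below is one plausible way to get from `Hello World` to `hello-world`. `slugifyHeading` and `renderHeading` are hypothetical helpers written for illustration, not functions exported by the package, and the slug rules (lowercase, drop punctuation, hyphenate whitespace) are an assumption.

```typescript
// Hypothetical helper: the package's real slug algorithm is not documented here.
function slugifyHeading(text: string): string {
  return text
    .toLowerCase()
    .trim()
    .replace(/[^a-z0-9\s-]/g, '') // drop anything that is not a letter, digit, space, or hyphen
    .replace(/\s+/g, '-')         // turn runs of whitespace into a single hyphen
    .replace(/-+/g, '-');         // collapse repeated hyphens
}

// Builds a heading tag with an id and a level-specific class.
// Assumes `text` is already HTML-escaped.
function renderHeading(level: number, text: string, classPrefix: string): string {
  const id = slugifyHeading(text);
  return `<h${level} id="${id}" class="${classPrefix}h${level}">${text}</h${level}>`;
}

console.log(renderHeading(1, 'Hello World', 'md-'));
// prints: <h1 id="hello-world" class="md-h1">Hello World</h1>
```

Two headings with the same text would get the same ID under this scheme, so a real implementation would likely need a de-duplication step.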
102
+ ### CSS `@scope` Anchors (v2)
103
+
104
+ When `emitScopeAnchors: true`, every rendered element gets a `data-md-scope` attribute:
105
+
106
+ ```html
107
+ <div data-md-scope="root">
108
+ <h2 data-md-scope="heading" class="md-h2">Title</h2>
109
+ <p data-md-scope="paragraph" class="md-paragraph">Content</p>
110
+ </div>
111
+ ```
112
+
113
+ This enables CSS `@scope` targeting in your stylesheets:
114
+
115
+ ```css
116
+ @layer components {
117
+ @scope ([data-md-scope="root"]) {
118
+ :scope { max-width: 700px; }
119
+ [data-md-scope="heading"] { font-size: clamp(1.5rem, 4vw, 2.5rem); }
120
+ }
121
+ }
122
+ ```
123
+
124
+ ### Raw HTML Passthrough (v2)
125
+
126
+ When `preserveRawHTML: true`, allowed HTML tags pass through the parser:
127
+
128
+ ```typescript
129
+ const pipeline = new MarkdownPipeline({ preserveRawHTML: true });
130
+ const html = pipeline.renderMarkdown('Hello <div class="test">World</div>');
131
+ // Output preserves the <div> with its attributes
132
+ ```
133
+
134
+ **Default allowed tags:** `img`, `style`, `div`, `span`, `section`, `article`, `aside`, `header`, `footer`, `nav`, `main`, `figure`, `figcaption`, `details`, `summary`, `mark`, `time`, `video`, `audio`, `source`, `iframe`, `embed`
135
+
136
+ **Script tags** are stripped by default for security. Opt in with `allowedHTMLTags: ['script']`.
137
+
138
+ ### Slot Hooks (v2)
139
+
140
+ Resolve `[[SLOT_NAME]]` placeholders for personalization:
141
+
142
+ ```typescript
143
+ const pipeline = new MarkdownPipeline({
144
+ onSlot: (name) => {
145
+ const values: Record<string, string> = { USER_NAME: 'Alice', COMPANY: 'Acme' };
146
+ return values[name] || `[[${name}]]`;
58
147
  }
59
148
  });
149
+ const html = pipeline.renderMarkdown('Hello [[USER_NAME]] from [[COMPANY]]!');
150
+ // Output: Hello Alice from Acme!
60
151
  ```
61
152
 
153
+ Custom slot patterns are supported via `slotPattern`:
154
+
155
+ ```typescript
156
+ const values: Record<string, string> = { USER_NAME: 'Alice', COMPANY: 'Acme' };
+ const pipeline = new MarkdownPipeline({
157
+ slotPattern: /\{\{(.*?)\}\}/g,
158
+ onSlot: (name) => values[name] || `{{${name}}}`
159
+ });
160
+ ```
161
+
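As a rough mental model of what slot resolution does, the sketch below replaces every match of a pattern with the result of a callback. `resolveSlots` is a hypothetical stand-in, not the package's implementation. It assumes the first capture group of the pattern is the slot name and that the pattern carries the `g` flag, since without it only the first placeholder would be replaced.

```typescript
type SlotResolver = (name: string) => string;

// Hypothetical stand-in for the slot step; not part of the package API.
function resolveSlots(
  text: string,
  onSlot: SlotResolver,
  slotPattern: RegExp = /\[\[(.*?)\]\]/g
): string {
  // The first capture group is treated as the slot name.
  return text.replace(slotPattern, (_match: string, name: string) => onSlot(name.trim()));
}

const slotValues = new Map<string, string>([
  ['USER_NAME', 'Alice'],
  ['COMPANY', 'Acme'],
]);

// Default [[NAME]] pattern: unknown slots are left as they were.
console.log(
  resolveSlots('Hello [[USER_NAME]] from [[COMPANY]]!', (n) => slotValues.get(n) ?? `[[${n}]]`)
); // prints: Hello Alice from Acme!

// Custom {{NAME}} pattern.
console.log(
  resolveSlots('Hi {{USER_NAME}}, {{UNKNOWN}}', (n) => slotValues.get(n) ?? `{{${n}}}`, /\{\{(.*?)\}\}/g)
); // prints: Hi Alice, {{UNKNOWN}}
```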
162
+ ### Error Recovery (v2)
163
+
164
+ Three error recovery modes for production resilience:
165
+
166
+ ```typescript
167
+ // 'throw' (default) — backward compatible, throws on parse errors
168
+ const strict = new MarkdownPipeline({ errorRecovery: 'throw' });
169
+
170
+ // 'warn' — logs warning, returns partial content as fallback text
171
+ const tolerant = new MarkdownPipeline({ errorRecovery: 'warn' });
172
+
173
+ // 'silent' — silently returns fallback content
174
+ const silent = new MarkdownPipeline({ errorRecovery: 'silent' });
175
+ ```
176
+
177
+ For additional safety, `maxRecursionDepth` (default: 100) caps nesting depth to prevent stack overflow on deeply nested content.
178
+
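The three modes and the depth limit can be pictured with the following sketch. It is not the package's code: `parseWithRecovery` and `checkDepth` are hypothetical names, and returning the raw input as fallback text is an assumption about what "fallback content" means.

```typescript
type ErrorRecovery = 'throw' | 'warn' | 'silent';

// Hypothetical wrapper; `parse` stands in for any parsing step that may throw.
function parseWithRecovery(
  markdown: string,
  parse: (src: string) => string,
  mode: ErrorRecovery = 'throw'
): string {
  try {
    return parse(markdown);
  } catch (err) {
    if (mode === 'throw') throw err; // strict: surface the error to the caller
    if (mode === 'warn') console.warn('[sketch] parse failed, using fallback text:', err);
    return markdown; // assumption: fall back to the raw input as plain text
  }
}

interface TreeNode {
  children?: TreeNode[];
}

// Walks a tree and throws once nesting goes deeper than maxDepth.
function checkDepth(node: TreeNode, maxDepth = 100, depth = 0): void {
  if (depth > maxDepth) {
    throw new Error(`Nesting exceeds maxRecursionDepth (${maxDepth})`);
  }
  for (const child of node.children ?? []) {
    checkDepth(child, maxDepth, depth + 1);
  }
}

const failingParse = (_src: string): string => {
  throw new Error('boom');
};

console.log(parseWithRecovery('# Title', failingParse, 'silent')); // prints: # Title
```

Combining the two, a depth error raised inside `parse` would be handled by whichever recovery mode is active.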
62
179
  ### API
63
180
 
64
181
  | Method | Description |
@@ -66,7 +183,225 @@ const pipeline = new MarkdownPipeline({
66
183
  | `parse(markdown)` | Parse markdown string to AST |
67
184
  | `render(nodes)` | Render AST to HTML string |
68
185
  | `renderMarkdown(markdown)` | Parse and render in one call |
69
- | `renderPage(title, nodes)` | Render AST to full HTML page |
186
+ | `renderPage(title, nodes, options?)` | Render AST to full HTML page |
187
+ | `getCustomCSS()` | Get custom CSS string from style config |
188
+ | `getConfig()` | Get current pipeline configuration |
189
+
190
+ ## Architecture (v2)
191
+
192
+ The pipeline is built from modular stages, each with a clear design pattern and single responsibility:
193
+
194
+ ```
195
+ Markdown String
196
+
197
+
198
+ ┌──────────────────────────┐
199
+ │ 1. Preprocessor Chain │ Chain of Responsibility
200
+ │ (preprocessor.ts) │ Transforms raw markdown before lexing
201
+ │ • ContainerBlock │ (e.g., ::: containers → HTML comments)
202
+ └──────────┬───────────────┘
203
+
204
+
205
+ ┌──────────────────────────┐
206
+ │ 2. marked.lexer() │ Third-party lexer
207
+ └──────────┬───────────────┘
208
+
209
+
210
+ ┌──────────────────────────┐
211
+ │ 3. Token Postprocessor │ Chain of Responsibility
212
+ │ (token-postprocessor │ Restructures flat tokens → nested tree
213
+ │ .ts) │ (e.g., comments → containerBlock)
214
+ │ • ContainerBlock │
215
+ └──────────┬───────────────┘
216
+
217
+
218
+ ┌──────────────────────────┐
219
+ │ 4. Token Handlers │ Strategy Pattern
220
+ │ (handlers/) │ Each marked token type has a dedicated
221
+ │ • TokenHandlerRegistry│ handler, registered by type name.
222
+ │ • CatchAllHandler │ Extensible at runtime via registry.
223
+ └──────────┬───────────────┘
224
+
225
+
226
+ ContentNode[]
227
+ (AST)
228
+
229
+
230
+ ┌──────────────────────────┐
231
+ │ 5. Renderer │ Strategy Pattern
232
+ │ (renderer-strategies │ Each ContentNode type has its own
233
+ │ .ts / lit-strategies │ render strategy — choose between:
234
+ │ .ts) │ • HTMLRenderer (plain HTML strings)
235
+ │ • NodeRendererStrategy│ • LitRenderer (Lit TemplateResult)
236
+ │ • LitNodeRendererStrat│
237
+ └──────────────────────────┘
238
+ ```
239
+
240
+ ### 1. Preprocessing (`preprocessor.ts`)
241
+
242
+ The `CompositePreprocessor` chains `Preprocessor` transforms that run on raw markdown **before** lexing. Built-in:
243
+
244
+ - **`ContainerBlockPreprocessor`** — converts `:::tag#id.class` fences to `<!-- md-container:... -->` HTML comment markers, so `marked` preserves them without affecting inner markdown parsing
245
+
246
+ The chain is extensible:
247
+
248
+ ```typescript
249
+ import { MarkdownParser, Preprocessor } from '@leadertechie/md2html';
250
+
251
+ class EmojiPreprocessor implements Preprocessor {
252
+ readonly name = 'emoji';
253
+ process(markdown: string): string {
254
+ return markdown.replace(/:smile:/g, '😊'); // the /g flag replaces every occurrence, not just the first
255
+ }
256
+ }
257
+
258
+ const parser = new MarkdownParser();
259
+ parser.preprocessors.add(new EmojiPreprocessor());
260
+ ```
261
+
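To make the chaining concrete, here is a minimal self-contained sketch in which each step receives the output of the previous one. The `Preprocessor` interface mirrors the shape used in the example above. `PreprocessorChain` is a hypothetical stand-in for `CompositePreprocessor`, whose actual API beyond `add` is not shown in this README.

```typescript
interface Preprocessor {
  readonly name: string;
  process(markdown: string): string;
}

// Hypothetical stand-in for the package's CompositePreprocessor.
class PreprocessorChain implements Preprocessor {
  readonly name = 'chain';
  private readonly steps: Preprocessor[] = [];

  add(step: Preprocessor): this {
    this.steps.push(step);
    return this;
  }

  // Runs the steps in registration order, feeding each result to the next step.
  process(markdown: string): string {
    return this.steps.reduce((text, step) => step.process(text), markdown);
  }
}

const chain = new PreprocessorChain()
  .add({ name: 'emoji', process: (md) => md.replace(/:smile:/g, '😊') })
  .add({ name: 'trim', process: (md) => md.trim() });

console.log(chain.process('  :smile: and :smile:  ')); // prints: 😊 and 😊
```

Order matters in a chain like this: a step that rewrites fences must run before any step that depends on the rewritten markers.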
262
+ ### 2. Token Postprocessing (`token-postprocessor.ts`)
263
+
264
+ The `CompositeTokenPostprocessor` chains `TokenPostprocessor` transforms that run on the flat token array **after** lexing. Built-in:
265
+
266
+ - **`ContainerBlockPostprocessor`** — collapses `<!-- md-container:... -->` / `<!-- /md-container -->` markers into nested `containerBlock` tokens with proper parent-child structure (handles arbitrary nesting depth)
267
+
268
+ Custom postprocessors:
269
+
270
+ ```typescript
271
+ parser.postprocessors.add({
272
+ name: 'filter-unwanted',
273
+ process: (tokens) => tokens.filter(t => (t as any).type !== 'html')
274
+ });
275
+ ```
276
+
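Collapsing open/close markers into a nested tree is a classic stack problem. The sketch below shows one way it could work. The exact marker syntax is elided in this README, so the `OPEN` and `CLOSE` patterns, the `spec` field, and the `Token` shape are all assumptions made for illustration.

```typescript
interface Token {
  type: string;
  raw: string;
  spec?: string;    // hypothetical: the text after "md-container:"
  tokens?: Token[]; // children of a containerBlock
}

// Assumed marker shapes; the real markers may differ.
const OPEN = /^<!--\s*md-container:(.*?)\s*-->\s*$/;
const CLOSE = /^<!--\s*\/md-container\s*-->\s*$/;

function nestContainers(flat: Token[]): Token[] {
  const root: Token[] = [];
  const stack: Token[][] = [root]; // the top of the stack is the list being filled

  for (const token of flat) {
    const current = stack[stack.length - 1] ?? root;
    const open = token.type === 'html' ? OPEN.exec(token.raw) : null;

    if (open) {
      // Open marker: start a container and descend into it.
      const children: Token[] = [];
      current.push({ type: 'containerBlock', raw: token.raw, spec: open[1] ?? '', tokens: children });
      stack.push(children);
    } else if (token.type === 'html' && CLOSE.test(token.raw) && stack.length > 1) {
      // Close marker: return to the parent list.
      stack.pop();
    } else {
      current.push(token);
    }
  }
  return root;
}

const nested = nestContainers([
  { type: 'html', raw: '<!-- md-container:section -->' },
  { type: 'paragraph', raw: 'outer' },
  { type: 'html', raw: '<!-- md-container:div -->' },
  { type: 'paragraph', raw: 'inner' },
  { type: 'html', raw: '<!-- /md-container -->' },
  { type: 'html', raw: '<!-- /md-container -->' },
  { type: 'paragraph', raw: 'after' },
]);
console.log(JSON.stringify(nested.map((t) => t.type))); // prints: ["containerBlock","paragraph"]
```

In this sketch an unmatched close marker is kept as an ordinary token and an unclosed container simply stays open until the input ends. Whether the package behaves the same way is not stated.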
277
+ ### 3. Token Handling — Strategy Pattern (`handlers/`)
278
+
279
+ Each marked token type has its own `TokenHandler` class, registered in the `TokenHandlerRegistry`:
280
+
281
+ ```
282
+ src/handlers/
283
+ ├── types.ts # TokenHandler interface + ParseContext
284
+ ├── registry.ts # TokenHandlerRegistry with catch-all fallback
285
+ ├── heading-handler.ts # h1-h6
286
+ ├── paragraph-handler.ts # <p> with inline image/HTML support
287
+ ├── list-handler.ts # <ul>/<ol>
288
+ ├── image-handler.ts # <img>
289
+ ├── code-handler.ts # <pre><code>
290
+ ├── hr-handler.ts # <hr>
291
+ ├── blockquote-handler.ts # <blockquote>
292
+ ├── html-handler.ts # raw HTML passthrough
293
+ ├── link-handler.ts # <a>
294
+ ├── frontmatter-handler.ts# YAML frontmatter metadata
295
+ ├── container-block- # ::: container blocks
296
+ │ handler.ts
297
+ └── catchall-handler.ts # fallback for unregistered types
298
+ ```
299
+
300
+ **Extending the parser** — register custom handlers without modifying internals:
301
+
302
+ ```typescript
303
+ import { MarkdownParser, TokenHandler } from '@leadertechie/md2html';
304
+
305
+ const parser = new MarkdownParser();
306
+
307
+ // Override heading rendering
308
+ const customHeading: TokenHandler = {
309
+ type: 'heading',
310
+ handle: (token, ctx) => ({
311
+ type: 'container',
312
+ attributes: { tag: 'div', 'data-custom': 'true' },
313
+ children: [{
314
+ type: 'heading',
315
+ content: ctx.processSlots(token.text as string),
316
+ attributes: { level: String(token.depth) }
317
+ }]
318
+ })
319
+ };
320
+ parser.handlers.register(customHeading);
321
+
322
+ // Remove a handler to skip a token type (here this also removes the custom heading handler registered above)
323
+ parser.handlers.unregister('heading');
324
+
325
+ // Replace the catch-all for unregistered token types
326
+ parser.handlers.setCatchAll({
327
+ type: '*',
328
+ handle: (token) => ({
329
+ type: 'text',
330
+ content: `[fallback: ${token.type}]`
331
+ })
332
+ });
333
+ ```
334
+
335
+ **Catch-all handler** — When a token type has no dedicated handler (e.g., `table`, `def`), the `CatchAllHandler` wraps it in a `<div data-unhandled="type">` container so content is never silently lost. The `onUnhandledToken` callback notifies callers:
336
+
337
+ ```typescript
338
+ const parser = new MarkdownParser({
339
+ onUnhandledToken: (type, token) => {
340
+ console.warn(`[md2html] Unhandled token type: ${type}`);
341
+ }
342
+ });
343
+ ```
344
+
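The lookup-then-fallback behavior described above can be sketched in a few lines. This is an illustration of the pattern, not the package's `TokenHandlerRegistry`. `MiniHandlerRegistry` and the simplified `Token` and `ContentNode` shapes are hypothetical, and the handlers here take no parse context.

```typescript
interface Token {
  type: string;
  raw: string;
}

interface ContentNode {
  type: string;
  content?: string;
  attributes?: Record<string, string>;
}

interface TokenHandler {
  type: string;
  handle(token: Token): ContentNode;
}

// Hypothetical miniature of a handler registry with a catch-all fallback.
class MiniHandlerRegistry {
  private readonly handlers = new Map<string, TokenHandler>();
  private catchAll: TokenHandler;
  private readonly onUnhandled: (type: string) => void;

  constructor(catchAll: TokenHandler, onUnhandled: (type: string) => void = () => {}) {
    this.catchAll = catchAll;
    this.onUnhandled = onUnhandled;
  }

  register(handler: TokenHandler): void {
    this.handlers.set(handler.type, handler); // replaces any handler of the same type
  }

  unregister(type: string): boolean {
    return this.handlers.delete(type);
  }

  setCatchAll(handler: TokenHandler): void {
    this.catchAll = handler;
  }

  handle(token: Token): ContentNode {
    const handler = this.handlers.get(token.type);
    if (handler) return handler.handle(token);
    this.onUnhandled(token.type); // notify the caller, then fall back
    return this.catchAll.handle(token);
  }
}

const unhandledTypes: string[] = [];
const registry = new MiniHandlerRegistry(
  {
    type: '*',
    handle: (t) => ({ type: 'container', content: t.raw, attributes: { 'data-unhandled': t.type } }),
  },
  (type) => unhandledTypes.push(type)
);
registry.register({ type: 'hr', handle: () => ({ type: 'hr' }) });

console.log(registry.handle({ type: 'hr', raw: '---' }).type);      // prints: hr
console.log(registry.handle({ type: 'table', raw: '| a |' }).type); // prints: container
```

In this sketch, unregistering a type routes its tokens to the catch-all rather than dropping them.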
345
+ ### 4. Rendering — Strategy Pattern (`renderer-strategies.ts`, `lit-strategies.ts`)
346
+
347
+ The AST renderers use the same Strategy + Registry pattern as the token handlers:
348
+
349
+ - **`HTMLRenderer`** — produces plain HTML strings. Uses `NodeRendererStrategy` / `RendererStrategyRegistry` for each node type. Supports `classPrefix`, `addHeadingIds`, and `emitScopeAnchors` styling.
350
+ - **`LitRenderer`** — produces Lit `TemplateResult` objects. Uses `LitNodeRendererStrategy` / `LitStrategyRegistry`. Perfect for Lit web components.
351
+
352
+ Both registries are publicly accessible for customization:
353
+
354
+ ```typescript
355
+ import { HTMLRenderer, NodeRendererStrategy } from '@leadertechie/md2html';
356
+
357
+ const renderer = new HTMLRenderer({ classPrefix: 'my-' });
358
+
359
+ // Register a custom strategy
360
+ renderer.strategies.register({
361
+ type: 'custom',
362
+ render: (node, renderChild, ctx) => `<my-el>${node.content}</my-el>`
363
+ });
364
+ ```
365
+
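The `renderChild` argument is what lets a strategy render nested nodes without knowing about other node types. The sketch below shows that recursion in isolation. `MiniRenderer` and `NodeStrategy` are hypothetical simplifications (the third `ctx` argument is omitted), and escaping unknown node types as plain text is an assumption, not documented behavior.

```typescript
interface ContentNode {
  type: string;
  content?: string;
  children?: ContentNode[];
}

type RenderChild = (node: ContentNode) => string;

interface NodeStrategy {
  type: string;
  render(node: ContentNode, renderChild: RenderChild): string;
}

const escapeHtml = (s: string): string =>
  s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');

// Hypothetical miniature of a strategy-based renderer.
class MiniRenderer {
  private readonly strategies = new Map<string, NodeStrategy>();

  register(strategy: NodeStrategy): this {
    this.strategies.set(strategy.type, strategy);
    return this;
  }

  render(node: ContentNode): string {
    const strategy = this.strategies.get(node.type);
    // Assumption: node types without a strategy render as escaped text.
    if (!strategy) return escapeHtml(node.content ?? '');
    // Strategies receive a callback so they can recurse into children.
    return strategy.render(node, (child) => this.render(child));
  }
}

const miniRenderer = new MiniRenderer()
  .register({ type: 'paragraph', render: (n) => `<p>${escapeHtml(n.content ?? '')}</p>` })
  .register({
    type: 'container',
    render: (n, renderChild) => `<div>${(n.children ?? []).map(renderChild).join('')}</div>`,
  });

console.log(
  miniRenderer.render({ type: 'container', children: [{ type: 'paragraph', content: 'Hi & bye' }] })
); // prints: <div><p>Hi &amp; bye</p></div>
```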
366
+ The `LitRenderer.renderToHTMLString()` delegates to `HTMLRenderer` to avoid duplicating string rendering logic.
367
+
368
+ ### 5. Context Factory (`context-factory.ts`)
369
+
370
+ The `createParseContext()` pure function separates context construction from the parser class. It bridges parser services (image processing, slot resolution, HTML sanitization) to token handlers via the `ParserServices` interface. This makes the context testable in isolation and decouples handler logic from parser internals.
371
+
372
+ ### Source Map
373
+
374
+ ```
375
+ src/
376
+ ├── parser.ts # Orchestrator: coordinates pre/post-processing + token handling
377
+ ├── preprocessor.ts # Chain of Responsibility: markdown transforms before lexing
378
+ ├── token-postprocessor.ts # Chain of Responsibility: token transforms after lexing
379
+ ├── context-factory.ts # Factory: creates ParseContext for token handlers
380
+ ├── handlers/ # Strategy: per-token-type ContentNode producers
381
+ │ ├── types.ts
382
+ │ ├── registry.ts
383
+ │ ├── heading-handler.ts
384
+ │ ├── paragraph-handler.ts
385
+ │ ├── list-handler.ts
386
+ │ ├── image-handler.ts
387
+ │ ├── code-handler.ts
388
+ │ ├── hr-handler.ts
389
+ │ ├── blockquote-handler.ts
390
+ │ ├── html-handler.ts
391
+ │ ├── link-handler.ts
392
+ │ ├── frontmatter-handler.ts
393
+ │ ├── container-block-handler.ts
394
+ │ └── catchall-handler.ts
395
+ ├── renderer.ts # HTMLRenderer: transforms ContentNodes to plain HTML
396
+ ├── renderer-strategies.ts # Strategy: per-node-type HTML string renderers
397
+ ├── lit-renderer.ts # LitRenderer: transforms ContentNodes to Lit TemplateResult
398
+ ├── lit-strategies.ts # Strategy: per-node-type Lit TemplateResult renderers
399
+ ├── visitor.ts # Visitor: tree traversal utilities
400
+ ├── factory.ts # NodeFactory: ContentNode builder API
401
+ ├── pipeline.ts # Facade: high-level MarkdownPipeline API
402
+ ├── types.ts # Core types: ContentNode, MarkdownContent, configs
403
+ └── telemetry-init.ts # Shared logger initialization
404
+ ```
70
405
 
71
406
  ## License
72
407